Implement generic keyset pagination

Adam Hegyi requested to merge ahegyi-keyset-experiment into master

What does this MR do?

This MR implements utility classes for building generic keyset paginated ActiveRecord queries.

Where could we use this?

  • GraphQL API, replacing the existing keyset pagination implementation.
  • REST API
  • Batch processing and background jobs that iterate over large volumes of data.

Why keyset pagination?

https://use-the-index-luke.com/no-offset

Current implementation(s)

This MR contains

  • Keyset pagination utility classes
  • Tests for the query builder with different ordering options, using an "in-memory" table
  • Simple iterator class to loop over records
  • Update the GraphQL merged_at ordering for MergeRequest to use a different tie breaker column
  • Update the GraphQL similarity ordering for projects
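To make the idea behind the iterator concrete, here is a minimal in-memory sketch in plain Ruby. It is not the class added in this MR: the method name each_keyset_batch, the of: parameter, and the hash-based records are all assumptions made for illustration. It pages through rows ordered by created_at with id as the tie-breaker, using the last row of each batch as the cursor.

```ruby
# Hypothetical illustration only; not the iterator class from this MR.
# Records are plain hashes standing in for table rows.
def each_keyset_batch(records, of: 2)
  # Equivalent of ORDER BY created_at, id (id is the tie-breaker).
  sorted = records.sort_by { |r| [r[:created_at], r[:id]] }
  cursor = nil

  loop do
    # Equivalent of WHERE (created_at, id) > (cursor values) LIMIT of.
    # Array#<=> compares element by element, like a row-value comparison.
    batch = sorted
      .select { |r| cursor.nil? || ([r[:created_at], r[:id]] <=> cursor) > 0 }
      .first(of)
    break if batch.empty?

    yield batch

    # The last row of the batch becomes the cursor for the next page.
    cursor = [batch.last[:created_at], batch.last[:id]]
  end
end
```

Because the cursor includes the tie-breaker, rows that share the same created_at value are neither skipped nor returned twice when a page boundary falls between them.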

Compatibility with our GraphQL code

The Gitlab::Graphql::Pagination::Keyset::KeysetExperiment module overrides functionality in the Keyset::Connection class so that the experimental implementation is used when the ActiveRecord scope is ordered by the new Order class.

Idea for integration:

The OrderInfo class could inspect the current ActiveRecord scope and generate the Order configuration (check primary key, order column, nullable) using the new implementation. If generating the configuration is not possible, raise an error and inform the user that the order needs to be configured manually, like for the MergeRequest#order_merged_at scope.

Performance

Similar to the existing GraphQL pagination implementation.

Looking at the queries (original GraphQL keyset implementation and this MR), I noticed that we use OR conditions when paginating and ordering by two columns. The last column is usually a tie-breaker which is used for non-distinct fields, for example: ORDER BY created_at, id.

Unfortunately, the OR conditions make keyset pagination as slow as a standard OFFSET + LIMIT query, so it won't scale well. Alternatively, we could try transforming the OR queries into UNION queries to leverage index scans.
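The difference between the two query shapes can be sketched in plain Ruby. The helpers below are hypothetical and only build SQL strings with named placeholders; they are not the query builder from this MR, and they assume ascending, non-nullable columns. The first produces the OR-style condition described above, the second a possible UNION ALL rewrite in which each branch has its own ORDER BY and LIMIT.

```ruby
# Hypothetical helpers for illustration; not part of this MR.
# Both assume ascending order and non-nullable columns.

# One condition per column: all earlier columns equal the cursor value,
# and the current column is greater than it.
def keyset_branches(columns)
  columns.each_index.map do |i|
    equalities = columns[0...i].map { |c| "#{c} = :#{c}" }
    (equalities + ["#{columns[i]} > :#{columns[i]}"]).join(' AND ')
  end
end

# OR form: a single WHERE clause combining all branches.
def keyset_or_condition(columns)
  keyset_branches(columns).map { |b| "(#{b})" }.join(' OR ')
end

# UNION form: each branch is its own limited, ordered subquery.
# The branches are mutually exclusive, so UNION ALL adds no duplicates.
def keyset_union_query(table, columns, limit)
  order = columns.join(', ')
  selects = keyset_branches(columns).map do |b|
    "(SELECT * FROM #{table} WHERE #{b} ORDER BY #{order} LIMIT #{limit})"
  end
  "SELECT * FROM (#{selects.join(' UNION ALL ')}) AS #{table} " \
    "ORDER BY #{order} LIMIT #{limit}"
end

puts keyset_or_condition(%w[created_at id])
puts keyset_union_query('issues', %w[created_at id], 20)
```

Whether the UNION form is actually faster depends on the available indexes and the planner, so it would need to be verified with EXPLAIN on real data. The table name issues is only an example.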

Usage example

Keyset pagination based on two columns

Ordering by the merge_request_metrics_merged_at and merge_request_metrics.id columns:

https://gitlab.com/gitlab-org/gitlab/-/blob/a67b2e0ae4c3c9b896b66743126f16c413c5c19b/app/models/merge_request.rb#L301

Keyset pagination with Ci::Pipeline

Snippet: ci_pipelines_keyset.rb

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to !50579 (merged) and #281152 (closed), among others.

Edited by Dan Jensen