Skip to content

Support SQL literal keyset values

What does this MR do?

This MR changes the condition building logic in the keyset pagination library in a way that eliminates the conditions (in ruby) on the passed in cursor values. The changes will mostly affect queries where NULL values are anticipated.

Why?

The groupoptimize plans to implement efficient group level queries based on this PoC. The optimization heavily relies on recursive queries and SQL literal keyset values. This requires the underlying library to generate SQL queries that will work with "unknown" values: actual value or NULL. This makes things a bit more complicated because we cannot just do column=x because column=NULL returns NULL.

A cursor value can be the following SQL expression: ARRAY[1,2,3][2] (2nd element of an array which is defined somewhere in the query.

Example

For example, when the nulls are located at the beginning of the resultset (ASC), we add the following condition to capture the potential rows with actual values: (1,3,4...):

if column_definition.nulls_first? && value.blank?
  conditions << column_definition.column_expression.not_eq(nil) # relative_position IS NOT NULL

Relative positions according to the sorting (relative_position ASC):

NULL
NULL
NULL
NULL
1
3
4
5
6

The value.blank? check needs to be eliminated and replaced with an SQL equivalent. Let's assume that the value is 3. This is how the condition would look like:

relative_position IS NOT NULL AND 3 IS NULL

This evaluates the expression as false and no rows will be returned. Query when the relative position value is nil:

relative_position IS NOT NULL AND NULL IS NULL

This returns the rows: 1, 3, 4, 5, 6

Query changes

Keyset OR query, old:

SELECT "issues".*
FROM "issues"
WHERE (("issues"."id" > 147
        AND "issues"."relative_position" = 30)
       OR ("issues"."relative_position" > 30))
ORDER BY "issues"."relative_position" ASC,
         "issues"."id" ASC
LIMIT 10

Keyset OR query, new:

SELECT "issues".*
FROM "issues"
WHERE (("issues"."id" > 147
        AND "issues"."relative_position" = 30
        AND 30 IS NOT NULL)
       OR ("issues"."id" > 147
           AND "issues"."relative_position" IS NULL
           AND 30 IS NULL)
       OR ("issues"."relative_position" > 30)
       OR ("issues"."relative_position" IS NOT NULL
           AND 30 IS NULL))
ORDER BY "issues"."relative_position" ASC,
         "issues"."id" ASC
LIMIT 10

The same query but with UNION, old:

SELECT "issues".*
FROM (
        (SELECT "issues".*
         FROM "issues"
         WHERE ("issues"."id" > 147
                AND "issues"."relative_position" = 30)
         ORDER BY "issues"."relative_position" ASC, "issues"."id" ASC
         LIMIT 10)
      UNION ALL
        (SELECT "issues".*
         FROM "issues"
         WHERE ("issues"."relative_position" > 30)
         ORDER BY "issues"."relative_position" ASC, "issues"."id" ASC
         LIMIT 10)) issues
ORDER BY "issues"."relative_position" ASC,
         "issues"."id" ASC
LIMIT 10

The same query but with UNION, new:

SELECT "issues".*
FROM (
        (SELECT "issues".*
         FROM "issues"
         WHERE ("issues"."id" > 147
                AND "issues"."relative_position" = 30
                AND 30 IS NOT NULL)
         ORDER BY "issues"."relative_position" ASC, "issues"."id" ASC
         LIMIT 10)
      UNION ALL
        (SELECT "issues".*
         FROM "issues"
         WHERE ("issues"."id" > 147
                AND "issues"."relative_position" IS NULL
                AND 30 IS NULL)
         ORDER BY "issues"."relative_position" ASC, "issues"."id" ASC
         LIMIT 10)
      UNION ALL
        (SELECT "issues".*
         FROM "issues"
         WHERE ("issues"."relative_position" > 30)
         ORDER BY "issues"."relative_position" ASC, "issues"."id" ASC
         LIMIT 10)
      UNION ALL
        (SELECT "issues".*
         FROM "issues"
         WHERE ("issues"."relative_position" IS NOT NULL
                AND 30 IS NULL)
         ORDER BY "issues"."relative_position" ASC, "issues"."id" ASC
         LIMIT 10)) issues
ORDER BY "issues"."relative_position" ASC,
         "issues"."id" ASC
LIMIT 10

Performance

The query change will not affect the performance significantly (only parsing time). We do have more conditions however these conditions will be turned to a no-op, for example: 30 IS NULL will make the whole subquery a no-op.

Screenshots or Screencasts (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #335388 (closed)

Edited by Adam Hegyi

Merge request reports