GKG Pagination: Agent-Driven Model (Research) ($5972795)

From discussions with @michaelangeloio in Slack

Yes, keyset could work on the root node's table scan, but it doesn't paginate what you'd want.

Here's why. In a traversal like "find all MergeRequests authored by Users", the SQL looks like:
WITH _nf_u AS (SELECT id FROM gl_user AS u WHERE ...)
SELECT e0.traversal_path, e0.source_id, e0.target_id, ...
FROM gl_edge AS e0
WHERE e0.source_id IN (SELECT id FROM _nf_u)
  AND e0.relationship_kind = 'AUTHORED'
LIMIT 25
The root node table (gl_user) appears only in the CTE. The LIMIT applies to the edge result, not to users. If you apply keyset to the root node's CTE (WHERE (u.traversal_path, u.id) > cursor), you're saying "only consider users after this cursor position." But that doesn't control how many edge rows you get back: one user might have 500 authored MRs, another might have 1.

So keyset on the first table narrows the input (which root entities to scan), but the output (edge rows) isn't paginated in a predictable way. You might get 0 rows on one page and 500 on the next, depending on the fan-out of the root entity.
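
To make the fan-out point concrete, here is a rough Python sketch (not the real query engine). It assumes a hypothetical setup where each page advances a keyset cursor over a fixed number of root users, then counts the edge rows those users produce. The user ids, MR ids, and the `users_per_page` parameter are made up for illustration.

```python
from collections import Counter

# Hypothetical AUTHORED edges as (source_id, target_id) pairs.
# User 1 authored nothing, user 2 authored 500 MRs, user 3 authored 1.
edges = [(2, mr) for mr in range(1000, 1500)] + [(3, 2000)]
user_ids = [1, 2, 3]


def edge_rows_per_root_page(user_ids, edges, users_per_page):
    """Page over root users by keyset (id > cursor); count edge rows per page."""
    fan_out = Counter(source_id for source_id, _ in edges)
    page_sizes = []
    cursor = 0  # assumption: ids are positive, so 0 sorts before every user
    while True:
        # Keyset on the root: the next `users_per_page` users after the cursor.
        window = sorted(u for u in user_ids if u > cursor)[:users_per_page]
        if not window:
            break
        page_sizes.append(sum(fan_out[u] for u in window))
        cursor = window[-1]
    return page_sizes


print(edge_rows_per_root_page(user_ids, edges, users_per_page=1))  # [0, 500, 1]
```

Every page covers exactly one root user, yet the number of edge rows per page swings from 0 to 500.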

The existing keyset code in optimize.rs does exactly this: it injects the predicate on the root node. It works as a performance optimization (skip root entities you've already processed), but it's not correct pagination of the result set. The LIMIT on the edge result is what actually caps the page size, and the cursor doesn't track where you are in edge-space.
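
A small simulation of what "the cursor doesn't track where you are in edge-space" can mean in practice. This is a sketch under assumptions, not the behavior of optimize.rs: it assumes the next cursor is taken from the root id of the last edge row returned, and that edge rows come back ordered by (source_id, target_id). The data and the function names `fetch_page` and `paginate` are hypothetical.

```python
def fetch_page(edges, cursor, limit):
    """Mimic the query shape: keyset predicate on the root id, LIMIT on edge rows."""
    matching = sorted(e for e in edges if e[0] > cursor)
    return matching[:limit]


def paginate(edges, limit):
    """Collect all pages, advancing a root-level cursor after each one."""
    seen = []
    cursor = 0  # assumption: ids are positive, so 0 sorts before every root
    while True:
        page = fetch_page(edges, cursor, limit)
        if not page:
            break
        seen.extend(page)
        # Assumption: the cursor is the root id of the last returned row.
        cursor = page[-1][0]
    return seen


# Hypothetical data: user 1 authored 40 MRs, user 2 authored 2.
edges = [(1, mr) for mr in range(100, 140)] + [(2, 200), (2, 201)]
returned = paginate(edges, limit=25)
missing = sorted(set(edges) - set(returned))

print(len(edges), len(returned), len(missing))  # 42 27 15
```

Under these assumptions, page 1 stops after 25 of user 1's edges, the cursor then moves past user 1 entirely, and the remaining 15 edges are never returned.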

The only way keyset on the root would give correct pagination is if you could guarantee 1:1 root-to-result-row mapping, which is only true for Search.
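
For contrast, a minimal sketch of the 1:1 case, where each root row is also a result row, so a keyset cursor on (traversal_path, id) does paginate correctly. The row values, the "1/2/" path format, and the function names are illustrative assumptions. Python tuple comparison is lexicographic, which stands in here for the SQL row comparison `(traversal_path, id) > cursor`.

```python
def search_page(rows, cursor, limit):
    """Return up to `limit` rows strictly after `cursor` in (traversal_path, id) order."""
    after = sorted(r for r in rows if cursor is None or r > cursor)
    return after[:limit]


def paginate_search(rows, limit):
    """Walk all pages; the cursor is the last row of the previous page."""
    pages = []
    cursor = None
    while True:
        page = search_page(rows, cursor, limit)
        if not page:
            break
        pages.append(page)
        cursor = page[-1]
    return pages


# Hypothetical search results as (traversal_path, id) tuples.
rows = [("1/2/", 7), ("1/2/", 3), ("1/5/", 1), ("1/2/", 9), ("1/9/", 4)]
pages = paginate_search(rows, limit=2)
flattened = [row for page in pages for row in page]

print([len(p) for p in pages])      # [2, 2, 1]
print(flattened == sorted(rows))    # True
```

Because the cursor and the LIMIT apply to the same rows, every row shows up exactly once and each page except the last is full.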