Segmented export parts can time out due to a plan flip

Summary

Segmented exports have a flaw in how segments are created. In production, we observed an export part timing out with this SQL:

SELECT
    sbom_occurrences.*
FROM
    sbom_occurrences
WHERE
    traversal_ids >= '{4249178}' AND
    traversal_ids < '{4249179}' AND
    (
        sbom_occurrences.traversal_ids,
        sbom_occurrences.id
    ) >= (
        '{4249178,5030580,87255895}',
        5405902769
    ) AND
    (
        sbom_occurrences.traversal_ids,
        sbom_occurrences.id
    ) <= (
        '{4249178,5087837,9692938,9692949}',
        3801323122
    ) AND
    sbom_occurrences.archived = false
ORDER BY
    sbom_occurrences.traversal_ids ASC,
    sbom_occurrences.id ASC
LIMIT 1000;

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/30048/commands/93270

This happens because Postgres uses the wrong index. The planner flips to index_unarchived_occurrences_for_aggregations_package_manager, where only the scalar traversal_ids bounds can be applied as index conditions, so the row-value comparisons are pushed into an in-memory filter:

 Limit  (cost=7.56..387.38 rows=1000 width=353) (actual time=100.991..101.158 rows=1000 loops=1)
   Buffers: shared hit=37359
   I/O Timings: read=0.000 write=0.000
   ->  Incremental Sort  (cost=7.56..116036.76 rows=305483 width=353) (actual time=100.988..101.092 rows=1000 loops=1)
         Sort Key: sbom_occurrences.traversal_ids, sbom_occurrences.id
         Buffers: shared hit=37359
         I/O Timings: read=0.000 write=0.000
         ->  Index Scan using index_unarchived_occurrences_for_aggregations_package_manager on public.sbom_occurrences  (cost=0.69..100469.57 rows=305483 width=353) (actual time=0.036..87.293 rows=19232 loops=1)
               Index Cond: ((sbom_occurrences.traversal_ids >= '{4249178}'::bigint[]) AND (sbom_occurrences.traversal_ids < '{4249179}'::bigint[]) AND (sbom_occurrences.traversal_ids >= '{4249178,5030580,87255895}'::bigint[]) AND (sbom_occurrences.traversal_ids <= '{4249178,5087837,9692938,9692949}'::bigint[]))
               Filter: ((ROW(sbom_occurrences.traversal_ids, sbom_occurrences.id) >= ROW('{4249178,5030580,87255895}'::bigint[], '5405902769'::bigint)) AND (ROW(sbom_occurrences.traversal_ids, sbom_occurrences.id) <= ROW('{4249178,5087837,9692938,9692949}'::bigint[], '3801323122'::bigint)))
               Rows Removed by Filter: 17701
               Buffers: shared hit=37348
               I/O Timings: read=0.000 write=0.000

In this example, over 17k rows are fetched only to be discarded by the in-memory filter. Postgres should be able to scan index_sbom_occurrences_on_traversal_ids_and_id without filtering any rows, because that index's key columns (traversal_ids, id) match the row-value comparisons exactly (a verification sketch follows the index definition below).

CREATE INDEX index_sbom_occurrences_on_traversal_ids_and_id ON sbom_occurrences USING btree (traversal_ids, id) WHERE (archived = false);
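
To confirm the plan flip, one option is to hide the competing index on a throwaway database clone (for example, a postgres.ai session) and re-check the plan. A minimal sketch; the DROP INDEX runs inside a transaction that is rolled back, so nothing is committed:

-- On a clone only: temporarily drop the competing index, then
-- re-run the plan to see whether the (traversal_ids, id) index is chosen.
BEGIN;
DROP INDEX index_unarchived_occurrences_for_aggregations_package_manager;
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    sbom_occurrences.*
FROM
    sbom_occurrences
WHERE
    traversal_ids >= '{4249178}' AND
    traversal_ids < '{4249179}' AND
    (
        sbom_occurrences.traversal_ids,
        sbom_occurrences.id
    ) >= (
        '{4249178,5030580,87255895}',
        5405902769
    ) AND
    (
        sbom_occurrences.traversal_ids,
        sbom_occurrences.id
    ) <= (
        '{4249178,5087837,9692938,9692949}',
        3801323122
    ) AND
    sbom_occurrences.archived = false
ORDER BY
    sbom_occurrences.traversal_ids ASC,
    sbom_occurrences.id ASC
LIMIT 1000;
ROLLBACK;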

Steps to reproduce

Run a segmented export whose segment bounds generate a query shaped like the one in the Summary. The slow plan can be reproduced against a production clone via the postgres.ai session linked above.

What is the current bug behavior?

The planner chooses index_unarchived_occurrences_for_aggregations_package_manager, applies the row-value bounds as an in-memory filter, and discards thousands of rows per page, which can push export parts past the statement timeout.

What is the expected correct behavior?

The query scans index_sbom_occurrences_on_traversal_ids_and_id so that every bound is satisfied by index conditions and no rows are removed by a filter.

Relevant logs and/or screenshots

See the query plan in the Summary and the linked postgres.ai session.


Possible fixes
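
One possible direction (an untested sketch, not a confirmed fix): the scalar traversal_ids bounds appear to be redundant, because (a, b) >= (x, y) implies a >= x, and in Postgres array ordering '{4249178}' < '{4249178,5030580,87255895}' and '{4249178,5087837,9692938,9692949}' < '{4249179}'. Dropping the scalar bounds leaves a predicate whose columns match index_sbom_occurrences_on_traversal_ids_and_id exactly, which may make that index the cheapest plan:

-- Sketch: segment query expressed purely as row-value comparisons,
-- which a btree on (traversal_ids, id) WHERE archived = false can
-- satisfy entirely with index conditions.
SELECT
    sbom_occurrences.*
FROM
    sbom_occurrences
WHERE
    (
        sbom_occurrences.traversal_ids,
        sbom_occurrences.id
    ) >= (
        '{4249178,5030580,87255895}',
        5405902769
    ) AND
    (
        sbom_occurrences.traversal_ids,
        sbom_occurrences.id
    ) <= (
        '{4249178,5087837,9692938,9692949}',
        3801323122
    ) AND
    sbom_occurrences.archived = false
ORDER BY
    sbom_occurrences.traversal_ids ASC,
    sbom_occurrences.id ASC
LIMIT 1000;

Whether the planner actually flips back would need to be verified on a clone (see the sketch in the Summary); if it does not, how segment boundaries are generated may need to change instead.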
