Segmented export parts can time out due to a plan flip
Summary
Segmented exports have a flaw in how segment boundaries are constructed. In production, we observed an export timing out on this SQL:
SELECT
sbom_occurrences.*
FROM
sbom_occurrences
WHERE
traversal_ids >= '{4249178}' AND
traversal_ids < '{4249179}' AND
(
sbom_occurrences.traversal_ids,
sbom_occurrences.id
) >= (
'{4249178,5030580,87255895}',
5405902769
) AND
(
sbom_occurrences.traversal_ids,
sbom_occurrences.id
) <= (
'{4249178,5087837,9692938,9692949}',
3801323122
) AND
sbom_occurrences.archived = false
ORDER BY
sbom_occurrences.traversal_ids ASC,
sbom_occurrences.id ASC
LIMIT 1000;
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/30048/commands/93270
This happens because Postgres picks the wrong index: the planner extracts the traversal_ids bounds as index conditions on index_unarchived_occurrences_for_aggregations_package_manager, and the row comparisons are demoted to an in-memory filter:
Limit (cost=7.56..387.38 rows=1000 width=353) (actual time=100.991..101.158 rows=1000 loops=1)
Buffers: shared hit=37359
I/O Timings: read=0.000 write=0.000
-> Incremental Sort (cost=7.56..116036.76 rows=305483 width=353) (actual time=100.988..101.092 rows=1000 loops=1)
Sort Key: sbom_occurrences.traversal_ids, sbom_occurrences.id
Buffers: shared hit=37359
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_unarchived_occurrences_for_aggregations_package_manager on public.sbom_occurrences (cost=0.69..100469.57 rows=305483 width=353) (actual time=0.036..87.293 rows=19232 loops=1)
Index Cond: ((sbom_occurrences.traversal_ids >= '{4249178}'::bigint[]) AND (sbom_occurrences.traversal_ids < '{4249179}'::bigint[]) AND (sbom_occurrences.traversal_ids >= '{4249178,5030580,87255895}'::bigint[]) AND (sbom_occurrences.traversal_ids <= '{4249178,5087837,9692938,9692949}'::bigint[]))
Filter: ((ROW(sbom_occurrences.traversal_ids, sbom_occurrences.id) >= ROW('{4249178,5030580,87255895}'::bigint[], '5405902769'::bigint)) AND (ROW(sbom_occurrences.traversal_ids, sbom_occurrences.id) <= ROW('{4249178,5087837,9692938,9692949}'::bigint[], '3801323122'::bigint)))
Rows Removed by Filter: 17701
Buffers: shared hit=37348
I/O Timings: read=0.000 write=0.000
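One plausible trigger for the flip (worth verifying): the standalone traversal_ids bounds are redundant, because each row comparison already constrains its leading column to the same range, and those redundant conditions are exactly what the other index can serve. A quick check, using the values copied from the query above, confirms the implication:

```sql
-- The row-comparison bounds already imply the standalone traversal_ids range:
-- lower bound >= '{4249178}' and upper bound < '{4249179}'.
SELECT
  '{4249178,5030580,87255895}'::bigint[] >= '{4249178}'::bigint[] AS lower_ok,
  '{4249178,5087837,9692938,9692949}'::bigint[] < '{4249179}'::bigint[] AS upper_ok;
```

Both columns return true, since bigint[] comparisons are element-by-element. This only shows the standalone bounds add no selectivity; whether they are what sways the planner needs to be confirmed against production statistics.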
In this example, over 17k rows are fetched from the index and then discarded in memory. Postgres should be able to scan index_sbom_occurrences_on_traversal_ids_and_id without filtering any rows:
CREATE INDEX index_sbom_occurrences_on_traversal_ids_and_id ON sbom_occurrences USING btree (traversal_ids, id) WHERE (archived = false);
Steps to reproduce
Example Project
What is the current bug behavior?
What is the expected correct behavior?
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
Results of GitLab application Check
Possible fixes
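A sketch of one possible fix, assuming the redundant standalone traversal_ids bounds are what make the package_manager index attractive to the planner: emit only the row comparisons per segment. The (traversal_ids, id) row comparison matches the column order of index_sbom_occurrences_on_traversal_ids_and_id, the archived = false predicate matches its partial condition, and the ORDER BY matches its sort order, so the whole WHERE clause should be servable as index conditions:

```sql
-- Sketch: same segment as the production query above,
-- with the implied traversal_ids range predicates dropped.
SELECT
  sbom_occurrences.*
FROM
  sbom_occurrences
WHERE
  (
    sbom_occurrences.traversal_ids,
    sbom_occurrences.id
  ) >= (
    '{4249178,5030580,87255895}',
    5405902769
  ) AND
  (
    sbom_occurrences.traversal_ids,
    sbom_occurrences.id
  ) <= (
    '{4249178,5087837,9692938,9692949}',
    3801323122
  ) AND
  sbom_occurrences.archived = false
ORDER BY
  sbom_occurrences.traversal_ids ASC,
  sbom_occurrences.id ASC
LIMIT 1000;
```

Whether the planner actually switches to the intended index should be confirmed with EXPLAIN (ANALYZE, BUFFERS) against production-like data before changing the segment builder.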
Edited by Brian Williams