Draft: Add migration that copies pending builds to the new table [RUN ALL RSPEC] [RUN AS-IF-FOSS]
What does this MR do?
In 460318a2, I attempt to do the following:
- Truncate
ci_pending_builds
(for testing purposes only) - Add partial index on
ci_builds (id) WHERE state = 'pending' AND type = 'Ci::Build'
- Rewrite the query to make sure it's picking up the index
- The index will also drastically improve getting the lower bound for the data range we're interested in (even though
updated_at
isn't indexed)
This is based on testing the index and queries using hypopg.
HypoPG testing for batch</summary
gitlabhq_dblab=# explain WITH pending_builds AS MATERIALIZED (
SELECT id,
project_id
FROM ci_builds
WHERE status = 'pending' AND type = 'Ci::Build'
AND id BETWEEN 1245752736 AND 1245762866
)
INSERT INTO ci_pending_builds (build_id, project_id)
SELECT id,
project_id
FROM pending_builds
ON CONFLICT DO NOTHING;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Insert on ci_pending_builds (cost=7345.44..7345.53 rows=3 width=32)
Conflict Resolution: NOTHING
CTE pending_builds
-> Index Scan using ci_builds_pkey on ci_builds (cost=0.58..7345.44 rows=3 width=8)
Index Cond: ((id >= 1245752736) AND (id <= 1245762866))
Filter: (((status)::text = 'pending'::text) AND ((type)::text = 'Ci::Build'::text))
-> CTE Scan on pending_builds (cost=0.00..0.09 rows=3 width=32)
(7 rows)
gitlabhq_dblab=# select * from hypopg_create_index(E'CREATE INDEX foo ON ci_builds (id) WHERE (((status)::text = \'pending\'::text) AND ((type)::text = \'Ci::Build\'::text))');
indexrelid | indexname
------------+-------------------------------
1018983491 | <1018983491btree_ci_builds_id
(1 row)
gitlabhq_dblab=# explain WITH pending_builds AS MATERIALIZED (
SELECT id,
project_id
FROM ci_builds
WHERE status = 'pending' AND type = 'Ci::Build'
AND id BETWEEN 1245752736 AND 1245762866
)
INSERT INTO ci_pending_builds (build_id, project_id)
SELECT id,
project_id
FROM pending_builds
ON CONFLICT DO NOTHING;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Insert on ci_pending_builds (cost=4.64..4.73 rows=3 width=32)
Conflict Resolution: NOTHING
CTE pending_builds
-> Index Scan using "<1018983491btree_ci_builds_id" on ci_builds (cost=0.05..4.64 rows=3 width=8)
Index Cond: ((id >= 1245752736) AND (id <= 1245762866))
-> CTE Scan on pending_builds (cost=0.00..0.09 rows=3 width=32)
(6 rows)
HypoPG testing for lower bound query
SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."status" = 'pending' AND "ci_builds"."type" = 'Ci::Build' AND (updated_at > '2021-06-01 08:45:56.696865') ORDER BY id ASC LIMIT 1
gitlabhq_dblab=# explain SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."status" = 'pending' AND "ci_builds"."type" = 'Ci::Build' AND (updated_at > '2021-06-01 08:45:56.696865') ORDER BY id ASC LIMIT 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=26089.97..26089.97 rows=1 width=4)
-> Sort (cost=26089.97..26091.40 rows=574 width=4)
Sort Key: id
-> Bitmap Heap Scan on ci_builds (cost=25213.58..26087.10 rows=574 width=4)
Recheck Cond: (((status)::text = 'pending'::text) AND ((type)::text = 'Ci::Build'::text) AND (updated_at > '2021-06-01 08:45:56.696865'::timestamp without time zone))
-> BitmapAnd (cost=25213.58..25213.58 rows=574 width=0)
-> Bitmap Index Scan on ci_builds_gitlab_monitor_metrics (cost=0.00..7465.22 rows=491685 width=0)
Index Cond: ((status)::text = 'pending'::text)
-> Bitmap Index Scan on index_ci_builds_on_updated_at (cost=0.00..17747.83 rows=1444100 width=0)
Index Cond: (updated_at > '2021-06-01 08:45:56.696865'::timestamp without time zone)
(10 rows)
gitlabhq_dblab=# select * from hypopg_create_index(E'CREATE INDEX foo ON ci_builds (id) WHERE (((status)::text = \'pending\'::text) AND ((type)::text = \'Ci::Build\'::text))');
indexrelid | indexname
------------+-------------------------------
1018983492 | <1018983492btree_ci_builds_id
(1 row)
gitlabhq_dblab=# explain SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."status" = 'pending' AND "ci_builds"."type" = 'Ci::Build' AND (updated_at > '2021-06-01 08:45:56.696865') ORDER BY id ASC LIMIT 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Limit (cost=0.05..937.11 rows=1 width=4)
-> Index Scan using "<1018983492btree_ci_builds_id" on ci_builds (cost=0.05..537875.34 rows=574 width=4)
Filter: (updated_at > '2021-06-01 08:45:56.696865'::timestamp without time zone)
(3 rows)
MATERIALIZED
?
Why An interesting observation in this context is that the CTE in the batch query has to be explicitly materialized, otherwise the query falls back to using the primary key index always, even with the added and more specific (and smaller) partial index.
So, without the MATERIALIZED
flag on the CTE (and the index being present) we get this:
QUERY PLAN
---------------------------------------------------------------------------------------------
Insert on ci_pending_builds (cost=0.58..7345.47 rows=3 width=32)
Conflict Resolution: NOTHING
-> Index Scan using ci_builds_pkey on ci_builds (cost=0.58..7345.47 rows=3 width=32)
Index Cond: ((id >= 1245752736) AND (id <= 1245762866))
Filter: (((status)::text = 'pending'::text) AND ((type)::text = 'Ci::Build'::text))
(5 rows)
The goal of this MR here is to execute the testing pipeline on dblab to confirm the following:
- Index creation time
- Timing for the minimum bound query
- Timings and their variance for the batch queries
A typical index creation time for .com for this table varies between 1.30h (low traffic time) and up to 6 hours (high traffic time).
Edited by Grzegorz Bizon