Add optimizer for BBO
Ref: #578057
What does this MR do and why?
Following up !209791 (merged)
This MR makes sure the BBO framework leverage the existing database signals. It allows to stop or optimize the operation based on several heuristics.
Query plan
-
Create 10 more partitions using - query
DO $$ BEGIN FOR i IN 2..11 LOOP EXECUTE format('CREATE TABLE IF NOT EXISTS gitlab_partitions_dynamic.background_operation_workers_%s PARTITION OF public.background_operation_workers FOR VALUES IN (%L)', i, i::text); END LOOP; END $$; -
Data population: Created 100000 records per partitions, with equal distribution between 5 tables and atleast 20% paused workers (10% with on_hold_until in the past and other 10% with on_hold_until in the future) - query.
DO $$ DECLARE i int; table_names text[] := ARRAY['users', 'projects', 'issues', 'notes', 'namespaces']; records_per_partition int := 100000; records_per_table int := 20000; BEGIN FOR i IN 1..11 LOOP INSERT INTO public.background_operation_workers (partition, organization_id, user_id, created_at, batch_size, sub_batch_size, interval, job_class_name, batch_class_name, table_name, column_name, gitlab_schema, status, on_hold_until, min_cursor, max_cursor) SELECT i, (i * 1000000) + rc, (random() * 1000000)::bigint, CURRENT_TIMESTAMP - (random() * interval '30 days'), 1000, 100, 120, 'BackgroundWorker', 'BackgroundJob', table_names[((rc - 1) / records_per_table) + 1], 'id', 'gitlab_main', CASE WHEN (rc - 1) % records_per_table < (records_per_table * 0.1)::int THEN 2 WHEN (rc - 1) % records_per_table < (records_per_table * 0.2)::int THEN 2 ELSE ((rc - 1) % records_per_table - (records_per_table * 0.2)::int) % 5 END, CASE WHEN (rc - 1) % records_per_table < (records_per_table * 0.1)::int THEN CURRENT_TIMESTAMP - interval '2 days' WHEN (rc - 1) % records_per_table < (records_per_table * 0.2)::int THEN CURRENT_TIMESTAMP + interval '2 days' END, jsonb_build_array(1), jsonb_build_array(1000000) FROM generate_series(1, records_per_partition) AS rc; RAISE NOTICE 'Inserted % records into partition %', records_per_partition, i; END LOOP; END $$; -
Generate some jobs
INSERT INTO background_operation_jobs (
worker_id,
organization_id,
partition,
worker_partition,
created_at,
started_at,
finished_at,
batch_size,
sub_batch_size,
pause_ms,
status,
attempts,
metrics,
min_cursor,
max_cursor
)
SELECT
w.id,
w.organization_id,
1,
w.partition,
NOW() - (j || ' minutes')::interval,
NOW() - (j || ' minutes')::interval,
NOW() - ((j - 0.5) || ' minutes')::interval,
1000,
100,
100,
3,
CASE WHEN random() < 0.1 THEN 2 ELSE 1 END,
jsonb_build_object(
'total_processed', (random() * 1000)::int,
'duration_ms', (random() * 3600000)::int
),
jsonb_build_array((j - 1) * 1000 + 1),
jsonb_build_array(j * 1000)
FROM (
SELECT id, organization_id, partition
FROM background_operation_workers
LIMIT 100
) w
CROSS JOIN generate_series(1, 100) AS j;
- Query plan - not exactly the same as we can't identify the righ uuid to pick up
SELECT
background_operation_jobs.*
FROM
background_operation_jobs
WHERE
background_operation_jobs.worker_id = '66846817-bb93-4510-a7e5-0b4f90af7f98'
AND background_operation_jobs.finished_at IS NOT NULL
AND (background_operation_jobs.status IN (3))
ORDER BY
background_operation_jobs.finished_at DESC
LIMIT 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=259.54..259.54 rows=1 width=141) (actual time=2.005..2.009 rows=10 loops=1)
Buffers: shared hit=336
-> Sort (cost=259.54..259.54 rows=1 width=141) (actual time=2.004..2.005 rows=10 loops=1)
Sort Key: background_operation_jobs.finished_at DESC
Sort Method: top-N heapsort Memory: 27kB
Buffers: shared hit=336
-> Index Scan using background_operation_jobs_1_status_idx on background_operation_jobs_1 background_operation_jobs (cost=0.29..259.53 rows=1 width=141) (actual time=0.217..1.983 rows=100 loops=1)
Index Cond: (status = 3)
Filter: ((finished_at IS NOT NULL) AND (worker_id = 'c6feed1e-7115-42f2-8e54-f7c191b47972'::uuid))
Rows Removed by Filter: 11002
Buffers: shared hit=336
Planning Time: 0.204 ms
Execution Time: 2.051 ms
Edited by Max Orefice