Add new table to for unique job names (Part 1)
#446304
Actual Piece to get this working will be done in:Problem
We need to have the ability to filter jobs by name for !106458. We are unable to due this due to the current table size and needing to add a index. This MR needed to be closed !107102 (closed) due to that problem.
Solution
CREATE TABLE p_ci_build_names (
build_id bigint NOT NULL,
partition_id bigint NOT NULL,
project_id integer NOT NULL,
name text,
PRIMARY KEY (build_id, partition_id),
FOREIGN KEY (partition_id, build_id) REFERENCES p_ci_builds (partition_id, id) ON UPDATE CASCADE ON DELETE CASCADE
)
PARTITION BY LIST (partition_id);
CREATE TABLE ci_build_names partition of p_ci_build_names for values in (100, 101);
After a pipeline is created, we schedule a sidekiq task to go through the jobs and update their build_name_id
attribute with a record from p_ci_build_names
. With this we can extend the index on p_ci_builds
with only one bigint column which will increase its size by about 16 bytes per row. The queries could look something like this:
select * from p_ci_builds where build_name_id in (select id from p_ci_build_names where to_tsvector(name) @@ to_tsquery('test') and project_id = <>) and ...
We also need a specialized search index on p_ci_build_names
.
I'm running select count(distinct(name)) from ci_builds where project_id = 278964;
on dblab to see how many unique names we have in gitlab-org/gitlab
.
Results:
gitlabhq_dblab=# select count(*) from ci_builds where project_id = 278964;
count
-----------
133611505
(1 row)
Time: 1731848.576 ms (28:51.849)
gitlabhq_dblab=# select count(distinct(name)) from ci_builds where project_id = 278964;
count
-------
25391
(1 row)
Time: 109566988.659 ms (1 d 06:26:06.989)