WIP: GitLab CI/CD pipelines storage improvement
Description
This merge request adds an architectural blueprint describing the work that needs to be done to regain control over the `ci_builds`
table, reduce its size, and help with PostgreSQL scaling problems.
Merge request reports
Activity
added Architecture decision label
added 1 commit
- e6e19c08 - Add CI/CD pipelines storage improvements blueprint
I'm working on `ci_builds` improvements architectural blueprint here @glopezfernandez @darbyfrey.

mentioned in issue gitlab-org/gitlab#213103 (closed)
added 1 commit
- 4b8f2bec - Add CI/CD pipelines storage improvements blueprint
added 1 commit
- a284efee - Add CI/CD pipelines storage improvements blueprint
```
migration that is almost impossible right now.

> TODO elaborate

## [What] Proposal

We store a lot of data in `ci_builds` table, but the data stored there have
different affinity to a pipeline. In particular - we store pipeline
visualization data there and pipeline processing data.

Pipeline visualization data and processing data can have different retention
policies too. Separating these types of data can help us vastly reduce the
amount of stored data and split data amongst multiple tables.

> TODO calculate average ratio of visualization to processing data, like 40/60%
> and support this with real numbers / graphs.
```

@Finotto would you be able to check the total size of `ci_builds.options` and what percentage of the total size of `ci_builds` it represents?

@grzesiek we have the following tables with 'build' in the database, and I do not see any clear `options` table there:

```sql
SELECT relname as "Table", pg_size_pretty(pg_total_relation_size(relid)) As "Size"
FROM pg_catalog.pg_statio_user_tables
WHERE relname LIKE '%build%'
ORDER BY pg_total_relation_size(relid) DESC;

                Table                |  Size
-------------------------------------+---------
 ci_builds                           | 1539 GB
 ci_build_trace_sections             | 459 GB
 ci_builds_metadata                  | 71 GB
 ci_build_needs                      | 9362 MB
 ci_build_trace_section_names        | 1795 MB
 ci_daily_build_group_report_results | 199 MB
 ci_build_trace_chunks               | 35 MB
 packages_build_infos                | 20 MB
 ci_builds_runner_session            | 824 kB
 ci_build_report_results             | 24 kB
```
`ci_builds` currently represents 1539 GB out of a total database size of 7369 GB, which is approximately 21% of the database.
@Finotto I mean the `options` column in the `ci_builds` table - what percentage of the total data in `ci_builds` do we store in the `options` column?

@grzesiek I added the info:

```sql
gitlabhq_production=# select
gitlabhq_production-#   pg_size_pretty(sum(pg_column_size(options))) as total_size,
gitlabhq_production-#   pg_size_pretty(avg(pg_column_size(options))) as average_size,
gitlabhq_production-#   sum(pg_column_size(options)) * 100.0 / pg_total_relation_size('ci_builds') as percentage
gitlabhq_production-# from ci_builds;
 total_size |        average_size        |     percentage
------------+----------------------------+---------------------
 354 GB     | 667.0868096973743355 bytes | 22.8841407398832713
```
So this column represents about 22% of the whole table. I would suggest normalizing it and creating a separate table.
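A minimal sketch of what that normalization could look like, assuming a hypothetical `ci_builds_options` table keyed one-to-one by build (the table name, column types, and the `ON DELETE CASCADE` choice below are illustrative, not a decision):

```sql
-- Hypothetical table holding the serialized options payload, one row per build.
-- Whether the payload stays text (serialized YAML, as stored today) or becomes
-- jsonb would need to be decided separately.
CREATE TABLE ci_builds_options (
  build_id bigint PRIMARY KEY REFERENCES ci_builds (id) ON DELETE CASCADE,
  options text
);
```

Whether this ends up as a brand-new table or as extra columns on `ci_builds_metadata` (as proposed later in the blueprint) is an open question.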
Edited by Jose Finotto

I expect that we might be able to remove 90% of this data and reduce our database size by around 300 gigabytes. It still depends on how many pipelines in the database are older than 3 and 6 months, but I presume it is more than 90%.
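To verify that assumption we would need numbers on pipeline age. A rough query along these lines could provide them (a sketch only - it assumes `ci_builds.commit_id` points at `ci_pipelines.id` and uses `created_at` as the age proxy; on GitLab.com it would be far too heavy to run as-is and would need to be sampled or batched):

```sql
-- Breakdown of ci_builds rows by the age of the pipeline they belong to.
SELECT
  count(*) FILTER (WHERE p.created_at < now() - interval '6 months') AS older_than_6_months,
  count(*) FILTER (WHERE p.created_at < now() - interval '3 months') AS older_than_3_months,
  count(*) AS total
FROM ci_builds b
JOIN ci_pipelines p ON p.id = b.commit_id;
```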
@Finotto can you also check the size of `ci_builds.stage`? I expect that this column will be completely redundant after we wrap up our effort with atomic processing of pipelines. I expect this to be a few gigabytes too, and if everything goes right, this data will not be needed at all.

Edited by Grzegorz Bizon

@Finotto can you please take a look at the `ci_builds.stage` size? This is the column that we might be able to remove completely, even without a backup (because we already have a backup in `ci_stages.name`), so this might be the low-hanging fruit we are looking for here too.

@grzesiek sorry for the late answer, sure, I will give you the info in a few minutes.
The info is:

```sql
# select
    pg_size_pretty(sum(pg_column_size(stage))) as total_size,
    pg_size_pretty(avg(pg_column_size(stage))) as average_size,
    sum(pg_column_size(stage)) * 100.0 / pg_total_relation_size('ci_builds') as percentage
  from ci_builds;
 total_size |       average_size       |       percentage
------------+--------------------------+------------------------
 4590 MB    | 7.5525122452349447 bytes | 0.25371424219677802001
(1 row)
```
@grzesiek please do not forget that this column is part of an index:
"index_ci_builds_on_project_id_for_successfull_pages_deploy" btree (project_id) WHERE type::text = 'GenericCommitStatus'::text AND stage::text = 'deploy'::text AND name::text = 'pages:deploy'::text AND status::text = 'success'::text
- Resolved by Grzegorz Bizon
```
this iteration for the feedback from users because in the user interface the
builds are going to be unretriable and unprocessable and in case of making a
wrong decision about how old the builds should be to archive them, we would
still be able to change our minds about it.

### Remove archived data

In order to migrate data between `ci_builds` and `ci_builds_metadata` we need
to remove old data, because there is currently too much to migrate the data
without major problems.

This will be a destructive action, but perhaps we can devise a way to store
data that we are going to remove from PostgreSQL in a different type of
storage, to make this a two-way decision, however difficult recreating the
database state will be - being able to revert wrong decision might be important
here.
```

As you said, archiving builds affects whether they can be processed. However, archived builds would still be visible. Between soft-archiving builds and removing data we would also need to ensure the missing data would not cause UX issues.
Maybe we could first introduce a feature flag that would completely hide builds if enabled (e.g. returning 404), while the data is still persisted. This could be reverted easily.
changed this line in version 5 of the diff
- Resolved by Grzegorz Bizon
```
to remove old data, because there is currently too much to migrate the data
without major problems.

This will be a destructive action, but perhaps we can devise a way to store
data that we are going to remove from PostgreSQL in a different type of
storage, to make this a two-way decision, however difficult recreating the
database state will be - being able to revert wrong decision might be important
here.

This will require a sign-off from executives and product people.

### Migrate `options` column from `ci_builds` to `ci_builds_metadata`

Prepare a background migration that will move processing data, especially
`ci_builds.options` to `ci_build_metadata.processing_options`.
```

changed this line in version 12 of the diff
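The mechanics of that move are not spelled out in the blueprint yet. One plausible shape for a single batch of the backfill is sketched below, assuming the target column lands on `ci_builds_metadata` as `processing_options` with a type compatible with `ci_builds.options`, and that id ranges are driven by the background migration framework - all of that is illustrative:

```sql
-- Copy the options payload for one id range of builds into the metadata table.
-- Builds without an existing ci_builds_metadata row would need an INSERT path
-- instead, and a conversion step if processing_options ends up as jsonb.
UPDATE ci_builds_metadata m
SET processing_options = b.options
FROM ci_builds b
WHERE m.build_id = b.id
  AND b.id BETWEEN 1 AND 10000
  AND b.options IS NOT NULL;
```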
added 1 commit
- 64aef993 - Add CI/CD pipelines storage improvements blueprint
added 1 commit
- 36c2e19a - Add CI/CD pipelines storage improvements blueprint