Skip to content
Snippets Groups Projects

WIP: GitLab CI/CD pipelines storage improvement

Closed Grzegorz Bizon requested to merge blueprint/gb/pipelines-storage-improvements into master

Description

This merge request adds an architectural blueprint describing the work that needs to be done to regain control over ci_builds table, reduce the size of this table, and help with PostgreSQL scaling problems.

Review app: https://blueprint-gb-pipelines-storage-improvements.about.gitlab-review.app/handbook/engineering/architecture/blueprints/pipelines/storage/index.html

Edited by Grzegorz Bizon

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
78 migration that is almost impossible right now.
79
80 > TODO elaborate
81
82 ## [What] Proposal
83
84 We store a lot of data in `ci_builds` table, but the data stored there have
85 different affinity to a pipeline. In particular - we store pipeline
86 visualization data there and pipeline processing data.
87
88 Pipeline visualization data and processing data can have different retention
89 policies too. Separating these types of data can help us vastly reduce the
90 amount of stored data and split data amongst multiple tables.
91
92 > TODO calculate average ratio of visualization to processing data, like 40/60%
93 > and support this with real numbers / graphs.
  • Author Maintainer

    @Finotto would you be able to check the total size of ci_builds.options and what percentage of total size of ci_builds it represents?

  • @grzesiek we have the following tables with 'build' in the database, and I do not see any clear options table there :

    SELECT
       relname as "Table",
       pg_size_pretty(pg_total_relation_size(relid)) As "Size"
       FROM pg_catalog.pg_statio_user_tables where relname like '%build%' ORDER BY pg_total_relation_size(relid) DESC;
                    Table                |  Size
    -------------------------------------+---------
     ci_builds                           | 1539 GB
     ci_build_trace_sections             | 459 GB
     ci_builds_metadata                  | 71 GB
     ci_build_needs                      | 9362 MB
     ci_build_trace_section_names        | 1795 MB
     ci_daily_build_group_report_results | 199 MB
     ci_build_trace_chunks               | 35 MB
     packages_build_infos                | 20 MB
     ci_builds_runner_session            | 824 kB
     ci_build_report_results             | 24 kB

    ci_builds represents at the moment 1539 GB from the total database of 7369 GB, which would be approx 21% of the database.

  • Author Maintainer

    @Finotto I mean options column in ci_builds table - what percentage of total data in ci_builds do we store in options column.

  • @grzesiek I added the info:

    gitlabhq_production=# select
    gitlabhq_production-#     pg_size_pretty(sum(pg_column_size(options))) as total_size,
    gitlabhq_production-#     pg_size_pretty(avg(pg_column_size(options))) as average_size,
    gitlabhq_production-#     sum(pg_column_size(options)) * 100.0 / pg_total_relation_size('ci_builds') as percentage
    gitlabhq_production-# from ci_builds;
     total_size |        average_size        |     percentage
    ------------+----------------------------+---------------------
     354 GB     | 667.0868096973743355 bytes | 22.8841407398832713

    So this column represents 22% of the whole table. I would suggest to normalize and create a separated table.

    cc: @glopezfernandez @albertoramos @NikolayS

    Edited by Jose Finotto
  • Author Maintainer

    I expect that we might be able to remove 90% of this data and reduce our database size by around 300 gigabytes. It still depends on how many pipelines in the database are older than 3 and 6 months, but I think that presumably more than 90%.

    @Finotto can you also check the size of ci_builds.stage? I expect that this column will be completely redundant after we wrap up our effort with atomic processing of pipelines. I think this to be a few gigabytes too, and if everything goes right, this data will not be needed at all too.

    Edited by Grzegorz Bizon
  • Author Maintainer

    @Finotto can you please take a look at ci_builds.stage size? This is the column that we might be able to remove completely, even without backup (because we already do have backup in ci_stages.name), so this might be the low hanging fruit we are looking for here too.

  • @grzesiek sorry for the late answer, sure I give you the info in few minutes.

  • @grzesiek

    the info is:

    # select
     pg_size_pretty(sum(pg_column_size(stage))) as total_size,
     pg_size_pretty(avg(pg_column_size(stage))) as average_size,
     sum(pg_column_size(stage)) * 100.0 / pg_total_relation_size('ci_builds') as percentage
    from ci_builds;;
     total_size |       average_size       |       percentage
    ------------+--------------------------+------------------------
     4590 MB    | 7.5525122452349447 bytes | 0.25371424219677802001
    (1 row)
  • @grzesiek please do not forget that this column is part of an index:

    "index_ci_builds_on_project_id_for_successfull_pages_deploy" btree (project_id) WHERE type::text = 'GenericCommitStatus'::text AND stage::text = 'deploy'::text AND name::text = 'pages:deploy'::text AND status::text = 'success'::text

  • Author Maintainer

    Thanks!

  • Please register or sign in to reply
  • 126 this iteration for the feedback from users because in the user interface the
    127 builds are going to be unretriable and unprocessable and in case of making a
    128 wrong decision about how old the builds should be to archive them, we would
    129 still be able to change our minds about it.
    130
    131 ### Remove archived data
    132
    133 In order to migrate data between `ci_builds` and `ci_builds_metadata` we need
    134 to remove old data, because there is currently too much to migrate the data
    135 without major problems.
    136
    137 This will be a destructive action, but perhaps we can devise a way to store
    138 data that we are going to remove from PostgreSQL in a different type of
    139 storage, to make this a two-way decision, however difficult recreating the
    140 database state will be - being able to revert wrong decision might be important
    141 here.
    • As you said, archiving builds affects them being processable. However archived builds would still be visible. Between soft-archiving build and removing data we would also need to ensure the missing data would not cause UX issues.

      Maybe we could first introduce a feature flag that would completely hide builds if enabled (e.g. returning 404), while the data is still persisted. This could be reverted easily.

    • changed this line in version 5 of the diff

    • Please register or sign in to reply
  • 134 to remove old data, because there is currently too much to migrate the data
    135 without major problems.
    136
    137 This will be a destructive action, but perhaps we can devise a way to store
    138 data that we are going to remove from PostgreSQL in a different type of
    139 storage, to make this a two-way decision, however difficult recreating the
    140 database state will be - being able to revert wrong decision might be important
    141 here.
    142
    143 This will require a sign-off from executives and product people.
    144
    145 ### Migrate `options` column from `ci_builds` to `ci_builds_metadata`
    146
    147 Prepare a background migration that will move processing data, especially
    148 `ci_builds.options` to `ci_build_metadata.processing_options`.
    149
  • Grzegorz Bizon added 1 commit

    added 1 commit

    • 64aef993 - Add CI/CD pipelines storage improvements blueprint

    Compare with previous version

  • Grzegorz Bizon added 1 commit

    added 1 commit

    • 36c2e19a - Add CI/CD pipelines storage improvements blueprint

    Compare with previous version

  • added 1 commit

    Compare with previous version

  • Uh-oh! My apologies: I meant to suggest some changes to the introduction and ended up committing them :-(

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading