Regain control of `ci_builds` table

Description

ci_builds table is currently very large. We process around 500k builds each day, and we not only do have a ton of rows in this table, it is also very wide - more than 50 columns.

Problems

We do have a bunch of problems with this table, it affects our velocity of development too.

we can no longer migrate data in this table
we can no longer add columns because we already do have more than 50
we get statement timeouts in many places in the application
it is using STI

Proposal

There is a lot of room for improvement. We can:

reduce the amount of indices
reduce the amount of columns
reduce the total size of the table

But the thing that seems to be the most reasonable path forward is moving data with different affinity to a pipeline to different tables - processing data / visible data / historical data.

Currently we store processing-time-only information in that table. We should extract this to ci_builds_metadata and rotate on a different cadence than rows in ci_builds.

Iterations

Devise a metric for ci_builds table state

It would be great to measure how bad the situation with ci_builds table is and have a metric for this. It is not clear what would be the best metric, but the most simple one might be the table size and total indices size.

Check if build archival mechanisms are aligned with this effort

We currently do have a bunch of mechanisms implemented that allow us to "degenerate" a build and to "archive" those that seem to be outdated. We are not using these mechanisms and we should revisit them, figure it out what is a difference between a degenerated and archived build, figure the right cadence of degenerating builds.

Ensure `ci_builds_metadata` contains only processing data

This table currently contains a bunch columns, but we should check if data in all of them are safe to get removed one a build gets archived.

Archive legacy builds

Once we have all the information regarding the points above, it should be save to archive builds that are old. We can do that in a migration, this is going to be destructive action, so it is important to get a sign off from the product team that it will not be disruptive for users. Ensure that builds are being archived continuously according to policies that we define here.

Migrate `ci_builds.options` to `ci_builds_metadata`

Prepare a background migration that will move ci_builds.options to ci_build_metadata.processing_options. If we get the archival policies right, this should be a reasonable amount of data that we should be capable of migrating.

Move other processing columns to `ci_builds_metadata` and `ci_pipelines_metadata`

There are other columns that could be moved to either ci_builds_metadata or ci_pipelines_metadata. We should move them too.

Remove legacy columns from `ci_builds`

There are bunch of deprecated columns in that table, we should remove them too. stage column is known to take a lot of space, and it might not be needed anymore after we enable ci_atomic_processing FF.

Remove legacy indices from `ci_builds`

Migrating data, removing legacy columns and removing / moving indices should be done together, we should not remove indices until we add them in the second table.

Edited Jun 18, 2020 by Grzegorz Bizon