Regain control of `ci_builds` table
Description
ci_builds table is currently very large. We process around 500k builds each day, and we not only do have a ton of rows in this table, it is also very wide - more than 50 columns.
Problems
We do have a bunch of problems with this table, it affects our velocity of development too.
- we can no longer migrate data in this table
- we can no longer add columns because we already do have more than 50
- we get statement timeouts in many places in the application
- it is using STI
Proposal
There is a lot of room for improvement. We can:
- reduce the amount of indices
- reduce the amount of columns
- reduce the total size of the table
But the thing that seems to be the most reasonable path forward is moving data with different affinity to a pipeline to different tables - processing data / visible data / historical data.
Currently we store processing-time-only information in that table. We should extract this to ci_builds_metadata and rotate on a different cadence than rows in ci_builds.
Iterations
Devise a metric for ci_builds table state
It would be great to measure how bad the situation with ci_builds table is and have a metric for this. It is not clear what would be the best metric, but the most simple one might be the table size and total indices size.
Check if build archival mechanisms are aligned with this effort
We currently do have a bunch of mechanisms implemented that allow us to "degenerate" a build and to "archive" those that seem to be outdated. We are not using these mechanisms and we should revisit them, figure it out what is a difference between a degenerated and archived build, figure the right cadence of degenerating builds.
Ensure ci_builds_metadata contains only processing data
This table currently contains a bunch columns, but we should check if data in all of them are safe to get removed one a build gets archived.
Archive legacy builds
Once we have all the information regarding the points above, it should be save to archive builds that are old. We can do that in a migration, this is going to be destructive action, so it is important to get a sign off from the product team that it will not be disruptive for users. Ensure that builds are being archived continuously according to policies that we define here.
Migrate ci_builds.options to ci_builds_metadata
Prepare a background migration that will move ci_builds.options to ci_build_metadata.processing_options. If we get the archival policies right, this should be a reasonable amount of data that we should be capable of migrating.
Move other processing columns to ci_builds_metadata and ci_pipelines_metadata
There are other columns that could be moved to either ci_builds_metadata or ci_pipelines_metadata. We should move them too.
Remove legacy columns from ci_builds
There are bunch of deprecated columns in that table, we should remove them too. stage column is known to take a lot of space, and it might not be needed anymore after we enable ci_atomic_processing FF.
Remove legacy indices from ci_builds
Migrating data, removing legacy columns and removing / moving indices should be done together, we should not remove indices until we add them in the second table.