
Split resource_label_events table

Summary

Due to ongoing load issues on the primary database, we need to take action to reduce the size of our largest tables, starting with tables larger than 100 GB. At 123.8 GiB, resource_label_events has been selected as a candidate.
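For reference, the current on-disk size (table plus indexes plus TOAST data) can be checked with a query like the one below; the exact number will drift as the table keeps growing.

```sql
-- Total size of the table, its indexes, and TOAST data
SELECT pg_size_pretty(pg_total_relation_size('resource_label_events'));
```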

As a follow-up to the discussions in Consider partitioning strategies for resource_l... (#396809 - closed), and following a similar strategy to Consider partitioning strategies for descriptio... (#396805 - closed), we have agreed on the following implementation plan for partitioning resource_label_events:

  1. Prepare partitioned tables for each resource according to https://docs.gitlab.com/development/database/partitioning/hash/ (milestone M); the number of partitions should be big enough to ensure we don't hit the 100 GB limit again soon (see the first sketch after this list)
  2. Create triggers which keep data between both tables in sync (milestone M, blocked by 1; also covered in the first sketch after this list)
  3. Add a background migration to copy records from resource_label_events to the resource-specific tables (milestone M, blocked by 2; see the backfill sketch after this list)
  4. Finalize the background migration (milestone M+1, or maybe M+2 depending on the estimated migration runtime)
  5. Update models which use resource_label_events to use the new tables instead (milestone M+1, blocked by 3)
  6. Drop the old resource_label_events table (milestone M+2 or later)
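
A minimal sketch of what steps 1 and 2 could look like for the issue-specific table, written as plain PostgreSQL DDL. The table name, column list, partition key, and partition count below are placeholders rather than the agreed schema, and in practice we would use the GitLab partitioning and migration helpers instead of raw SQL:

```sql
-- Step 1 (sketch): hash-partitioned table for issue events.
-- Column list, partition key, and partition count are illustrative only.
CREATE TABLE issue_resource_label_events (
  id bigint NOT NULL,
  issue_id bigint NOT NULL,
  label_id bigint,
  user_id bigint,
  action smallint NOT NULL,
  created_at timestamptz NOT NULL,
  reference text,
  reference_html text,
  PRIMARY KEY (id, issue_id)  -- unique keys must include the hash partition key
) PARTITION BY HASH (issue_id);

-- 32 partitions is a placeholder; pick a count large enough that each
-- partition stays well below the 100 GB threshold for years to come.
CREATE TABLE issue_resource_label_events_00
  PARTITION OF issue_resource_label_events
  FOR VALUES WITH (MODULUS 32, REMAINDER 0);
-- ...repeat for REMAINDER 1..31

-- Step 2 (sketch): trigger that mirrors new issue rows into the
-- partitioned table so both stay in sync during the migration window
-- (a real implementation would also handle UPDATE and DELETE).
CREATE OR REPLACE FUNCTION sync_issue_resource_label_events()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
  IF NEW.issue_id IS NOT NULL THEN
    INSERT INTO issue_resource_label_events
      (id, issue_id, label_id, user_id, action, created_at, reference, reference_html)
    VALUES
      (NEW.id, NEW.issue_id, NEW.label_id, NEW.user_id, NEW.action,
       NEW.created_at, NEW.reference, NEW.reference_html)
    ON CONFLICT (id, issue_id) DO NOTHING;
  END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER resource_label_events_sync_insert
AFTER INSERT ON resource_label_events
FOR EACH ROW EXECUTE FUNCTION sync_issue_resource_label_events();
```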
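For step 3, the background migration would copy existing rows in small id ranges. The per-batch SQL is roughly the following; :batch_start and :batch_end are placeholders for the bounds handed out by the migration framework, and the conflict clause skips rows already written by the sync trigger:

```sql
-- Step 3 (sketch): one backfill batch of the background migration.
INSERT INTO issue_resource_label_events
  (id, issue_id, label_id, user_id, action, created_at, reference, reference_html)
SELECT id, issue_id, label_id, user_id, action, created_at, reference, reference_html
FROM resource_label_events
WHERE issue_id IS NOT NULL
  AND id BETWEEN :batch_start AND :batch_end
ON CONFLICT (id, issue_id) DO NOTHING;  -- rows already mirrored by the trigger are skipped
```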

Going with a partitioned table doesn't block us from doing further optimizations later, say next year or when there is capacity for more experimentation (e.g. compressing or otherwise removing the reference_html field).
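As a rough way to gauge how much of the table the cached HTML actually accounts for before investing in that, a sampled query like the one below could be run on a replica (TABLESAMPLE SYSTEM (1) reads roughly 1% of pages, so the result needs to be scaled up for a whole-table estimate):

```sql
-- Sampled estimate of reference_html's on-disk footprint (run on a replica)
SELECT pg_size_pretty(sum(pg_column_size(reference_html))) AS reference_html_sampled
FROM resource_label_events TABLESAMPLE SYSTEM (1);
```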

Notes

  • Depending on when this is tackled, the project to consolidate Issues and Epics into Work Item Types may have progressed. It may be more reasonable to split the table into one table for issues + epics and one for MRs, rather than one table per resource type, since issues and epics will inhabit the same table.
  • This issue should be promoted to epic when the work is started in order to track it across milestones.