Add compound index to project_daily_statistics in preparation for table partitioning
What does project_daily_statistics table do
It is responsible for holding project fetch statistics. It gets updated on each fetch from the repository. The REST API offers users access to statistics for the last 30 days.
What is the problem
Table size reached 53GB and continues growing without cleanup. To resolve the rapid growth rate, we should use partitioning and remove old data
Related to: Partitioning: Create async compound index for p... (#561027 - closed)
What does this MR do
As discussed below, we chose partitioning over BBM because partitioning provides better query performance through partition pruning, more efficient maintenance operations on smaller table chunks, and instant old data removal by dropping entire partitions rather than scanning millions of rows for deletion.
This MR implements Step 1 of a 5-step partitioning plan:
-
✅ This MR: Add compound index on(date, id)asynchronously for efficient backfill - Add compound index on
(date, id)synchronously - Create partitioned table with 30-day retention
- Install sync trigger
- Run batched background migration to backfill last 30 days
- Finalize and swap tables
How to test locally
- Run
bin/rails db:migrate:main - Verify the compound index was created
- Confirm the index
index_project_daily_statistics_on_date_and_idappears in the indexes list
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.