Add compound index to project_daily_statistics in preparation for table partitioning

What does the project_daily_statistics table do

It stores per-project fetch statistics and is updated on every fetch from the repository. The REST API exposes the last 30 days of statistics to users.
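A toy sketch of the update pattern described above (illustrative only, not GitLab's actual model; all names here are hypothetical): one counter per project per day, incremented on every repository fetch.

```ruby
require 'date'

# Illustrative only: one fetch-count entry per (project_id, date),
# bumped on every repository fetch, mirroring how
# project_daily_statistics accumulates one row per project per day.
stats = Hash.new(0)

def record_fetch(stats, project_id, date)
  stats[[project_id, date]] += 1
end

3.times { record_fetch(stats, 42, Date.new(2024, 6, 1)) }

# The REST API would then serve only the entries whose date falls
# within the last 30 days.
```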

What is the problem

The table has reached 53 GB and keeps growing because nothing cleans up old rows. To curb the growth, we should partition the table and remove old data.


Related to: Partitioning: Create async compound index for p... (#561027 - closed)

What does this MR do

As discussed below, we chose partitioning over a cleanup batched background migration (BBM): partitioning gives better query performance through partition pruning, cheaper maintenance operations on smaller table chunks, and instant removal of old data by dropping whole partitions instead of scanning millions of rows for deletion.
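A toy illustration (not GitLab code; the data is made up) of the retention argument above: when each period lives in its own partition, expiring old data is a drop of a whole partition rather than a row-by-row delete.

```ruby
require 'date'

# Each partition holds one day's rows; retention drops whole partitions.
partitions = {
  Date.new(2024, 5, 1) => Array.new(1_000, :row),
  Date.new(2024, 6, 1) => Array.new(1_000, :row)
}

cutoff = Date.new(2024, 5, 15)

# The analogue of DROP TABLE on expired partitions: cost scales with
# the number of partitions, not the number of rows inside them.
partitions.delete_if { |day, _rows| day < cutoff }
```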

This MR implements Step 1 of a 6-step partitioning plan:

  1. This MR: Add compound index on (date, id) asynchronously for efficient backfill
  2. Add compound index on (date, id) synchronously
  3. Create partitioned table with 30-day retention
  4. Install sync trigger
  5. Run batched background migration to backfill last 30 days
  6. Finalize and swap tables
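Step 1 above can be sketched roughly as the migration below. `prepare_async_index` is the helper GitLab's migration framework uses to queue an index for asynchronous creation; the stub base class only stands in for `Gitlab::Database::Migration[2.2]` so the sketch runs outside a GitLab checkout, and the migration class name is hypothetical.

```ruby
# Stand-in for Gitlab::Database::Migration[2.2]; outside GitLab we just
# record what prepare_async_index would enqueue.
class MigrationSketch
  def prepare_async_index(table, columns, name:)
    { table: table, columns: columns, name: name }
  end

  def unprepare_async_index(table, columns, name:)
    { removed: name }
  end
end

class PrepareProjectDailyStatisticsDateIdIndex < MigrationSketch
  INDEX_NAME = 'index_project_daily_statistics_on_date_and_id'

  def up
    # Queues the compound (date, id) index for asynchronous creation,
    # so the heavy index build happens outside this migration itself.
    prepare_async_index :project_daily_statistics, %i[date id], name: INDEX_NAME
  end

  def down
    unprepare_async_index :project_daily_statistics, %i[date id], name: INDEX_NAME
  end
end
```

In the real MR the class inherits from GitLab's migration base class, and Step 2 later creates the same index synchronously once the async build has finished.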

How to test locally

  1. Run bin/rails db:migrate:main
  2. Verify the compound index was created
  3. Confirm the index index_project_daily_statistics_on_date_and_id appears in the indexes list

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Emma Park
