Add compound index to project_daily_statistics in preparation for table partitioning

What does the project_daily_statistics table do

It stores per-project fetch statistics and is updated on every fetch from the repository. The REST API exposes the last 30 days of statistics to users.
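A toy sketch of the update pattern described above (illustrative only, not GitLab's actual model; all names here are hypothetical): one counter per project per day, incremented on every repository fetch.

```ruby
require 'date'

# Illustrative only: one fetch-count entry per (project_id, date),
# bumped on every repository fetch, mirroring how
# project_daily_statistics accumulates one row per project per day.
stats = Hash.new(0)

def record_fetch(stats, project_id, date)
  stats[[project_id, date]] += 1
end

3.times { record_fetch(stats, 42, Date.new(2024, 6, 1)) }

# The REST API would then serve only the entries whose date falls
# within the last 30 days.
```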

What is the problem

The table has reached 53 GB and keeps growing because nothing cleans up old rows. To curb the growth, we should partition the table and remove old data.


Related to: Partitioning: Create async compound index for p... (#561027 - closed)

What does this MR do

As discussed below, we chose partitioning over a cleanup batched background migration (BBM): partitioning gives better query performance through partition pruning, cheaper maintenance operations on smaller table chunks, and instant removal of old data by dropping whole partitions instead of scanning millions of rows for deletion.
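A toy illustration (not GitLab code; the data is made up) of the retention argument above: when each period lives in its own partition, expiring old data is a drop of a whole partition rather than a row-by-row delete.

```ruby
require 'date'

# Each partition holds one day's rows; retention drops whole partitions.
partitions = {
  Date.new(2024, 5, 1) => Array.new(1_000, :row),
  Date.new(2024, 6, 1) => Array.new(1_000, :row)
}

cutoff = Date.new(2024, 5, 15)

# The analogue of DROP TABLE on expired partitions: cost scales with
# the number of partitions, not the number of rows inside them.
partitions.delete_if { |day, _rows| day < cutoff }
```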

This MR implements Step 1 of a 6-step partitioning plan:

  1. This MR: Add compound index on (date, id) asynchronously for efficient backfill
  2. Add compound index on (date, id) synchronously
  3. Create partitioned table with 30-day retention
  4. Install sync trigger
  5. Run batched background migration to backfill last 30 days
  6. Finalize and swap tables
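Step 1 above can be sketched roughly as the migration below. `prepare_async_index` is the helper GitLab's migration framework uses to queue an index for asynchronous creation; the stub base class only stands in for `Gitlab::Database::Migration[2.2]` so the sketch runs outside a GitLab checkout, and the migration class name is hypothetical.

```ruby
# Stand-in for Gitlab::Database::Migration[2.2]; outside GitLab we just
# record what prepare_async_index would enqueue.
class MigrationSketch
  def prepare_async_index(table, columns, name:)
    { table: table, columns: columns, name: name }
  end

  def unprepare_async_index(table, columns, name:)
    { removed: name }
  end
end

class PrepareProjectDailyStatisticsDateIdIndex < MigrationSketch
  INDEX_NAME = 'index_project_daily_statistics_on_date_and_id'

  def up
    # Queues the compound (date, id) index for asynchronous creation,
    # so the heavy index build happens outside this migration itself.
    prepare_async_index :project_daily_statistics, %i[date id], name: INDEX_NAME
  end

  def down
    unprepare_async_index :project_daily_statistics, %i[date id], name: INDEX_NAME
  end
end
```

In the real MR the class inherits from GitLab's migration base class, and Step 2 later creates the same index synchronously once the async build has finished.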

How to test locally

  1. Run bin/rails db:migrate:main
  2. Verify the compound index was created
  3. Confirm the index index_project_daily_statistics_on_date_and_id appears in the indexes list

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Emma Park
