Partition the events table by year

We have a prune events worker which prunes user activity older than 2 years.

The retention period was extended from 12 months in https://gitlab.com/gitlab-org/gitlab-ce/issues/52246 as a short-term fix, giving us additional time to consider a scaling strategy for the related table (the events table). That fix was merged in 11.4, which means a more permanent solution must be in place by October 2019.

This data is very useful, and it should never be pruned unless an instance administrator explicitly does so.

Further details

See context from @yorickpeterse in https://gitlab.com/gitlab-org/gitlab-ce/issues/24244#note_60995986 on the DB challenges.

Proposal

  • Partition the events table by range on the created_at column.
  • A table should be created for each year, named like events-yyyy, where yyyy is the year of the records it holds.
  • Once events is no longer being pruned, we should remove prune_old_events_worker.rb.
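
As a rough illustration of the proposal, the sketch below generates the DDL for yearly partitions. It assumes PostgreSQL 10+ declarative partitioning and that the parent events table has already been declared with PARTITION BY RANGE (created_at). The helper names (yearly_partition_ddl, partition_for) are hypothetical and not part of GitLab. It is only meant to show the shape of the statements, not an actual migration.

```python
from datetime import date


def yearly_partition_ddl(parent: str, year: int) -> str:
    """Return a CREATE TABLE statement for one yearly range partition.

    Hypothetical helper: assumes the parent table is already declared
    with PARTITION BY RANGE (created_at).
    """
    lower = date(year, 1, 1)
    upper = date(year + 1, 1, 1)  # range upper bounds are exclusive in PostgreSQL
    # Names containing a hyphen must be double-quoted in PostgreSQL.
    partition = f"{parent}-{year}"
    return (
        f'CREATE TABLE "{partition}" PARTITION OF "{parent}" '
        f"FOR VALUES FROM ('{lower.isoformat()}') TO ('{upper.isoformat()}');"
    )


def partition_for(parent: str, created_at: date) -> str:
    """Return the name of the partition that would hold a given record."""
    return f"{parent}-{created_at.year}"


if __name__ == "__main__":
    # Print the statements for an illustrative range of years.
    for year in range(2017, 2020):
        print(yearly_partition_ddl("events", year))
```

Because the upper bound of each range is exclusive, a record created at midnight on 1 January falls into the new year's partition and adjacent partitions never overlap.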

Links / references

Edited by Jeremy Watson (ex-GitLab)