Analyze merge_request_diff_files data
Taken from: gitlab-com/database#110 (comment 97527141)
Let's analyze the data in merge_request_diff_files and compile statistics for:
- size of records over time (on a daily basis)
- size of individual MRs and the distribution of those sizes: are there any outlier MRs we may be able to delete data from?
- number of distinct file data entries (useful for https://gitlab.com/gitlab-org/gitlab-ce/issues/37632 in the context of deduplication).
- What else?
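As a starting point, the statistics listed above could be sketched roughly as follows. This is only an illustration over rows already exported from the table, not a proposal for how to run it against the database. The function name `summarize_diff_files`, the tuple layout `(merge_request_id, created_on, diff)`, and the sample rows are all assumptions made for the example; the real column names and the join needed to get an MR id and a date would have to be checked against the schema.

```python
import hashlib
from collections import defaultdict


def summarize_diff_files(rows):
    """Summarize exported diff-file rows.

    rows: iterable of (merge_request_id, created_on, diff) tuples.
    This layout is hypothetical; adapt it to the actual export.
    """
    bytes_per_day = defaultdict(int)  # total diff bytes per creation day
    bytes_per_mr = defaultdict(int)   # total diff bytes per merge request
    digests = set()                   # one digest per distinct diff content
    total_rows = 0

    for mr_id, created_on, diff in rows:
        data = diff.encode("utf-8")
        bytes_per_day[created_on] += len(data)
        bytes_per_mr[mr_id] += len(data)
        digests.add(hashlib.sha256(data).hexdigest())
        total_rows += 1

    # Largest MRs first, to make outliers easy to spot.
    largest_mrs = sorted(bytes_per_mr.items(), key=lambda kv: kv[1], reverse=True)

    return {
        "bytes_per_day": dict(bytes_per_day),
        "largest_mrs": largest_mrs,
        "total_rows": total_rows,
        "distinct_diffs": len(digests),
    }


# Made-up sample rows, for illustration only.
sample_rows = [
    (1, "2018-01-01", "aaaa"),
    (1, "2018-01-01", "bb"),
    (2, "2018-01-02", "aaaa"),
    (3, "2018-01-02", "c" * 10),
]
stats = summarize_diff_files(sample_rows)
```

The gap between `total_rows` and `distinct_diffs` gives a rough idea of how much deduplication could save. At production scale the hashing and grouping would more likely be pushed into SQL on the replica than done client-side.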
I'm thinking we should stand up a database in gstg for this analysis, or at least use a production replica (ideally one that is not running other transactions).