Create a set of test projects to measure ES storage usage
In https://gitlab.com/gitlab-org/gitlab-ee/issues/5599, we tested index storage usage with the default set of GDK repositories, plus the Linux kernel.
We'd like to get a representative-ish set of test projects available for testing improvements as part of &429.
This could even just be the set of projects used in that issue (the default GDK repositories plus the Linux kernel), including their issues, MRs, etc. The important points are that:
- We can make a reasonable case for the decision.
- It's easy to change the test set of projects independently of the default GDK seeds, if necessary.
- It's easy for someone to get up and running with these projects and ES to measure index sizes.
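
As a rough illustration of the "measure index sizes" step, here is a minimal sketch that reads the on-disk size of an index from Elasticsearch's index stats API (`GET /<index>/_stats/store`). The URL `http://localhost:9200`, the index name `gitlab-development`, and the helper names are assumptions for illustration, not something this issue specifies; adjust them to match your GDK setup.

```python
import json
import urllib.request


def primary_store_bytes(stats: dict) -> int:
    """Return the on-disk size (bytes) of primary shards from a stats response."""
    return stats["_all"]["primaries"]["store"]["size_in_bytes"]


def fetch_index_stats(base_url: str, index: str) -> dict:
    """Call GET /<index>/_stats/store on an Elasticsearch node and parse the JSON."""
    with urllib.request.urlopen(f"{base_url}/{index}/_stats/store") as resp:
        return json.load(resp)


def report(base_url: str, index: str) -> str:
    """Format the primary store size of one index in MiB."""
    size = primary_store_bytes(fetch_index_stats(base_url, index))
    return f"{index}: {size / 1024 ** 2:.1f} MiB (primaries)"
```

With ES running, something like `print(report("http://localhost:9200", "gitlab-development"))` (both values are assumptions) gives a single number to compare before and after a change. Using primaries only keeps replica count from skewing the comparison.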
Side note on forks:
In the past, we've noted some issues with the efficiency of storage for forks. On GitLab.com, forks are about 2.8% of projects by count. (About 28% of forks are themselves public projects; previously we noted about a 1:10 public:private project ratio, by both count and disk usage.)
So it may be worth including one or two forks in the test set, although they currently seem unlikely to be a major contributor to the index size issues.