Analyse large public repositories for performance testing suitability
As part of our effort to update the performance test data with even higher-percentile large repositories, we should analyse which repositories are publicly available and whether any of them fall into this rare category compared to our current test data.
One notable repository identified so far is Chromium, which does appear to be sizable.
`git-sizer` output for Chromium:
Processing blobs: 17632726
Processing trees: 30980353
Processing commits: 23538146
Matching commits to trees: 23538146
Processing annotated tags: 0
Processing references: 5611873
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    | 23.5 M    | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|   * Total size               | 15.3 GiB  | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| * Trees                      |           |                                |
|   * Count                    | 31.0 M    | ********************           |
|   * Total size               | 110 GiB   | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|   * Total tree entries       | 2.57 G    | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| * Blobs                      |           |                                |
|   * Count                    | 17.6 M    | ***********                    |
|   * Total size               | 1.63 TiB  | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| * Annotated tags             |           |                                |
|   * Count                    | 0         |                                |
| * References                 |           |                                |
|   * Count                    | 5.61 M    | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] | 20.0 MiB  | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|   * Maximum parents      [2] | 3         |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] | 4.39 k    | ****                           |
| * Blobs                      |           |                                |
|   * Maximum size         [4] | 731 MiB   | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      | 1.16 M    | **                             |
| * Maximum tag depth          | 0         |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [5] | 36.8 k    | ******************             |
| * Maximum path depth     [6] | 20        | **                             |
| * Maximum path length    [7] | 284 B     | **                             |
| * Number of files        [8] | 447 k     | ********                       |
| * Total size of files    [9] | 5.25 GiB  | *****                          |
| * Number of symlinks    [10] | 363       |                                |
| * Number of submodules  [11] | 240       | **                             |
The task is to analyse this repository, as well as any other potential candidates.
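To compare Chromium with other candidates and with our current test data, it may help to turn the `git-sizer` table into numbers. Below is a minimal Python sketch that parses the Markdown table format shown above. `parse_value`, `parse_git_sizer_table` and `compare` are hypothetical helpers. The sketch assumes `git-sizer`'s convention of decimal suffixes (`k`, `M`, `G`) for counts and binary suffixes (`KiB`, `MiB`, `GiB`, `TiB`) for sizes. If re-running the tool is an option, `git-sizer`'s `--json` output is probably a sturdier input than scraping the table.

```python
import re

# Decimal suffixes (counts) and binary suffixes (sizes) as printed by git-sizer.
_MULTIPLIERS = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}
# A number, an optional unit prefix and an optional trailing "B" (bytes), e.g. "1.63 TiB".
_VALUE_RE = re.compile(r"^([\d.]+)\s*(Ki|Mi|Gi|Ti|k|M|G|T)?B?$")


def parse_value(text):
    """Convert a value such as '23.5 M' or '1.63 TiB' into a float."""
    match = _VALUE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised git-sizer value: {text!r}")
    number, prefix = match.groups()
    return float(number) * _MULTIPLIERS[prefix or ""]


def parse_git_sizer_table(text):
    """Map 'Section / Category / Metric' names to numeric values.

    The hierarchy is inferred from which rows carry a value, so this works
    whether or not the nested rows kept their indentation when pasted.
    """
    metrics = {}
    section = category = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip progress lines such as "Processing blobs: ..."
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 2:
            continue
        name, value = cells[0], cells[1]
        if not name or name == "Name" or set(name) == {"-"}:
            continue  # blank spacer, header or separator row
        name = re.sub(r"\s*\[\d+\]$", "", name)  # drop footnote markers like "[4]"
        is_nested = name.startswith("*")
        label = name.lstrip("* ")
        if not value:
            # Rows without a value are headings: a top-level section or a category.
            if is_nested:
                category = label
            else:
                section, category = label, None
            continue
        key = " / ".join(part for part in (section, category, label) if part)
        metrics[key] = parse_value(value)
    return metrics


def compare(candidate, baseline):
    """Ratio candidate / baseline for every non-zero metric both repositories report."""
    return {
        key: candidate[key] / baseline[key]
        for key in sorted(candidate.keys() & baseline.keys())
        if baseline[key]
    }
```

For example, `compare(parse_git_sizer_table(chromium_output), parse_git_sizer_table(current_output))` would show how many times larger Chromium is than a current test repository on each metric. Here `chromium_output` and `current_output` are placeholders for saved `git-sizer` text.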