Project Import - Measure results for 10x Initiative summary
Project Import was identified as a key component of the 10x Initiative. We have iteratively made significant improvements to project import, and this issue was created to run a final set of tests that summarizes them. Ideally, we will measure the improvements across a few different scenarios (in no particular order):
- GitLabHQ data set - export tar file 617.2 MB, project.json => 77 MB, project.bundle => 617.2 MB
- XL data set - export tar file 4.19 GB, project.json => 2.48 GB, project.bundle => 3.94 GB
@gl-memory
Measurement
Measurements were taken in a local environment using GDK.
Versions used for measurement:
- without any improvements except Graceful Failures (f0d907b2)
- all merged improvements introduced in 10x Initiative (1c0f47ae)
We measured:
- Execution time
- Number of SQL queries executed during the restore (Number of SQL calls)
- Number of GC cycles (GC Count, Minor GC Count and Major GC Count)
Note:
The XL data set import was originally projected to take about 24 hours, but it kept breaking. The import was wrapped in a single transaction, which contributed to its length. Moving the import to a rake task reduced the time to about 12 hours.
Importing the XL data set without any improvements kept breaking and succeeded only after 4 retries, with 19 relations failing to import. Several `GRPC::DeadlineExceeded` exceptions occurred and broke the import.
Import | Improvements | Commit SHA | Execution time | Number of SQL calls | GC Count | Minor GC Count | Major GC Count |
---|---|---|---|---|---|---|---|
GitLabHQ | None (except Graceful Failures) | f0d907b2 | 00:17:28 | 240,400 | 1,277 | 1,277 | 24 |
GitLabHQ | All merged | 1c0f47ae | 00:11:15 | 92,944 | 980 | 958 | 18 |
XL Data set | None (except Graceful Failures) | f0d907b2 | 09:49:59 | 5,794,395 | 33,491 | 33,132 | 359 |
XL Data set | All merged | 1c0f47ae | 03:47:23 | 2,047,864 | 25,318 | 25,105 | 213 |
Conclusion
XL Data Set:
- 61.45% reduction in execution time
- 64.65% fewer SQL calls
- 24.4% fewer GC cycles, 40.66% fewer major GC cycles
GitLabHQ Data Set:
- 35.59% reduction in execution time
- 61.33% fewer SQL calls
- 23.25% fewer GC cycles, 25% fewer major GC cycles
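The percentages in the conclusion can be reproduced from the results table as `(before - after) / before * 100`. The following is a minimal Python sketch of that arithmetic, not part of the original measurement tooling; the names `MEASUREMENTS`, `to_seconds`, `percent_reduction` and `summarize` are hypothetical and exist only for illustration. Values rounded to two decimals may differ by 0.01 from the figures listed in the conclusion, which appear to be truncated rather than rounded.

```python
# Raw measurements copied from the results table (before = f0d907b2, after = 1c0f47ae).
MEASUREMENTS = {
    "GitLabHQ": {
        "before": {"time": "00:17:28", "sql": 240_400, "gc": 1_277, "major_gc": 24},
        "after": {"time": "00:11:15", "sql": 92_944, "gc": 980, "major_gc": 18},
    },
    "XL": {
        "before": {"time": "09:49:59", "sql": 5_794_395, "gc": 33_491, "major_gc": 359},
        "after": {"time": "03:47:23", "sql": 2_047_864, "gc": 25_318, "major_gc": 213},
    },
}


def to_seconds(hms):
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def percent_reduction(before, after):
    """Return how much smaller `after` is than `before`, in percent."""
    return (before - after) / before * 100


def summarize(run):
    """Compute the percentage reduction for each metric of one data set."""
    before, after = run["before"], run["after"]
    summary = {
        "time": percent_reduction(to_seconds(before["time"]), to_seconds(after["time"]))
    }
    for metric in ("sql", "gc", "major_gc"):
        summary[metric] = percent_reduction(before[metric], after[metric])
    return summary


if __name__ == "__main__":
    for name, run in MEASUREMENTS.items():
        for metric, value in summarize(run).items():
            print(f"{name} {metric}: {value:.2f}% reduction")
```

Expressing each result as a reduction relative to the baseline commit keeps the time, SQL and GC figures directly comparable across the two data sets.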