Defining large MR for performance testing

In theory, the upper limit of a “large” MR is unbounded. But we need to define what a “large MR” means so that we have a single source of truth (SSOT) for continuously testing performance and tracking improvements.

Characteristics of the test sample

Must be representative of the 95th or 99th percentile of MRs on GitLab.com, so that we account for the largest MRs in a reasonable way.
Must be automatically reproducible, so that we can “destroy” and create the test sample without significant effort.
Must be testable locally and on GitLab.com, so that we can test and debug it under different circumstances.

Definition of the test sample

File size: Mix of files with small and large file sizes.

File types: Mix of image, binary, code, and prose files.

No. of files: Many changed files.

No. of lines: Mix of files with few and many changed lines.

No. of commits: Many commits.

No. of comments: Many comments and threads, mix of threads with few and many reply comments, and resolved, unresolved, and outdated threads.

Comments and description content: Mix of text, headings, tables, images, videos, tasks, diagrams, math, emojis, code blocks.

Pipelines: Many pipelines.

Data on Large MR (source)

This is based on the 200 most active projects (as measured by the number of MRs in the last 90 days) on GitLab.com.

BY FILE COUNT
- 95%ile: 28.0
- 99%ile: 128.0
- Largest: 5984
BY LINE COUNT
- 95%ile: 1375.0
- 99%ile: 11925.6
- Largest: 14280733
BY COMMIT COUNT
- 95%ile: 16.0
- 99%ile: 89.0
- Largest: 8802

Decision

Rspec Upgrade will be used for benchmarking and testing moving forward.
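To sanity-check that a candidate test MR really falls in the "large" range, its size can be compared with the percentile data above. This is a minimal sketch under stated assumptions: the thresholds are copied from the data section (with the 99%ile line count rounded to 11925.6), the rule that every dimension must meet a threshold is an illustrative choice rather than something this page prescribes, and `size_class` is a hypothetical helper.

```python
# Thresholds taken from the percentile data on this page.
THRESHOLDS = {
    "p95": {"files": 28.0, "lines": 1375.0, "commits": 16.0},
    "p99": {"files": 128.0, "lines": 11925.6, "commits": 89.0},
}


def size_class(files, lines, commits):
    """Return 'p99', 'p95', or 'below' for an MR of the given size.

    Assumption: an MR counts as reaching a percentile only if it meets
    the threshold in all three dimensions.
    """
    size = {"files": files, "lines": lines, "commits": commits}
    # Check the stricter threshold first.
    for label in ("p99", "p95"):
        limits = THRESHOLDS[label]
        if all(size[key] >= limits[key] for key in limits):
            return label
    return "below"
```

For instance, `size_class(30, 2000, 20)` returns `"p95"`, while an MR with many files and commits but few changed lines returns `"below"` under this all-dimensions rule.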
