Test Usage Ping on Self-Hosted Instances (Step 3)
Parent issue: https://gitlab.com/gitlab-org/telemetry/issues/308
Once usage ping timeouts work on GitLab.com, we can begin testing on self-hosted instances we control. To test this, we will pull the gitlab repo and deploy it onto a personal instance whose database is scaled to match GitLab.com. On these instances, we will manually flip the feature flag and trigger a usage ping, which should hit the versions application and end up in Snowflake.
Result
We may safely deploy batch counters, usage activity by stage, usage activity by stage monthly and other counters to self-hosted instances.
- Sample queries in 1.a show that the usage ping calculation on self-hosted instances takes more than 10x less time than on Gitlab.com
- Batch counter doesn't cause any instabilities
- Gitlab.com is 15x larger than the next self-hosted instance => Distinct queries for user_id fields depend on this
- Gitlab.com has a database statement timeout configuration of 15 seconds, yet the default omnibus installations have 60 seconds
- Some large self-hosted instances may have less powerful configurations, but smaller instances should be fine.
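
This issue does not describe how the batch counters work. As a minimal sketch of the general idea, assume a counter that walks the primary-key range in fixed-size windows so that each individual query stays small relative to the statement timeout. The `batch_count` method, the in-memory `ids` array and the batch size are hypothetical stand-ins for a real table and are not GitLab's implementation.

```ruby
# Hypothetical in-memory stand-in for a table: an array of primary-key ids.
# Real tables have gaps in their id sequence, which is simulated here.
ids = (1..1_000).reject { |id| (id % 7).zero? }

# Count rows by walking the id range in fixed-size windows, so that each
# individual "query" only covers at most batch_size consecutive ids.
def batch_count(ids, batch_size:)
  return 0 if ids.empty?

  total = 0
  start = ids.min
  finish = ids.max

  while start <= finish
    window_end = start + batch_size - 1
    # Each iteration stands in for one bounded query such as:
    #   SELECT COUNT(*) FROM table WHERE id BETWEEN start AND window_end
    total += ids.count { |id| id >= start && id <= window_end }
    start = window_end + 1
  end

  total
end

puts batch_count(ids, batch_size: 100) == ids.size
```

A smaller batch size means more queries, each of which is cheaper, which is the trade-off to tune against a 15 second or 60 second statement timeout.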
1. Counter performance
1.a Sample queries on a synthetic self-hosted environment
On the largest known self-hosted instances, we expect the entire usage ping to be computed in 1 hour or less.
Query or Table | 1. Gitlab.com | 2. Largest Synthetic Self-hosted | 3. Synthetic XXL self-hosted Gitlab.com |
---|---|---|---|
 | as of 2020-03-05 | No traffic, no locks, no vacuum | |
Users | 5.3M | 500K | 5.4M |
Projects | 12M | 1.5M | 16.2M |
Namespaces | 6M | 500K | 5.4M |
Gitlab::UsageData.count(User.active) | 20.28 | 0.18 | 2.76 |
Gitlab::UsageData.count(User) | 3.24 | 0.19 | 1.74 |
Gitlab::UsageData.count(Namespace) | 6.38 | 0.14 | 1.52 |
Gitlab::UsageData.count(::Project.where(service_desk_enabled: true)) | 43.07 | 0.64 | 7.0 |
Gitlab::UsageData.distinct_count(Namespace, :owner_id) | 12.0 | 1.02 | 11.6 |
Gitlab::UsageData.distinct_count(Project, :creator_id) | 19.7 | 1.56 | 12.9 |
Total estimated time (optimization fixes will improve this) | 12 hours | <1 hour | 6 hours |
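
The "more than 10x" claim can be checked directly against the per-query timings in the table above. The sketch below divides the Gitlab.com timing by the largest synthetic self-hosted timing for each query; the `TIMINGS` constant and the `speedups` method are illustrative names, and the numbers are copied from the table.

```ruby
# Timings copied from the table above:
# [Gitlab.com, largest synthetic self-hosted].
TIMINGS = {
  'count(User.active)' => [20.28, 0.18],
  'count(User)' => [3.24, 0.19],
  'count(Namespace)' => [6.38, 0.14],
  'count(Project.where(service_desk_enabled: true))' => [43.07, 0.64],
  'distinct_count(Namespace, :owner_id)' => [12.0, 1.02],
  'distinct_count(Project, :creator_id)' => [19.7, 1.56]
}.freeze

# Ratio of the Gitlab.com timing to the self-hosted timing, rounded to one decimal.
def speedups(timings)
  timings.transform_values do |gitlab_com, self_hosted|
    (gitlab_com / self_hosted).round(1)
  end
end

speedups(TIMINGS).each do |query, ratio|
  puts "#{query}: #{ratio}x"
end
```

Every ratio comes out above 10, with the two `distinct_count` queries closest to that threshold.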
1.b How big is gitlab.com compared to the largest self-hosted instance by one metric (for successful pings)
Instance | Active Users | Issues | Merge requests | Todos | Notes | Projects | Deployments |
---|---|---|---|---|---|---|---|
Gitlab.com | 5 million | 22 million | 37 million | 38 million | 228 million | 12 million | 43 million |
Gitlab.com is larger by | 15x | 28x | 8x | 12.5x | 17.8x | 20x | 14x |
Max self-hosted | 353K | 767K | 456K | 3M | 12.8M | 579K | 2.9M |
2. Analysis
2.a Top Self-hosted instances vs Gitlab.com
The first row shows the number of rows on Gitlab.com, and the following row shows how much larger Gitlab.com is than the largest self-hosted instance.
Metric | Active Users | Issues | Merge requests | Todos | Notes | Projects | Deployments |
---|---|---|---|---|---|---|---|
Gitlab.com rows | 5 million | 22 million | 37 million | 38 million | 228 million | 12 million | 43 million |
Gitlab.com is larger by | 15x | 28x | 8x | 12.5x | 17.8x | 20x | 14x |
2.b Large tables on gitlab.com
Table | Number of rows in millions |
---|---|
notes | 228 |
ci_pipelines | 99 |
deployments | 43 |
todos | 38 |
merge_requests | 37 |
issues | 22 |
lfs_objects | 21 |
projects | 12 |
labels | 10 |
protected_branches | 10 |
namespaces | 6 |
users | 5 |
Edited by Alper Akgun