
Test Usage Ping on Self-Hosted Instances (Step 3)

Parent issue: https://gitlab.com/gitlab-org/telemetry/issues/308

Once usage ping timeouts work on GitLab.com, we can begin testing on self-hosted instances we control. To test this, we will pull the GitLab repository and deploy it onto a personal instance whose database is scaled to match GitLab.com. On these instances, we will manually flip the feature flag and trigger a usage ping, which should reach the Version application and end up in Snowflake.

Result

We may safely deploy batch counters, usage activity by stage, usage activity by stage monthly and other counters to self-hosted instances.

  1. Sample queries in 1.a show that the usage ping calculation on self-hosted instances takes less than a tenth of the time it takes on GitLab.com.
  2. The batch counter does not cause any instability.
  3. GitLab.com is 15x larger than the largest self-hosted instance, and the cost of distinct queries on user_id fields depends on this difference.
  4. GitLab.com has a database statement timeout of 15 seconds, whereas default Omnibus installations use 60 seconds.
  5. Some large self-hosted instances may run on less powerful configurations than GitLab.com, but their much smaller data volumes should compensate for this.
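The idea behind batch counting, as assumed here, is to replace one long `COUNT` over a huge table with many short counts over primary-key ranges, so that no single statement comes near the statement timeout from point 4. The following is a minimal plain-Ruby sketch of that idea, assuming an in-memory array of hashes stands in for a database table; `batch_count` and `batch_distinct_count` are hypothetical helpers, not GitLab's actual batch-counting code. For distinct counts, the sketch batches over the counted column itself, so every distinct value falls into exactly one window and the per-window results can simply be added.

```ruby
# Hypothetical helpers: a plain-Ruby model of batch counting, where each
# window stands in for one short SQL statement such as
#   SELECT COUNT(*) FROM t WHERE id >= start AND id < stop

# Counts rows by walking the id range in fixed-size windows.
def batch_count(rows, batch_size: 1_000)
  return 0 if rows.empty?

  ids = rows.map { |r| r[:id] }
  start = ids.min
  finish = ids.max
  total = 0
  while start <= finish
    stop = start + batch_size # exclusive upper bound of this window
    total += rows.count { |r| r[:id] >= start && r[:id] < stop }
    start = stop
  end
  total
end

# Distinct count batched over the counted column (e.g. :creator_id), so a
# given value is only ever seen in one window and is never counted twice.
def batch_distinct_count(rows, column, batch_size: 1_000)
  values = rows.map { |r| r[column] }.compact
  return 0 if values.empty?

  start = values.min
  finish = values.max
  total = 0
  while start <= finish
    stop = start + batch_size
    total += values.select { |v| v >= start && v < stop }.uniq.size
    start = stop
  end
  total
end

# Hypothetical sample data: 2,500 projects created by 300 distinct users.
projects = (1..2_500).map { |i| { id: i, creator_id: i % 300 } }
p batch_count(projects, batch_size: 1_000)                    # => 2500
p batch_distinct_count(projects, :creator_id, batch_size: 100) # => 300
```

Batching on the counted column rather than on the primary key is what keeps the distinct count exact: if the windows were taken over `id`, the same `creator_id` could appear in several windows and be counted more than once.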

1. Counter performance

1.a Sample queries on a synthetic self-hosted environment

On the largest known self-hosted instances, we expect the entire usage ping to be computed in 1 hour or less.
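Per-query timings like those in the table that follows can be collected by running each counter once, timing it, and summing the results against the 1-hour target. A minimal sketch, assuming hypothetical in-memory stand-ins for the counters; on a real instance they would instead be calls such as `Gitlab::UsageData.count(User)` run from a Rails console.

```ruby
require "benchmark"

# Hypothetical in-memory stand-in for the users table.
USERS = Array.new(50_000) { |i| { id: i + 1, active: i.even? } }

# Hypothetical counter set: name => callable returning a count.
COUNTERS = {
  "count(User)" => -> { USERS.size },
  "count(User.active)" => -> { USERS.count { |u| u[:active] } }
}.freeze

BUDGET_SECONDS = 60 * 60 # target from the text: whole usage ping in <= 1 hour

# Runs every counter once and records its value and wall-clock time.
def time_counters(counters)
  counters.to_h do |name, counter|
    value = nil
    seconds = Benchmark.realtime { value = counter.call }
    [name, { value: value, seconds: seconds }]
  end
end

results = time_counters(COUNTERS)
total_seconds = results.values.sum { |r| r[:seconds] }

results.each do |name, r|
  puts format("%-20s %8d  %.4f s", name, r[:value], r[:seconds])
end
puts format("total %.4f s, within 1-hour budget: %s", total_seconds, total_seconds <= BUDGET_SECONDS)
```

Summing measured times only covers the counters actually run; an estimate for the full usage ping would need every counter included.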

| Query or table | 1. GitLab.com (as of 2020-03-05) | 2. Largest synthetic self-hosted (no traffic, no locks, no vacuum) | 3. Synthetic XXL self-hosted (GitLab.com scale) |
|---|---|---|---|
| Users | 5.3M | 500K | 5.4M |
| Projects | 12M | 1.5M | 16.2M |
| Namespaces | 6M | 500K | 5.4M |
| `Gitlab::UsageData.count(User.active)` | 20.28 | 0.18 | 2.76 |
| `Gitlab::UsageData.count(User)` | 3.24 | 0.19 | 1.74 |
| `Gitlab::UsageData.count(Namespace)` | 6.38 | 0.14 | 1.52 |
| `Gitlab::UsageData.count(::Project.where(service_desk_enabled: true))` | 43.07 | 0.64 | 7.0 |
| `Gitlab::UsageData.distinct_count(Namespace, :owner_id)` | 12.0 | 1.02 | 11.6 |
| `Gitlab::UsageData.distinct_count(Project, :creator_id)` | 19.7 | 1.56 | 12.9 |
| Total estimated time (optimization fixes will improve this) | 12 hours | < 1 hour | 6 hours |
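The "less than a tenth of the time" claim in the Result can be checked directly against the query timings in the table above. This sketch only divides the listed numbers (in whatever unit the table uses); the query labels are shortened for readability.

```ruby
# Query timings copied from the table above, in the table's own unit:
# [GitLab.com, largest synthetic self-hosted, synthetic XXL self-hosted]
TIMINGS = {
  "count(User.active)" => [20.28, 0.18, 2.76],
  "count(User)" => [3.24, 0.19, 1.74],
  "count(Namespace)" => [6.38, 0.14, 1.52],
  "count(Project.where(service_desk_enabled: true))" => [43.07, 0.64, 7.0],
  "distinct_count(Namespace, :owner_id)" => [12.0, 1.02, 11.6],
  "distinct_count(Project, :creator_id)" => [19.7, 1.56, 12.9]
}.freeze

# How many times faster each synthetic instance ran a query than GitLab.com.
def speedups(gitlab_com, largest, xxl)
  { largest: gitlab_com / largest, xxl: gitlab_com / xxl }
end

ratios = TIMINGS.transform_values { |t| speedups(*t) }

ratios.each do |query, r|
  puts format("%-50s largest %6.1fx  xxl %4.1fx", query, r[:largest], r[:xxl])
end

min_largest = ratios.values.map { |r| r[:largest] }.min
puts format("smallest speedup on the largest synthetic instance: %.1fx", min_largest)
# prints "smallest speedup on the largest synthetic instance: 11.8x"
```

Every sample query is more than 10x faster on the largest synthetic instance; the smallest ratio (about 11.8x) is for `distinct_count(Namespace, :owner_id)`. On the XXL instance, which is close to GitLab.com in size, that same distinct count is only about 1.03x faster (12.0 / 11.6), which fits point 3 of the Result: distinct counts on user id fields depend on data volume.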

1.b How much larger GitLab.com is than the largest self-hosted instance, per metric (successful pings only)

| | Active users | Issues | Merge requests | Todos | Notes | Projects | Deployments |
|---|---|---|---|---|---|---|---|
| GitLab.com | 5M | 22M | 37M | 38M | 228M | 12M | 43M |
| GitLab.com is this much larger | 15x | 28x | 8x | 12.5x | 17.8x | 20x | 14x |
| Max self-hosted | 353K | 767K | 456K | 3M | 12.8M | 579K | 2.9M |

2. Analysis

2.a Top self-hosted instances vs GitLab.com

The table in 1.b gives, for each metric, the row count on GitLab.com and how many times larger GitLab.com is than the largest self-hosted instance.

[Screenshots for each of the seven metrics, captured 2020-03-04]

2.b Large tables on GitLab.com

| Table | Number of rows (millions) |
|---|---|
| notes | 228 |
| ci_pipelines | 99 |
| deployments | 43 |
| todos | 38 |
| merge_requests | 37 |
| issues | 22 |
| lfs_objects | 21 |
| projects | 12 |
| labels | 10 |
| protected_branches | 10 |
| namespaces | 6 |
| users | 5 |
Edited by Alper Akgun