Investigate performance & scalability for devops::growth
The purpose of this issue is to investigate performance and scalability concerns in the Growth team's areas of responsibility.
Areas of investigation
1. experiment_users postgres table and related queries
Concerns with inserts.
gitlabhq_production=> select date_trunc('month', created_at) as period, count(*) from experiment_users group by 1 order by 1 desc;
period | count
------------------------+--------
2021-03-01 00:00:00+00 | 359462
2021-02-01 00:00:00+00 | 680383
2021-01-01 00:00:00+00 | 395077
2020-12-01 00:00:00+00 | 85631
2020-11-01 00:00:00+00 | 250651
2020-10-01 00:00:00+00 | 449921
2020-09-01 00:00:00+00 | 241132
2020-08-01 00:00:00+00 | 133816
2. onboarding_progresses table and related queries
gitlabhq_production=> select date_trunc('month', created_at) as period, count(*) from onboarding_progresses group by 1 order by 1 desc;
period | count
------------------------+-------
2021-03-01 00:00:00+00 | 68095
2021-02-01 00:00:00+00 | 60443
2021-01-01 00:00:00+00 | 36367
3. Redis HLL (HyperLogLog) tracking and reads
Concerns with Redis overload
4. gitlab-experiment gem experiment caching
5. Snowplow frontend tracking
Browser performance and timings
IGLU overload and mirroring
6. Snowplow backend tracking performance
Threading, connections, memory use
7. Spam and abuse risks related to growth experiments & features
As the Growth team removes friction from the product, spammers and abusers may discover new opportunities.
Discussed with @pcalder.
/cc @gitlab-org/growth for any other areas of investigation
8. experiment_subjects table
gitlabhq_production=> select date_trunc('month', created_at) as period, count(*) from experiment_subjects group by 1 order by 1 desc;
period | count
------------------------+-------
2021-03-01 00:00:00+00 | 66542
2021-02-01 00:00:00+00 | 43812
(2 rows)
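To put the monthly counts above in terms of write load, here is a rough sketch that converts each table's busiest month into an average insert rate. The counts are copied from the psql output in this issue. The function and variable names are illustrative only. The calculation assumes rows are spread evenly over the full calendar month and that no rows were deleted since creation, so it says nothing about peaks, and the figure for a month that was still in progress when the query ran would be understated.

```python
import calendar

# Monthly row counts copied from the psql output in this issue: (year, month) -> rows.
MONTHLY_COUNTS = {
    "experiment_users": {
        (2021, 3): 359462, (2021, 2): 680383, (2021, 1): 395077,
        (2020, 12): 85631, (2020, 11): 250651, (2020, 10): 449921,
        (2020, 9): 241132, (2020, 8): 133816,
    },
    "onboarding_progresses": {
        (2021, 3): 68095, (2021, 2): 60443, (2021, 1): 36367,
    },
    "experiment_subjects": {
        (2021, 3): 66542, (2021, 2): 43812,
    },
}


def average_rows_per_second(year, month, rows):
    """Average rate, ASSUMING rows are spread evenly over the whole calendar month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return rows / (days_in_month * 86400)


def busiest_month(counts):
    """Return the (year, month) key with the highest row count."""
    return max(counts, key=counts.get)


for table, counts in MONTHLY_COUNTS.items():
    year, month = busiest_month(counts)
    rate = average_rows_per_second(year, month, counts[(year, month)])
    print(f"{table}: busiest month {year}-{month:02d}, ~{rate:.3f} rows/s on average")
```

An average like this is only a baseline. Experiment rollouts tend to produce bursts, so the per-minute or per-second distribution from production metrics would be the more useful number for judging insert pressure.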
Edited by Alper Akgun