Add database events for newly appointed models
Background
The Data team is planning to set up and use their own Snowplow event collection pipeline to track every interaction with the gitlab.com database (see: https://gitlab.com/gitlab-data/gitlab.com-saas-data-pipeline/-/issues). As the next step in evaluating the proposed solution, the Data team requests that two new models from the GitLab application be tracked with https://gitlab.com/gitlab-org/gitlab/-/blob/6aa3b620a8214f733f3d0acd9bd86384b00d9f84/app/models/concerns/database_event_tracking.rb#L33
Goal
Add https://gitlab.com/gitlab-org/gitlab/-/blob/6aa3b620a8214f733f3d0acd9bd86384b00d9f84/app/models/concerns/database_event_tracking.rb#L33 to the two models appointed by the Data team (cc @vedprakash2021)
Implementation tips
See !92079 (diffs) for reference
This issue is blocked by #390805 (closed)
Activity
- Mikołaj Wawrzyniak marked this issue as related to #390805 (closed)
- 🤖 GitLab Bot 🤖 added devops::analyze section::analytics labels
- Author Maintainer
Hey @vedprakash2021, please add information on which tables and which columns the Data team wishes to track. Thank you!
- Developer
Hi @mikolaj_wawrzyniak, I will share the details of the 2 tables ASAP. I missed this request and will provide them today.
- Developer
Hi @mikolaj_wawrzyniak, please find the list of the 2 tables below:
- VULNERABILITIES
- The average daily delta is 1.5 to 1.6 million rows on weekdays and less than 1 million on weekends.
- Below are the columns for which we would like to track events:
SELECT id, confidence, confidence_overridden, confirmed_at, created_at, dismissed_at, resolved_at, severity_overridden, state, updated_at FROM vulnerabilities
I can also confirm that none of these columns are RED data.
Events to capture: insert, update, and delete.
- MERGE_REQUEST_METRICS
- The average daily delta is between 250k and 500k rows on weekdays and close to 100k on weekends.
- Below are the columns for which we would like to track events:
SELECT id, merge_request_id, latest_build_started_at, latest_build_finished_at, first_deployed_to_production_at, merged_at, created_at, updated_at, pipeline_id, merged_by_id, latest_closed_by_id, latest_closed_at, first_comment_at, first_commit_at, last_commit_at, diff_size, modified_paths_size, commits_count, first_approved_at, first_reassigned_at, added_lines, removed_lines FROM merge_request_metrics
I can also confirm that none of these columns are RED data.
Events to capture: insert, update, and delete. cc @dvanrooijen2
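For illustration only, the request above can be read as a per-table column allowlist plus a fixed set of event types. The following is a minimal Python sketch of that idea, not the code in database_event_tracking.rb; the names TRACKED_COLUMNS, TRACKED_ACTIONS and build_event are hypothetical, and only the table names, column names and event types come from the lists above.

```python
from typing import Any, Dict, Mapping

# Column allowlists copied from the two SELECT statements above.
TRACKED_COLUMNS = {
    "vulnerabilities": (
        "id", "confidence", "confidence_overridden", "confirmed_at",
        "created_at", "dismissed_at", "resolved_at", "severity_overridden",
        "state", "updated_at",
    ),
    "merge_request_metrics": (
        "id", "merge_request_id", "latest_build_started_at",
        "latest_build_finished_at", "first_deployed_to_production_at",
        "merged_at", "created_at", "updated_at", "pipeline_id",
        "merged_by_id", "latest_closed_by_id", "latest_closed_at",
        "first_comment_at", "first_commit_at", "last_commit_at",
        "diff_size", "modified_paths_size", "commits_count",
        "first_approved_at", "first_reassigned_at", "added_lines",
        "removed_lines",
    ),
}

# Event types requested for both tables.
TRACKED_ACTIONS = ("insert", "update", "delete")


def build_event(table: str, action: str, row: Mapping[str, Any]) -> Dict[str, Any]:
    """Build a hypothetical event payload containing only allowlisted columns."""
    if action not in TRACKED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    columns = TRACKED_COLUMNS[table]  # KeyError for tables that are not tracked
    # Columns missing from the row are emitted as None; extra columns are dropped.
    return {
        "table": table,
        "action": action,
        "data": {column: row.get(column) for column in columns},
    }
```

Any column outside the allowlist (for example a title or description) never reaches the payload, which matches the intent of only sending the columns confirmed as non-RED.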
Edited by ved prakash - Author Maintainer
Thank you @vedprakash2021. I ran a quick check to also include the volume of updates in the provided numbers. For the first table, vulnerabilities, throughput seems to be around 5 million on a business day and 1 million on the weekend. For the second one, merge_request_metrics, the disproportion seems to be even bigger: throughput is between 6 and 7 million on a business day and 2 million on the weekend.
So total throughput is expected to be close to 12 - 13 million on a business day, and the pipeline needs to be prepared for that. Additionally, even though I do not expect issues up front on the application layer with that volume, I would definitely appreciate using a percentage-based feature flag and gradually increasing the load to make sure that there are no unexpected issues.
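To make the numbers and the gradual rollout idea concrete, here is a rough Python sketch. It is not GitLab's Feature flag API; average_events_per_second and in_rollout are hypothetical helpers, and hashing a record key into 100 buckets is just one common way to approximate a percentage-based gate.

```python
import zlib

SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(events_per_day: int) -> float:
    """Average rate implied by a daily total (ignores peaks within the day)."""
    return events_per_day / SECONDS_PER_DAY


def in_rollout(record_key: str, percentage: int) -> bool:
    """Return True if this key falls inside the enabled percentage.

    The key is hashed into one of 100 stable buckets, so a key that is
    enabled at a lower percentage stays enabled when the percentage grows.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    bucket = zlib.crc32(record_key.encode("utf-8")) % 100
    return bucket < percentage


if __name__ == "__main__":
    # Upper bound of the business-day estimate quoted above.
    rate = average_events_per_second(13_000_000)
    print(f"average rate: {rate:.0f} events/s")
    for step in (1, 10, 50, 100):
        enabled = sum(in_rollout(f"vulnerabilities:{i}", step) for i in range(10_000))
        print(f"{step:>3}% flag -> {enabled} of 10000 sample keys emit events")
```

At 13 million events per day the average works out to roughly 150 events per second, so stepping the percentage up lets both the application and the pipeline be observed at a fraction of that load first.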
Edited by Mikołaj Wawrzyniak
- Mikołaj Wawrzyniak mentioned in issue gitlab-data/gitlab.com-saas-data-pipeline#17 (closed)
- ved prakash mentioned in issue gitlab-data/gitlab.com-saas-data-pipeline#18 (closed)
- Developer
@gitlab-org/analytics-section/product-intelligence/engineers Please add your estimation to this thread using the guide
- Maintainer
Refinement / Weighing
Ready for Development: Yes
Weight: 3
Reasoning:
2 new event emitting methods, specs and a FF.
Iteration MR/Issues Count: 2
One for code changes and two for FF (add and remove)
Documentation required: No
- Michał Wielich set weight to 3
- Sebastian Rehm changed milestone to %15.11
- Sebastian Rehm added workflow::ready for development label
- Michał Wielich assigned to @michold
- Michał Wielich mentioned in merge request !116125 (merged)
- Michał Wielich mentioned in issue #403041 (closed)
- Michał Wielich marked this issue as related to #403041 (closed)
- Michał Wielich added workflow::in review label and removed workflow::ready for development label
- 🤖 GitLab Bot 🤖 changed milestone to %16.0
- 🤖 GitLab Bot 🤖 added missed:15.11 label
- Michał Wielich closed
- Maintainer
Hi @tjayaramaraju @bastirehm! The code for this feature is now on master, but to actually test it, we also need to roll out the feature flag: #403041 (closed) - we will probably need to plan it for some milestone.
- Developer
Thanks, I followed up with a comment in the feature flag issue. See #403041 (comment 1370715051)
- Developer
@vedprakash2021 since I haven't heard anything about this for quite some time:
What is the status of replicating our database via Snowplow? Is this still progressing or has this been abandoned?
- Developer
Hi @bastirehm,
Unfortunately, this has been stopped because we were missing quite a lot of events. More information is available here: https://gitlab.com/groups/gitlab-data/-/epics/880
We are now looking to the Cells project to turn on logical replication.
I will also be opening an issue to clean up the dangling objects left over from the current setup.
- Developer
Ah, good to know. So that means we could already start removing this functionality again, @vedprakash2021, since I take it that we have now completely excluded this as a valid way forward?
- Developer
Yes @bastirehm, we can start decommissioning this.
Yes, we have completely excluded this as a way of replicating database events.
Shall I create an issue for you to stop sending the events to this Snowplow pipeline?
- Developer
I created one in #425684 (closed), which will cover removal on our end, but not cover removing the infrastructure that was set up to do this.
- Sebastian Rehm mentioned in issue #425684 (closed)
- Sebastian Rehm mentioned in issue #408644 (closed)
- Piotr Skorupa mentioned in commit ce4cd721
- Piotr Skorupa mentioned in commit b3004943
- Piotr Skorupa mentioned in commit c5d4dfa9