Worker for populating the new VSA tables
What does this MR do and why?
This MR implements a worker to periodically collect and aggregate records for Value Stream Analytics.
The goal is to make querying timestamp (stage) data efficient. The trade-off is that the new tables are only eventually consistent with the source data, so we need to keep them in sync. The design also prepares for a possible decomposition of the analytics database in the future.
High-level overview:
What the worker does (for a particular group):
- Collects the stage configurations, which tell us which timestamps to collect.
- Iterates over the `issues` and `merge_requests` within the group hierarchy in batches.
- Collects the timestamp columns.
- Builds stage event rows with the timestamp columns and the issue/merge request column data.
- Upserts the data into the `_stage_events` table.
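The steps above can be sketched in plain Ruby. This is a minimal in-memory illustration, not the worker's actual code: `Stage`, `StageEventTable`, `build_rows`, `load_stage_events`, the column names, and the `[stage_id, record_id]` upsert key are all hypothetical stand-ins for the real models and tables.

```ruby
# Hypothetical, in-memory sketch; none of these names come from the MR.
Stage = Struct.new(:id, :start_column, :end_column)

# Stand-in for a stage events table, keyed like a unique index would be.
class StageEventTable
  attr_reader :rows

  def initialize
    @rows = {}
  end

  # Upsert: insert new rows, overwrite rows whose key already exists.
  def upsert_all(new_rows)
    new_rows.each { |row| @rows[[row[:stage_id], row[:record_id]]] = row }
  end
end

# Builds one stage event row per stage for a single issue/merge request.
def build_rows(stages, record)
  stages.map do |stage|
    {
      stage_id: stage.id,
      record_id: record[:id],
      group_id: record[:group_id],
      start_event_timestamp: record[stage.start_column],
      end_event_timestamp: record[stage.end_column]
    }
  end
end

# Iterates over the records in batches and upserts the rows of each batch.
def load_stage_events(stages, records, table, batch_size: 2)
  records.each_slice(batch_size) do |batch|
    table.upsert_all(batch.flat_map { |record| build_rows(stages, record) })
  end
  table
end
```

Because the write is an upsert, running the loader again over the same records updates the existing rows instead of duplicating them, which is what allows the tables to converge on the source data over time.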
The worker ensures that we don't insert too much data in one go. After the limit is reached, a new job is scheduled with some delay, and this repeats until all rows have been processed.
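The limit-and-reschedule behaviour might look roughly like the sketch below. It is an illustration under assumptions: `LimitedLoader`, `MAX_ROWS_PER_RUN`, `RESCHEDULE_DELAY`, and the id-based cursor are hypothetical, and the `@scheduled` array stands in for the job queue so that the example runs without Sidekiq.

```ruby
# Hypothetical sketch; the limit, delay, and cursor format are assumptions.
class LimitedLoader
  MAX_ROWS_PER_RUN = 3   # assumed limit
  RESCHEDULE_DELAY = 60  # assumed delay, in seconds

  attr_reader :processed, :scheduled

  def initialize(record_ids)
    @record_ids = record_ids.sort
    @processed = []
    @scheduled = []
  end

  # Processes at most MAX_ROWS_PER_RUN ids greater than the cursor.
  def perform(cursor = 0)
    pending = @record_ids.select { |id| id > cursor }
    chunk = pending.first(MAX_ROWS_PER_RUN)
    @processed.concat(chunk)

    # More rows remain: record a follow-up job that resumes after the last id.
    # A real Sidekiq worker would enqueue it with a delay instead.
    if pending.size > chunk.size
      @scheduled << { delay: RESCHEDULE_DELAY, cursor: chunk.last }
    end
  end

  # Helper that runs the recorded jobs immediately, ignoring the delay.
  def drain
    perform(@scheduled.shift[:cursor]) until @scheduled.empty?
  end
end
```

Each run handles a bounded number of rows and passes the last processed id to the next job, so no single job holds a long-running write.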
**Note:** This is not going to be scheduled/enabled on production yet. We want to invoke it manually for gitlab-org and experiment with the DB queries.
How to set up and validate locally
- Make sure you have a Premium or Ultimate license.
- Seed VSA: `SEED_CYCLE_ANALYTICS=true SEED_VSA=true FILTER=cycle_analytics rake db:seed_fu`
- Click the generated project link
- Go to the group level. Analytics -> Value Stream
- Top right, click `default`, then Create Value Stream.
- Create a value stream.
- Stop sidekiq: `gdk stop rails-background-jobs`
- Start sidekiq manually: `bundle exec sidekiq -q analytics_cycle_analytics_group_data_loader`
- Start `rails c` and execute `Analytics::CycleAnalytics::GroupDataLoaderWorker.new.perform(Project.last.namespace_id)`
- Go to admin / background jobs
- There should be a scheduled job (for MergeRequest). Schedule it.
- Verify that records were inserted: `Analytics::CycleAnalytics::MergeRequestStageEvent.count` and `Analytics::CycleAnalytics::IssueStageEvent.count`
Database
I ran the tests with a half-warm cache: I executed `Group.find(9970).self_and_descendants` before running the actual queries. You can find the plans in the comments.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #335389 (closed)