
Worker for populating the new VSA tables

Adam Hegyi requested to merge 335389-vsa-data-collector-job into master

What does this MR do and why?

This MR implements a worker to periodically collect and aggregate records for Value Stream Analytics.

The goal is to have an efficient way of querying timestamp (stage) data. The trade-off is that we need to keep the aggregated data consistent with the source records (eventual consistency). The design also prepares for a possible analytics database decomposition in the future.

High-level overview:

(Diagram: vsa)

What the worker does (for a particular group):

  1. Collect the stage configurations, which tell us which timestamps to collect.
  2. Iterate over the issues and merge_requests within the group hierarchy in batches.
  3. Collect the timestamp columns.
  4. Build stage event rows with the timestamp columns and the issue/merge request column data.
  5. Upsert the data into the _stage_events table.
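
The five steps above can be sketched in plain Ruby. This is only an illustration, not the worker's code: the `Stage` struct, the in-memory `STAGE_EVENTS` hash standing in for the `_stage_events` table, the column names, and the batch size are all assumptions made for the example.

```ruby
# Hypothetical, simplified stand-ins; none of these names come from the MR.
Stage = Struct.new(:id, :start_column, :end_column)

# Step 1: the stage configuration tells us which timestamp columns to read.
STAGES = [
  Stage.new(1, :created_at, :first_commit_at),
  Stage.new(2, :first_commit_at, :merged_at)
].freeze

# In-memory stand-in for the _stage_events table, keyed like a unique index.
STAGE_EVENTS = {}

# Steps 3 and 4: read the configured timestamps and build one row per stage.
def build_rows(record)
  STAGES.map do |stage|
    {
      stage_id: stage.id,
      record_id: record[:id],
      group_id: record[:group_id],
      start_event_timestamp: record[stage.start_column],
      end_event_timestamp: record[stage.end_column]
    }
  end
end

# Step 5: upsert, so re-running the collection overwrites instead of duplicating.
def upsert(rows)
  rows.each { |row| STAGE_EVENTS[[row[:stage_id], row[:record_id]]] = row }
end

# Step 2: walk the records in fixed-size batches.
def collect(records, batch_size: 2)
  records.each_slice(batch_size) do |batch|
    upsert(batch.flat_map { |record| build_rows(record) })
  end
end

merge_requests = [
  { id: 10, group_id: 7, created_at: Time.at(0), first_commit_at: Time.at(60), merged_at: Time.at(600) },
  { id: 11, group_id: 7, created_at: Time.at(5), first_commit_at: Time.at(90), merged_at: nil },
  { id: 12, group_id: 7, created_at: Time.at(9), first_commit_at: nil, merged_at: nil }
]

collect(merge_requests)
collect(merge_requests) # running twice leaves the row count unchanged
```

Keying the rows by stage and record is what makes the write an upsert: a second pass over the same records replaces existing rows rather than adding new ones.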

The worker limits how much data is inserted in a single run. Once the limit is reached, a new job is scheduled with a delay, and this repeats until all rows have been processed.
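
A rough sketch of this limit-and-reschedule pattern, in plain Ruby without Sidekiq. The limit, the delay, the cursor-based resume, and every name below are assumptions chosen for illustration; the MR text does not specify them.

```ruby
# Hypothetical sketch; the limit, delay and names are illustrative, not from the MR.
MAX_ROWS_PER_RUN = 4
RESCHEDULE_DELAY_SECONDS = 60

# Stand-in for a job queue: each entry is [delay_in_seconds, cursor].
SCHEDULED_JOBS = []
PROCESSED_IDS = []

# Processes record ids greater than `cursor` until the per-run limit is hit,
# then schedules a follow-up job that resumes after the last processed id.
def perform(record_ids, cursor: 0, batch_size: 2)
  processed = 0

  record_ids.select { |id| id > cursor }.sort.each_slice(batch_size) do |batch|
    if processed >= MAX_ROWS_PER_RUN
      SCHEDULED_JOBS << [RESCHEDULE_DELAY_SECONDS, cursor]
      return :rescheduled
    end

    PROCESSED_IDS.concat(batch) # a real worker would upsert stage events here
    processed += batch.size
    cursor = batch.last
  end

  :finished
end

ids = (1..10).to_a
status = perform(ids)

# Drain the fake queue the way a scheduler would, ignoring the delay.
until SCHEDULED_JOBS.empty?
  _delay, cursor = SCHEDULED_JOBS.shift
  status = perform(ids, cursor: cursor)
end
```

Passing the last processed id to the follow-up job means each run starts where the previous one stopped, so no row is processed twice within one collection cycle.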

**Note:** This is not going to be scheduled or enabled on production yet. We want to invoke it manually for gitlab-org and experiment with the DB queries.

How to set up and validate locally

  1. Make sure you have a Premium or Ultimate license.
  2. Seed VSA:
     SEED_CYCLE_ANALYTICS=true SEED_VSA=true FILTER=cycle_analytics rake db:seed_fu
  3. Click the generated project link.
  4. Go to the group level: Analytics -> Value Stream.
  5. In the top right, click default, then Create Value Stream.
  6. Create a value stream.
  7. Stop sidekiq:
     gdk stop rails-background-jobs
  8. Start sidekiq manually:
     bundle exec sidekiq -q analytics_cycle_analytics_group_data_loader
  9. Start rails c and execute:
     Analytics::CycleAnalytics::GroupDataLoaderWorker.new.perform(Project.last.namespace_id)
  10. Go to admin / background jobs.
  11. There should be a scheduled job (for MergeRequest). Schedule it.
  12. Verify that records were inserted:
      Analytics::CycleAnalytics::MergeRequestStageEvent.count
      Analytics::CycleAnalytics::IssueStageEvent.count

Database

I ran the tests with a half-warm cache: I executed Group.find(9970).self_and_descendants before running the actual queries. You can find the plans in the comments.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #335389 (closed)

Edited by Adam Hegyi
