[VSA] Hash VSA stages

What

When reworking the Value Stream Analytics (VSA) backend, we'll introduce one or more separate tables that contain the start and end event timestamps. To query these tables efficiently by stage (start_event_identifier, end_event_identifier), they will contain a stage_hash_id column.

The stages are stored in two tables:

  • Group level stages: analytics_cycle_analytics_group_stages
  • Project level stages: analytics_cycle_analytics_project_stages (not in use yet)

Problem

Within a group, you can have several stages with the same start and end event configuration:

  • In different value streams
  • In different subgroups
  • In projects

If we collect the data separately for each of these stages, we'll store the same issue and merge request data several times.
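
To illustrate the duplication, here is a minimal sketch in plain Ruby. The Stage struct, its parent field and the sample names are hypothetical stand-ins for illustration, not the real models or data:

```ruby
# Hypothetical, simplified stand-in for a stage record (not the real model).
Stage = Struct.new(:name, :parent, :start_event_identifier, :end_event_identifier, keyword_init: true)

stages = [
  Stage.new(name: 'Review', parent: 'group, value stream A',
            start_event_identifier: 'merge_request_created', end_event_identifier: 'merge_request_merged'),
  Stage.new(name: 'Code review', parent: 'group, value stream B',
            start_event_identifier: 'merge_request_created', end_event_identifier: 'merge_request_merged'),
  Stage.new(name: 'MR cycle', parent: 'subgroup',
            start_event_identifier: 'merge_request_created', end_event_identifier: 'merge_request_merged'),
  Stage.new(name: 'Issue', parent: 'group, value stream A',
            start_event_identifier: 'issue_created', end_event_identifier: 'issue_closed')
]

# Group the stages by their event configuration. Every group with more than
# one member would cause the same underlying data to be collected repeatedly.
stages_by_config = stages.group_by { |s| [s.start_event_identifier, s.end_event_identifier] }
duplicated = stages_by_config.select { |_config, members| members.size > 1 }

duplicated.each do |config, members|
  puts "#{config.inspect} is configured #{members.size} times"
end
```

In this sketch, three of the four stages share one configuration, so collecting data per stage would store the same merge request data three times.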

Solution

Hash each stage using the following attributes:

[start_event_identifier, end_event_identifier]

For label-based events:

[start_event_identifier, start_event_label_id, end_event_identifier, end_event_label_id]

  • Add a stage_hash method to Analytics::CycleAnalytics::GroupStage and Analytics::CycleAnalytics::ProjectStage that returns a SHA1 hash computed according to the rules above.
  • Create a new table called analytics_cycle_analytics_stage_hashes(id: bigint, hash: binary)
  • In a before_save hook in the Analytics::CycleAnalytics::GroupStage and Analytics::CycleAnalytics::ProjectStage models, insert a new analytics_cycle_analytics_stage_hashes record if one with the same hash does not exist yet.
  • Now we have a stable id value (stage_hash_id) which we can use in the new VSA table(s).
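
The steps above can be sketched in plain Ruby, without Rails or a database. This is only an illustration under stated assumptions: StageConfig and StageHashRegistry are hypothetical names, the '-' separator used when joining the attributes is an arbitrary choice, and the in-memory registry merely stands in for the analytics_cycle_analytics_stage_hashes table.

```ruby
require 'digest'

# Hypothetical stand-in for the hashing part of the stage models.
StageConfig = Struct.new(:start_event_identifier, :start_event_label_id,
                         :end_event_identifier, :end_event_label_id,
                         keyword_init: true) do
  # Returns a SHA1 hex digest of the attributes that define the stage.
  # Assumption: label ids are included whenever either event carries one.
  def stage_hash
    attributes =
      if start_event_label_id || end_event_label_id
        [start_event_identifier, start_event_label_id, end_event_identifier, end_event_label_id]
      else
        [start_event_identifier, end_event_identifier]
      end

    # The '-' separator is an assumption made for this sketch.
    Digest::SHA1.hexdigest(attributes.join('-'))
  end
end

# Hypothetical in-memory stand-in for the stage hashes table:
# maps each digest to a stable id, creating the id on first use.
class StageHashRegistry
  def initialize
    @ids_by_digest = {}
  end

  def find_or_create_id(digest)
    @ids_by_digest[digest] ||= @ids_by_digest.size + 1
  end
end

registry = StageHashRegistry.new

group_stage = StageConfig.new(start_event_identifier: 'merge_request_created',
                              end_event_identifier: 'merge_request_merged')
project_stage = StageConfig.new(start_event_identifier: 'merge_request_created',
                                end_event_identifier: 'merge_request_merged')
# Label ids 10 and 11 are made-up values for illustration.
label_stage = StageConfig.new(start_event_identifier: 'issue_label_added', start_event_label_id: 10,
                              end_event_identifier: 'issue_label_removed', end_event_label_id: 11)

# Stages with the same configuration resolve to the same stage_hash_id.
puts registry.find_or_create_id(group_stage.stage_hash) ==
     registry.find_or_create_id(project_stage.stage_hash)
puts registry.find_or_create_id(label_stage.stage_hash)
```

In the real models, the lookup would presumably run inside the before_save hook and rely on a unique index on the hash column to stay safe under concurrent saves; that part is not covered by this sketch.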

Example:

project = Project.last

# The same start and end events, configured once at the group level and once at the project level.
group_hash = Analytics::CycleAnalytics::GroupStage.new(group: project.group, start_event_identifier: 'merge_request_created', end_event_identifier: 'merge_request_merged').stage_hash

project_hash = Analytics::CycleAnalytics::ProjectStage.new(project: project, start_event_identifier: 'merge_request_created', end_event_identifier: 'merge_request_merged').stage_hash

# should be true:
puts group_hash == project_hash
Edited by Adam Hegyi