[VSA] Hash VSA stages
What
When reworking the VSA (Value Stream Analytics) backend, we'll have one or more separate tables that contain the start and end event timestamps. To efficiently query these tables by stage (start_event_identifier, end_event_identifier), each table will contain a stage_hash_id column.
The stages are stored in two tables:
- Group-level stages: analytics_cycle_analytics_group_stages
- Project-level stages: analytics_cycle_analytics_project_stages (not in use yet)
Problem
Within a group, you can have several stages with the same start and end event configuration:
- In different value streams
- In different subgroups
- In projects
If we collect the data separately for each of these stages, we'll store the same issues and merge_requests data several times.
Solution
Hash each stage using the following attributes:
[start_event_identifier, end_event_identifier]
For label-based events:
[start_event_identifier, start_event_label_id, end_event_identifier, end_event_label_id]
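The hashing rule above can be sketched in plain Ruby. This is only an illustration under assumptions: the Stage struct is a hypothetical stand-in for the real ActiveRecord models, the '-' delimiter and the hex encoding are arbitrary choices, and SHA1 is used because this issue names it. Treating a stage as label-based whenever either label ID is present is also an assumption.

```ruby
require 'digest'

# Hypothetical stand-in for a stage record (the real models are ActiveRecord classes).
Stage = Struct.new(
  :start_event_identifier, :start_event_label_id,
  :end_event_identifier, :end_event_label_id,
  keyword_init: true
)

# Picks the attribute list to hash: label IDs are included only when
# at least one of them is set (assumed meaning of "label-based").
def stage_hash_attributes(stage)
  if stage.start_event_label_id || stage.end_event_label_id
    [stage.start_event_identifier, stage.start_event_label_id,
     stage.end_event_identifier, stage.end_event_label_id]
  else
    [stage.start_event_identifier, stage.end_event_identifier]
  end
end

# SHA1 over the joined attributes; the delimiter is an assumption.
def stage_hash(stage)
  Digest::SHA1.hexdigest(stage_hash_attributes(stage).join('-'))
end

a = Stage.new(start_event_identifier: 'merge_request_created',
              end_event_identifier: 'merge_request_merged')
b = Stage.new(start_event_identifier: 'merge_request_created',
              end_event_identifier: 'merge_request_merged')

puts stage_hash(a) == stage_hash(b) # prints "true"
```

Because the hash depends only on the event configuration, two stages defined in different value streams, subgroups, or projects produce the same value.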
- Add the stage_hash method to Analytics::CycleAnalytics::GroupStage and Analytics::CycleAnalytics::ProjectStage, which returns a SHA1 hash based on the rules above.
- Create a new table called analytics_cycle_analytics_stage_hashes(id: bigint, hash: binary).
- In a before_save hook in the Analytics::CycleAnalytics::GroupStage and Analytics::CycleAnalytics::ProjectStage models, insert a new analytics_cycle_analytics_stage_hashes value (if it does not exist).
- Now we have a stable id value (stage_hash_id) which we can use in the new VSA table(s).
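The "insert if it does not exist" step can be illustrated with an in-memory stand-in for the analytics_cycle_analytics_stage_hashes table. This is a rough sketch, not the actual implementation: StageHashRegistry and find_or_create_id are hypothetical names, and the hashed input strings are placeholders for whatever the stage_hash method produces.

```ruby
require 'digest'

# Hypothetical in-memory stand-in for the analytics_cycle_analytics_stage_hashes table.
class StageHashRegistry
  def initialize
    @ids_by_hash = {}
    @next_id = 1
  end

  # Returns the existing id for a hash, or "inserts" a new row and returns its id.
  def find_or_create_id(hash)
    @ids_by_hash[hash] ||= begin
      id = @next_id
      @next_id += 1
      id
    end
  end
end

registry = StageHashRegistry.new

# Binary digests, matching the `hash: binary` column; the inputs are placeholders.
hash_a = Digest::SHA1.digest('merge_request_created-merge_request_merged')
hash_b = Digest::SHA1.digest('issue_created-issue_closed')

group_stage_hash_id   = registry.find_or_create_id(hash_a)
project_stage_hash_id = registry.find_or_create_id(hash_a)
other_stage_hash_id   = registry.find_or_create_id(hash_b)

puts group_stage_hash_id == project_stage_hash_id # prints "true"
puts group_stage_hash_id == other_stage_hash_id   # prints "false"
```

In a real database, two stages saved concurrently could both try to insert the same hash, so a unique index on the hash column combined with an upsert-style insert would likely be needed; that detail is an assumption and is not specified in this issue.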
Example:
project = Project.last
# Same start/end event configuration, once at the group level and once at the project level
group_hash = Analytics::CycleAnalytics::GroupStage.new(group: project.group, start_event_identifier: 'merge_request_created', end_event_identifier: 'merge_request_merged').stage_hash
project_hash = Analytics::CycleAnalytics::ProjectStage.new(project: project, start_event_identifier: 'merge_request_created', end_event_identifier: 'merge_request_merged').stage_hash
# should be true:
puts group_hash == project_hash
Edited by Adam Hegyi