
Implement cumulative label duration calculation in VSA

What does this MR do and why?

This MR implements a new way of calculating the duration for label-based stages in Value Stream Analytics (VSA). The timestamps of consecutive label removal and addition events are now included in the calculation, so only the periods during which the label was actually assigned are counted. The change is behind the enable_vsa_cumulative_label_duration_calculation feature flag.

How does it work?

When a user adds a label to, or removes a label from, an issue or merge request, Value Stream Analytics can track how long the label was assigned. For example:

  1. User adds label A.
  2. 1 week later, user removes label A.
  3. 1 month later, user adds label A.
  4. 1 day later, user removes label A.
  • With the old calculation, the duration would be 1 month, 1 week and 1 day (from the first addition to the last removal).
  • With the new calculation, the duration would be 1 week and 1 day (only the time during which the label was assigned).
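
The difference between the two calculations can be checked with a few lines of plain Ruby. This is only an illustrative sketch: the start date is arbitrary, a month is assumed to be 30 days, and the variable names are hypothetical rather than taken from the MR.

```ruby
# Durations in seconds. MONTH is an assumption (30 days) made for illustration.
DAY = 24 * 60 * 60
WEEK = 7 * DAY
MONTH = 30 * DAY

# Timeline from the example above (the start date is arbitrary).
added_1   = Time.utc(2024, 1, 1)   # 1. user adds label A
removed_1 = added_1 + WEEK         # 2. 1 week later, label A removed
added_2   = removed_1 + MONTH      # 3. 1 month later, label A added again
removed_2 = added_2 + DAY          # 4. 1 day later, label A removed

# Old calculation: first addition to last removal.
old_duration = removed_2 - added_1

# New calculation: sum of the periods during which the label was assigned.
new_duration = (removed_1 - added_1) + (removed_2 - added_2)

old_days = old_duration / DAY
new_days = new_duration / DAY

puts "old: #{old_days} days, new: #{new_days} days"
```

With these assumptions the old calculation yields 38 days (1 month, 1 week and 1 day) and the new one yields 8 days (1 week and 1 day).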

How is it implemented?

VSA calculates the duration between two timestamp expressions. This MR extends the calculation so that an array of timestamps can be returned from the DB (include_all_timestamps_as_array) using ARRAY_AGG. When arrays are returned, the duration calculation logic takes them into account and sums the durations of the timestamp pairs (end - start).

Example 1: when both the start event and the end event are label-based events

start_event_timestamps = [t1, t2]
end_event_timestamps = [t3, t4]

duration: (t3 - t1) + (t4 - t2)

Example 2: when the start event is not a label-based event (example: issue created)

start_event_timestamps = [t1]
end_event_timestamps = [t3, t4]

duration: (t4 - t1) # "best effort", preserving the existing functionality.
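
The two examples above can be sketched in plain Ruby as follows. This is not the code from the MR: the method name `cumulative_duration` is hypothetical, timestamps are plain numbers in seconds, both arrays are assumed to be sorted in ascending order, and falling back to "last end minus first start" whenever the arrays differ in length is an assumption based on Example 2.

```ruby
# Hypothetical helper illustrating the pairing logic; not the actual VSA code.
# Both arrays are assumed to be sorted in ascending order.
def cumulative_duration(start_timestamps, end_timestamps)
  return nil if start_timestamps.empty? || end_timestamps.empty?

  if start_timestamps.size == end_timestamps.size
    # Example 1: pair each start with its end and sum (end - start).
    start_timestamps.zip(end_timestamps).sum { |start_ts, end_ts| end_ts - start_ts }
  else
    # Example 2 ("best effort"): span from the first start to the last end.
    end_timestamps.last - start_timestamps.first
  end
end

# Example 1: t1 = 0, t2 = 100 (label added), t3 = 10, t4 = 130 (label removed)
example_1 = cumulative_duration([0, 100], [10, 130])

# Example 2: t1 = 0 (issue created), t3 = 10, t4 = 130 (label removed)
example_2 = cumulative_duration([0], [10, 130])

puts example_1 # (10 - 0) + (130 - 100) = 40
puts example_2 # 130 - 0 = 130
```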

Database

An example query would be something like this: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/25829/commands/81395

It's hard to capture a generic query because it always depends on the stage configuration. Summary:

  • Always query 500 (batch size) issues.
  • For each issue look up the label assigned event timestamps for the given label.
  • Aggregate the timestamps into an array.

Compared to the previous version of the query, the change is that we load all timestamps, not just the first one found in the table. Since the same label is rarely assigned and unassigned many times on the same record, the query complexity should not increase significantly.
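
To make the shape of the aggregated result concrete, here is an in-memory Ruby sketch of what the steps above produce. It only imitates the effect of ARRAY_AGG on plain Ruby objects and does not query a database; the `LabelEvent` struct, its fields and the `label_added_timestamps` method are hypothetical names chosen for illustration.

```ruby
# Hypothetical in-memory stand-in for label event rows; not the real schema.
LabelEvent = Struct.new(:issue_id, :label_id, :action, :created_at)

# For each issue in the batch, collect every "label added" timestamp for the
# given label into a sorted array (the in-memory analogue of ARRAY_AGG).
def label_added_timestamps(events, issue_ids:, label_id:)
  matching = events.select do |event|
    issue_ids.include?(event.issue_id) &&
      event.label_id == label_id &&
      event.action == :add
  end

  grouped = matching.group_by(&:issue_id)

  # Issues without a matching event get an empty array.
  issue_ids.to_h { |id| [id, (grouped[id] || []).map(&:created_at).sort] }
end

events = [
  LabelEvent.new(1, 7, :add, 100),
  LabelEvent.new(1, 7, :remove, 200),
  LabelEvent.new(1, 7, :add, 300),
  LabelEvent.new(2, 7, :add, 150),
  LabelEvent.new(2, 9, :add, 160) # different label, ignored
]

result = label_added_timestamps(events, issue_ids: [1, 2, 3], label_id: 7)
p result
```

The previous version of the query would correspond to keeping only the first element of each array.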

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

Requires a GDK with an Ultimate license.

  1. In the Rails console, enable the feature flag fully:
    Feature.enable(:enable_vsa_cumulative_label_duration_calculation)
  2. Visit a group (copy its id and ensure there is at least one group label) and go to Analyze > Value stream analytics.
  3. Create a new value stream with a stage that uses the issue label added and issue label removed events as its start and end events (pick a label).
  4. Create a new issue in one of the group's projects.
  5. Add the configured label to the issue.
  6. Wait 1 minute.
  7. Remove the label.
  8. Wait 5 minutes.
  9. Add the same label again.
  10. Wait 2 minutes.
  11. Remove the label.

Invoke the aggregation and verify the stored duration in VSA (replace 24 with the id of your group):

Analytics::CycleAnalytics::DataLoaderService.new(group: Group.find(24), model: Issue).execute

puts Analytics::CycleAnalytics::IssueStageEvent.where(issue_id: Issue.last.id).take.duration_in_milliseconds

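The value should be around `180_000` milliseconds (1 minute + 2 minutes with the label assigned) and definitely not around `480_000` (the full 8 minutes between the first addition and the last removal).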

Value stream configuration:

(screenshot of the value stream configuration)

Related to #432568 (closed)

Edited by Adam Hegyi
