Implement new query backend for Cycle Analytics

Adam Hegyi requested to merge new-cycle-analytics-query-backend into master

What does this MR do?

What is Cycle Analytics

  • Find all the Issue or MergeRequest records matching a date range query between a start_event and an end_event. This pair of events defines a Stage.
  • Calculate the duration (end_event_time - start_event_time)
  • Extract the median duration
  • Extract the list of records relevant to the date range
  • In EE, Cycle Analytics stages will be customizable. In CE, we only provide the 7 default stages.
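
The steps above can be illustrated with a minimal plain-Ruby sketch over in-memory data. The `Record` struct, the sample timestamps, and the choice to filter on the start event time are assumptions made for illustration only; the feature itself performs these steps in SQL.

```ruby
# Hypothetical in-memory stand-in for an Issue or MergeRequest row.
Record = Struct.new(:id, :start_event_time, :end_event_time)

records = [
  Record.new(1, Time.utc(2019, 8, 1, 10), Time.utc(2019, 8, 1, 12)), # 2 hours
  Record.new(2, Time.utc(2019, 8, 2, 9),  Time.utc(2019, 8, 2, 10)), # 1 hour
  Record.new(3, Time.utc(2019, 8, 3, 8),  Time.utc(2019, 8, 3, 13)), # 5 hours
  Record.new(4, Time.utc(2019, 8, 4, 8),  nil)                       # end event not reached
]

from = Time.utc(2019, 8, 1)

# 1. Keep the records that reached both events inside the date range
#    (assumption: the range is applied to the start event time).
matching = records.select { |r| r.end_event_time && r.start_event_time >= from }

# 2. Duration per record, in seconds (Time - Time returns a Float).
durations = matching.map { |r| r.end_event_time - r.start_event_time }

# 3. Median duration: the middle element of the sorted durations
#    (the lower middle one when the count is even).
median = durations.sort[(durations.size - 1) / 2]

# 4. The records relevant to the date range.
record_ids = matching.map(&:id)

puts "median: #{median} seconds, records: #{record_ids.inspect}"
```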

High Level Overview

  • DataCollector is the high-level interface for the feature.
  • BaseQueryBuilder is responsible for providing the base query: it joins only the strictly necessary tables and applies high-level filtering.
  • An Event (start or end) can alter the query through apply_query_customization, for example by joining additional tables when needed. It also defines the timestamp expression used for the duration calculation.
  • Median and Records use the base query provided by the DataCollector and apply additional query manipulation.
  Stages are defined for a Group (EE) or for a Project (CE):

  +---------------------+
  | Stages              |
  |   +-------------+   |
  |   |   Stage A   |   |                                 +--------+
  |   |             |   |      +---------------+   +----> | Median |
  |   +-------------+   |      |               |   |      +--------+
  |   | Start Event | +------> | DataCollector | +-+
  |   +-------------+   |      |               |   |      +---------+
  |   | End Event   |   |      +---------------+   +----> | Records |
  |   +-------------+   |                                 +---------+
  |                     |
  |   +-------------+   |
  |   |   Stage B   |   |
  |   |             |   |
  |   +-------------+   |
  |   | Start Event |   |
  |   +-------------+   |
  |   | End Event   |   |
  |   +-------------+   |
  |                     |
  |   ...               |
  +---------------------+
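
The relationship in the diagram can be sketched in plain Ruby as follows. Every class here (`SketchEvent`, `SketchStage`, `SketchDataCollector`) is a hypothetical, simplified stand-in and not the code from this MR; the join condition is likewise an assumption. The sketch only shows how two events could each contribute a timestamp expression and an optional join, and how one shared FROM clause could feed both the median and the records query.

```ruby
# Hypothetical, simplified stand-ins; not the classes introduced by this MR.
class SketchEvent
  attr_reader :timestamp_expression

  def initialize(timestamp_expression, join: nil)
    @timestamp_expression = timestamp_expression
    @join = join
  end

  # Lets an event add the extra table it needs; a no-op otherwise.
  def apply_query_customization(joins)
    @join ? joins | [@join] : joins
  end
end

SketchStage = Struct.new(:start_event, :end_event)

class SketchDataCollector
  def initialize(stage, base_table:)
    @stage = stage
    @base_table = base_table
  end

  # end_event_time - start_event_time
  def duration_expression
    "#{@stage.end_event.timestamp_expression} - #{@stage.start_event.timestamp_expression}"
  end

  def median_sql
    "SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY #{duration_expression}) #{from_clause}"
  end

  def records_sql
    "SELECT #{@base_table}.* #{from_clause}"
  end

  private

  # Base table plus only the joins that the stage's events ask for.
  def from_clause
    joins = [@stage.start_event, @stage.end_event].reduce([]) do |acc, event|
      event.apply_query_customization(acc)
    end
    (["FROM #{@base_table}"] + joins).join(' ')
  end
end

stage = SketchStage.new(
  SketchEvent.new('merge_requests.created_at'),
  SketchEvent.new(
    'merge_request_metrics.merged_at',
    # Assumed join condition, for illustration only.
    join: 'INNER JOIN merge_request_metrics ON merge_request_metrics.merge_request_id = merge_requests.id'
  )
)

collector = SketchDataCollector.new(stage, base_table: 'merge_requests')
puts collector.median_sql
puts collector.records_sql
```

Building SQL strings keeps the sketch free of dependencies; the MR itself builds these queries with ActiveRecord.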

Database Related Changes (compared to the old CA feature, lib/gitlab/cycle_analytics)

  • Use ActiveRecord (AR) instead of Arel for building the query
  • Use the percentile_disc function for the median calculation
  • Do not use CTEs
  • Avoid unnecessary table joins (join a table only when the Stage actually requires it)
  • Load records with one query (the IssuableFinder query is merged in), which helps with implementing pagination later
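
As a rough illustration of what a discrete percentile computes, the plain-Ruby sketch below mimics the documented behaviour of PostgreSQL's percentile_disc: it returns the first value of the sorted input whose position in the ordering equals or exceeds the requested fraction. The Ruby method is a hypothetical helper written for this example, not code from the MR.

```ruby
# Hypothetical helper mirroring the semantics of a discrete percentile.
# It always returns an actual input value, never an interpolated one.
def percentile_disc(values, fraction)
  sorted = values.compact.sort # nil values are skipped
  return nil if sorted.empty?

  # Position k (1-based) of n covers the fraction k / n, so the first
  # position with k / n >= fraction is ceil(fraction * n).
  index = (fraction * sorted.size).ceil - 1
  sorted[[index, 0].max]
end

puts percentile_disc([10, 20, 30], 0.5)
puts percentile_disc([10, 20, 30, 40], 0.5)
```

With an even number of values, the discrete median is the lower of the two middle values rather than their average, so the result is always a duration that actually occurred.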

Try it in the Console

Review stage definition:

  • Start: merge request created (merge_requests.created_at column)
  • End: merge request merged (merge_request_metrics.merged_at column)

stage = Analytics::CycleAnalytics::ProjectStage.new(
  start_event_identifier: :merge_request_created,
  end_event_identifier: :merge_request_merged,
  project_id: 19
)

# Each supported event is represented as a class:
# stage.start_event.class => Gitlab::CycleAnalytics::StageEvents::MergeRequestCreated
# stage.end_event.class => Gitlab::CycleAnalytics::StageEvents::MergeRequestMerged

dc = Gitlab::Analytics::CycleAnalytics::DataCollector.new(
  stage: stage,
  params: { from: 30.days.ago, current_user: User.first }
)

# Median duration of the stage
puts dc.median.seconds

# Records relevant to the date range
puts dc.records_fetcher.serialized_records

Does this MR meet the acceptance criteria?

Conformity

Performance and Testing

Edited by Adam Hegyi