Implement new query backend for Cycle Analytics
What does this MR do?
- Builds the query that calculates the median duration and extracts the relevant records.
- Note 1: this change is not user facing
- Note 2: this MR is part of a bigger feature, previous MR: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31713
- Feature docs: https://about.gitlab.com/product/cycle-analytics/
- Related issue: https://gitlab.com/gitlab-org/gitlab-ee/issues/12196
What is Cycle Analytics
- Find all the Issue or MergeRequest records matching with a date range query (start_event and end_event) = Stage.
- Calculate the duration (end_event_time - start_event_time)
- Extract the median duration
- Extract the list of records relevant to the date range
- In EE cycle analytics stages will be customizable. In CE we only provide the default 7 stages.
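The steps above can be sketched in plain Ruby, without a database. This is only an illustration under assumptions: the Record struct, the sample timestamps, the helper names and the choice to filter on the start event time are hypothetical and not taken from this MR, which does the equivalent work in SQL.

```ruby
# Hypothetical stand-in for an Issue or MergeRequest row carrying the two
# timestamps that the start and end events of a stage would provide.
Record = Struct.new(:id, :start_event_time, :end_event_time)

# Keep records whose start event falls inside the date range and whose
# end event has already happened (assumed filtering rule).
def records_in_range(records, from:, to:)
  records.select do |r|
    r.end_event_time && r.start_event_time >= from && r.start_event_time <= to
  end
end

# Duration in seconds: end_event_time - start_event_time.
def durations(records)
  records.map { |r| r.end_event_time - r.start_event_time }
end

# Discrete median: always an actual value from the list
# (the lower middle one when the count is even).
def median(values)
  return nil if values.empty?

  sorted = values.sort
  sorted[(sorted.size - 1) / 2]
end

records = [
  Record.new(1, Time.utc(2019, 8, 1), Time.utc(2019, 8, 2)), # 1 day
  Record.new(2, Time.utc(2019, 8, 3), Time.utc(2019, 8, 6)), # 3 days
  Record.new(3, Time.utc(2019, 8, 5), Time.utc(2019, 8, 7)), # 2 days
  Record.new(4, Time.utc(2019, 8, 9), nil),                  # end event missing
  Record.new(5, Time.utc(2019, 6, 1), Time.utc(2019, 6, 2))  # outside the range
]

relevant = records_in_range(records, from: Time.utc(2019, 7, 15), to: Time.utc(2019, 8, 15))
median_seconds = median(durations(relevant))

puts relevant.map(&:id).inspect # [1, 2, 3]
puts median_seconds / 86_400.0  # 2.0 (days)
```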
High Level Overview
- DataCollector is the high-level interface for the feature.
- BaseQueryBuilder is responsible for providing the base query, joining the absolutely necessary tables and doing high-level filtering.
- An Event (start, end) could alter the query (join additional tables when needed, apply_query_customization). It defines a timestamp expression that will be used for the duration calculation.
- Median and Records use the base query provided by the DataCollector and do additional query manipulation.
Defined for a Group (EE) or for a Project (CE)
+---------------------+
| Stages |
| +-------------+ |
| | Stage A | | +--------+
| | | | +---------------+ +----> | Median |
| +-------------+ | | | | +--------+
| | Start Event | +------> | DataCollector | +-+
| +-------------+ | | | | +---------+
| | End Event | | +---------------+ +----> | Records |
| +-------------+ | +---------+
| |
| +-------------+ |
| | Stage B | |
| | | |
| +-------------+ |
| | Start Event | |
| +-------------+ |
| | End Event | |
| +-------------+ |
| |
| ... |
+---------------------+
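A rough way to picture how these pieces fit together is the plain-Ruby sketch below, which assembles SQL strings instead of ActiveRecord relations. Everything in it is a simplified assumption made for illustration: the class bodies, the median_sql and records_sql method names, the hash-based query and the join and filter columns are hypothetical and do not reproduce the code in this MR.

```ruby
# Hypothetical events: each one names the timestamp expression used for the
# duration calculation and may customize the query (for example, add a join).
class MergeRequestCreated
  def timestamp_expression
    'merge_requests.created_at'
  end

  # The column lives on the base table, so the query is returned unchanged.
  def apply_query_customization(query)
    query
  end
end

class MergeRequestMerged
  def timestamp_expression
    'merge_request_metrics.merged_at'
  end

  # This event needs an extra table, so it adds the join itself.
  def apply_query_customization(query)
    join = 'INNER JOIN merge_request_metrics ' \
           'ON merge_request_metrics.merge_request_id = merge_requests.id'
    query.merge(joins: query[:joins] + [join])
  end
end

Stage = Struct.new(:start_event, :end_event)

# Simplified collector: builds one base query and derives the median query
# and the records query from it.
class DataCollector
  def initialize(stage:, project_id:)
    @stage = stage
    @project_id = Integer(project_id)
  end

  def median_sql
    duration = "EXTRACT(EPOCH FROM #{@stage.end_event.timestamp_expression} - " \
               "#{@stage.start_event.timestamp_expression})"
    to_sql("percentile_disc(0.5) WITHIN GROUP (ORDER BY #{duration})")
  end

  def records_sql
    to_sql('merge_requests.*')
  end

  private

  # Base query plus whatever the start and end events add to it.
  def base_query
    query = {
      from: 'merge_requests',
      joins: [],
      where: ["merge_requests.target_project_id = #{@project_id}"]
    }
    [@stage.start_event, @stage.end_event].reduce(query) do |q, event|
      event.apply_query_customization(q)
    end
  end

  def to_sql(select)
    q = base_query
    ["SELECT #{select}", "FROM #{q[:from]}", *q[:joins],
     "WHERE #{q[:where].join(' AND ')}"].join("\n")
  end
end

stage = Stage.new(MergeRequestCreated.new, MergeRequestMerged.new)
dc = DataCollector.new(stage: stage, project_id: 19)
puts dc.median_sql
puts dc.records_sql
```

In this sketch the join to merge_request_metrics appears only because the end event asks for it; a stage whose events both read columns from merge_requests would produce a query with no joins at all.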
Database Related Changes (compared to the old CA feature, lib/gitlab/cycle_analytics)
- Uses ActiveRecord (AR) instead of Arel for building the query
- Uses the percentile_disc function for the median calculation
- Does not use CTE tables
- Avoids unnecessary table joins (tables are joined only when the Stage actually requires them)
- Loads records with one query (the IssuableFinder query is merged), which helps with implementing pagination later
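For readers unfamiliar with percentile_disc: in PostgreSQL it returns the first value, in sort order, whose position reaches the requested fraction, so the result is always one of the input values. The plain-Ruby sketch below mimics that behaviour and contrasts it with an interpolated median. The helper names and sample durations are hypothetical and serve only to illustrate the semantics; the MR itself relies on the database function.

```ruby
# Mimics percentile_disc(fraction) WITHIN GROUP (ORDER BY value):
# the first value in sort order whose cumulative position reaches the fraction.
def percentile_disc(values, fraction)
  raise ArgumentError, 'fraction must be between 0 and 1' unless (0.0..1.0).cover?(fraction)
  return nil if values.empty?

  sorted = values.sort
  index = [(fraction * sorted.size).ceil - 1, 0].max
  sorted[index]
end

# Interpolated median for comparison: averages the two middle values
# when the count is even, so the result may not be an input value.
def interpolated_median(values)
  return nil if values.empty?

  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

durations = [60, 120, 600, 86_400] # seconds

puts percentile_disc(durations, 0.5)  # 120
puts interpolated_median(durations)   # 360.0
```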
Try it in the Console
Review stage definition:
- Start: merge request created (merge_requests.created_at column)
- End: merge request merged (merge_request_metrics.merged_at column)
stage = Analytics::CycleAnalytics::ProjectStage.new(start_event_identifier: :merge_request_created, end_event_identifier: :merge_request_merged, project_id: 19)
# each supported event is represented as a Class
# stage.start_event.class => Gitlab::CycleAnalytics::StageEvents::MergeRequestCreated
# stage.end_event.class => Gitlab::CycleAnalytics::StageEvents::MergeRequestMerged
dc = Gitlab::Analytics::CycleAnalytics::DataCollector.new(stage: stage, params: { from: 30.days.ago, current_user: User.first })
puts dc.median.seconds
puts dc.records_fetcher.serialized_records
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation created/updated or follow-up review issue created
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Performance and Testing
Edited by Adam Hegyi