Optionally show in progress items in VSA
What does this MR do?
This MR adds an extra parameter to the VSA backend which changes the underlying query to filter for in progress items.
How it works today?
VSA currently works on both on project and group levels, on the group level there is extra functionality where the users can customize the filters.
Structure:
- Group can have many value streams
- A value stream can have many stages
- A stage defines a time range based on timestamp expressions
- Start event: MR created (
merge_requests.created_at
column) - End event: MR closed (
merge_request.closed_at
column) - These timestamps are used to calculate duration (end_event - start_event) used in various metrics: median, average, etc.
- Start event: MR created (
Base query example for getting duration:
SELECT (issues.closed_at - issues.created_at) as duration
FROM issues
WHERE
project_id = x AND
created_at > '2010-01-01' AND created_at < '2010-02-01'
The query only works on finished items where both columns are NOT NULL
. This MR adds an option to control the end event parameter and make it open-ended.
How to show in progress items?
Showing in progress items requires an extra condition to the query where we express that the end_event
timestamp expression must be NULL. To calculate the duration we take the current timestamp.
SELECT (CURRENT_TIMESTAMP - issues.created_at) as duration
FROM issues
WHERE
project_id = x AND
created_at > '2010-01-01' AND created_at < '2010-02-01' AND
closed_at IS NULL
Implementation
Since VSA stages are customizable (user can select the start and end event pairs), a generic solution was needed.
Each event is expressed as a class, the full list of events can be found here: https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/gitlab/analytics/cycle_analytics/stage_events
The class defines the following:
-
column_list
: what columns are part of the event. This is usually 1 column, for example:merge_requests.created_at
- Some events have two columns (legacy)
-
timestamp_projection
: this method defines the Arel SQL expression for the event, examples:merge_requests.created_at
-
COELASCE(merge_requests.updated_at, merge_requests.closed_at)
, special two column definition
-
apply_query_customization
: some columns are defined on other tables, so sometimesJOIN
is required. An event can override this method to add thejoin
or other filters if needed. -
apply_negated_query_customization
: new method, that adds thetimestamp_projection IS NULL
condition to the query when we filter for in-progress items.
To toggle between in-progress and finished items a new option was introduced: end_event_filter
.
- If
end_event_filter == :finished
, executes the original behavior. - If
end_event_filter == :in_progress
, instead ofapply_query_customization
,apply_negated_query_customization
method will be called for the end event.- The duration calculation also changes.
- I also changed the tests to use
let_it_be
so it runs faster.
Test changes
Since VSA is customizable, we have a large test file that tests various event combinations: ee/spec/lib/gitlab/analytics/cycle_analytics/data_collector_spec.rb
How it works:
- Create 3 resources (issues or MRs) with different durations.
- Assert the calculated durations in each test case.
- This MR adds 2 extra resources where the end event is missing (open ended).
How to test it
- Make sure you have ultimate license
- Go to the generated group: Analytics / Value Stream
- Create a new value stream (dropdown top right)
- Tick
Create from no template
and select start event: issue created, end event: issue closed - The page should show no items
- Create a new issue within the group (any project)
- Open the network tab
- Find the
records
API call, copy the URL and add `&end_event_filter=in_progress - The created issue should show up
If you close the issue, the issue should disappear and it should show up on the UI where the end_event_filter
is not applied.
Database
There are many query combinations, so I only picked a few. The query base is always the same: filtering the issues.created_at
or merge_requests.created_at
(1 month range) and then apply additional filters.
There is no significant difference between old and new queries, the performance is generally not great due to the recursive namespace queries.
Cached timings:
NEW | Issue stage, in progress query (130ms)
SELECT
EXTRACT(EPOCH FROM percentile_cont(0.5) WITHIN GROUP (ORDER BY TO_TIMESTAMP(1619431013) - "issues"."created_at")) AS median
FROM
"issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
INNER JOIN "issue_metrics" ON "issue_metrics"."issue_id" = "issues"."id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
OR (issues.confidential = TRUE
AND (issues.author_id = 4156052
OR EXISTS (
SELECT
TRUE
FROM
issue_assignees
WHERE
user_id = 4156052
AND issue_id = issues.id)
OR EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = 4156052
AND (project_authorizations.project_id = issues.project_id)
AND (project_authorizations.access_level >= 20)))))
AND "projects"."namespace_id" IN (
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE (traversal_ids @> ('{9970}')))
AND (EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = 4156052
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))
OR projects.visibility_level IN (10, 20))
AND ("project_features"."issues_access_level" IS NULL
OR "project_features"."issues_access_level" IN (20, 30)
OR ("project_features"."issues_access_level" = 10
AND EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = 4156052
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))))
AND "issues"."created_at" <= '2021-04-26'
AND "issues"."created_at" >= '2021-03-26 09:56:51.286255'
AND COALESCE("issue_metrics"."first_associated_with_milestone_at", "issue_metrics"."first_added_to_board_at") IS NULL
Plan: https://explain.depesz.com/s/ZCAv
Issue stage, original query, just for comparison (150ms): https://explain.depesz.com/s/JUlo
NEW | Plan stage, in progress query (70ms)
SELECT
EXTRACT(EPOCH FROM percentile_cont(0.5) WITHIN GROUP (ORDER BY TO_TIMESTAMP(1619431018) - COALESCE("issue_metrics"."first_associated_with_milestone_at", "issue_metrics"."first_added_to_board_at"))) AS median
FROM
"issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
INNER JOIN "issue_metrics" ON "issue_metrics"."issue_id" = "issues"."id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
OR (issues.confidential = TRUE
AND (issues.author_id = 4156052
OR EXISTS (
SELECT
TRUE
FROM
issue_assignees
WHERE
user_id = 4156052
AND issue_id = issues.id)
OR EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = 4156052
AND (project_authorizations.project_id = issues.project_id)
AND (project_authorizations.access_level >= 20)))))
AND "projects"."namespace_id" IN (
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE (traversal_ids @> ('{9970}')))
AND (EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = 4156052
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))
OR projects.visibility_level IN (10, 20))
AND ("project_features"."issues_access_level" IS NULL
OR "project_features"."issues_access_level" IN (20, 30)
OR ("project_features"."issues_access_level" = 10
AND EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = 4156052
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))))
AND "issues"."created_at" <= '2021-04-26'
AND "issues"."created_at" >= '2021-03-26 09:56:57.367377'
AND ("issue_metrics"."first_added_to_board_at" IS NOT NULL
OR "issue_metrics"."first_associated_with_milestone_at" IS NOT NULL)
AND "issue_metrics"."first_mentioned_in_commit_at" IS NOT NULL
AND "issue_metrics"."first_mentioned_in_commit_at" IS NULL
Plan: https://explain.depesz.com/s/tvl6
Plan stage, original query, just for comparison (130ms): https://explain.depesz.com/s/y1vU
Does this MR meet the acceptance criteria?
Conformity
-
📋 Does this MR need a changelog?-
I have included a changelog entry. -
I have not included a changelog entry because not user facing change.
-
-
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Related to #299265 (closed)