New MR widget to catch CI jobs' duration regressions before an MR is merged into the main branch
Release notes
This feature allows MR authors & reviewers to catch CI jobs' duration regressions before an MR is merged into the main branch, saving CI & engineering costs, as well as engineers' time.
Problem to solve
This feature proposal follows #341757 (closed), which Engineering Productivity faced. Such a feature would have helped us catch the problem sooner (hopefully before it reached the main branch).
CI job durations are usually stable over time, and mostly grow linearly and slowly with the growth of the application they're testing.
This can be seen in one of our own gitlab-org/gitlab jobs (ignore the decrease that happened in June 2021).
That being said, a change in the application can sometimes have a big impact on a job's duration, and there is currently no easy way to notice such a regression in GitLab other than manually looking at a job's or pipeline's duration and comparing it to a known value.
This can be seen in another of our own gitlab-org/gitlab jobs, which suddenly took 2 more minutes to finish on 2021-09-22.
The sad part is that, even though we were fortunate enough to have charts of our jobs' durations in an external tool, we only noticed the regression one week after it was merged into the main branch.
This means that teams without a job-duration monitoring tool probably wouldn't have noticed it at all.
Proposal
The proposal is pretty simple in essence:
- Compute the P80 (80th percentile) duration of each job on the main branch over the last 3 days (could be 1 day, could be 7 days, could be configured?)
- In the MR, compare each job's duration to that job's main-branch P80 duration
- Show any suspicious regressions in the MR widget (the threshold could be configured, defaulting to 20%)
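The three steps above can be sketched as follows. This is a minimal illustration, not GitLab's implementation: it assumes job runs are available as simple (name, duration, finish time) records, and `JobRun`, `main_branch_baseline` and `find_regressions` are hypothetical names rather than GitLab APIs. It uses the nearest-rank definition of P80 (one of several common percentile definitions) and the defaults suggested above (3-day window, 20% threshold).

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


@dataclass
class JobRun:
    """One finished CI job run (hypothetical shape, not a GitLab API object)."""
    name: str
    duration_s: float
    finished_at: datetime


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def main_branch_baseline(runs: Iterable[JobRun], now: datetime,
                         window_days: int = 3, pct: float = 80) -> Dict[str, float]:
    """P80 duration per job name, using only main-branch runs inside the window."""
    cutoff = now - timedelta(days=window_days)
    by_job: Dict[str, List[float]] = {}
    for run in runs:
        if run.finished_at >= cutoff:
            by_job.setdefault(run.name, []).append(run.duration_s)
    return {name: percentile(durations, pct) for name, durations in by_job.items()}


def find_regressions(mr_runs: Iterable[JobRun], baseline: Dict[str, float],
                     threshold: float = 0.20) -> Dict[str, float]:
    """Map job name -> relative slowdown for MR jobs slower than baseline by > threshold."""
    regressions: Dict[str, float] = {}
    for run in mr_runs:
        p80 = baseline.get(run.name)
        if p80 is None or p80 <= 0:
            continue  # new job or no main-branch history: nothing to compare against
        slowdown = run.duration_s / p80 - 1
        if slowdown > threshold:
            regressions[run.name] = slowdown
    return regressions
```

For example, with recent main-branch durations of 600, 610, 620, 630 and 640 seconds for a job, the P80 is 630 seconds, so an MR run of 780 seconds (about 24% slower) would be flagged, while 750 seconds (about 19% slower) would not. Comparing against a percentile rather than the mean keeps a single slow outlier on the main branch from hiding a regression.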
Intended users
- Delaney (Development Team Lead)
- Sasha (Software Developer)
- Devon (DevOps Engineer)
- Alex (Security Operations Engineer)
- Simone (Software Engineer in Test)
- Allison (Application Ops)
- Priyanka (Platform Engineer)
User experience goal
The MR author/reviewer should be able to detect CI duration regressions introduced by an MR, directly from the MR widget.
Further details
A vaguely related feature is the "CI/CD Analytics" page: https://gitlab.com/gitlab-org/gitlab/-/pipelines/charts, which currently only shows "Pipeline durations for the last 30 commits".
Permissions and Security
No specific permissions would need to be created:
- the potential duration regression would be shown in the MR widget, so the existing permissions would apply
- the settings for the feature would be set similarly to other project settings, so the existing permissions would apply
Documentation
See the Feature Change Documentation Workflow https://docs.gitlab.com/ee/development/documentation/workflow.html#for-a-product-change
- Add all known Documentation Requirements in this section. See https://docs.gitlab.com/ee/development/documentation/workflow.html
- If this feature requires changing permissions, update the permissions document. See https://docs.gitlab.com/ee/user/permissions.html
Availability & Testing
This section needs to be retained and filled in during the workflow planning breakdown phase of this feature proposal, if not earlier.
What risks does this change pose to our availability? How might it affect the quality of the product? What additional test coverage or changes to tests will be needed? Will it require cross-browser testing?
Please list the test areas (unit, integration and end-to-end) that need to be added or updated to ensure that this feature will work as intended. Please use the list below as guidance.
- Unit test changes
- Integration test changes
- End-to-end test changes
See the test engineering planning process and reach out to your counterpart Software Engineer in Test for assistance: https://about.gitlab.com/handbook/engineering/quality/test-engineering/#test-planning
What does success look like, and how can we measure that?
Success would mean that CI duration regressions are detected before they're merged into the main branch, avoiding:
- additional CI costs
- additional wall-clock time for engineers waiting longer for their jobs/pipelines to finish
- additional engineer-hours spent on debugging the problem after the merge
What is the type of buyer?
What is the buyer persona for this feature? See https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/buyer-persona/ In which enterprise tier should this feature go? See https://about.gitlab.com/handbook/product/pricing/#three-tiers
Is this a cross-stage feature?
Communicate if this change will affect multiple Stage Groups or product areas. We recommend always starting with the assumption that a feature request will have an impact on another Group. Loop in the most relevant PM and Product Designer from that Group to provide strategic support to help align the Group's broader plan and vision, as well as to avoid UX and technical debt. https://about.gitlab.com/handbook/product/#cross-stage-features