Measure DORA 4 Metrics in GitLab
<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->
*This page may contain information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on it for purchasing or planning purposes. As with all projects, the items mentioned on this page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.*
<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->

# Overview

The purpose of this issue was to facilitate the discussion about how to add DORA metrics to GitLab. The final implementation details can be found in the product [DORA documentation](https://docs.gitlab.com/ee/user/analytics/dora_metrics.html) and in the [DORA category direction page](https://about.gitlab.com/direction/plan/dora_metrics/).

### Problem to solve

<!-- What problem do we solve? -->

Customer experience is becoming a key metric. Users want not only to measure platform stability and other performance KPIs post-deployment, but also to set targets for customer behavior, experience, and financial impact. Tracking and measuring these indicators after deployment solves an important pain point. Similarly, creating views that are organized around products rather than projects or repositories will provide users with a more relevant set of data.

### Intended users

<!-- Who will use this feature? If known, include any of the following: types of users (e.g. Developer), personas, or specific company roles (e.g. Release Manager). It's okay to write "Unknown" and fill this field in later.

* [Parker (Product Manager)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#parker-product-manager)
* [Delaney (Development Team Lead)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#delaney-development-team-lead)
* [Sasha (Software Developer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#sasha-software-developer)
* [Presley (Product Designer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#presley-product-designer)
* [Devon (DevOps Engineer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#devon-devops-engineer)
* [Sidney (Systems Administrator)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#sidney-systems-administrator)
* [Sam (Security Analyst)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#sam-security-analyst)
* [Dana (Data Analyst)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#dana-data-analyst)

Personas are described at https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/
-->

* [Parker (Product Manager)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#parker-product-manager)
* [Dana (Data Analyst)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#dana-data-analyst)
* [Devon (DevOps Engineer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#devon-devops-engineer)

### Further details

<!-- Include use cases, benefits, and/or goals (contributes to our vision?)
-->

![image](/uploads/056faf2192c5c701a2ebad8d6167fc5c/image.png)

There are four main metrics that the industry is currently talking about:

* Deployment frequency - how often code gets pushed to production (how many times a day/week/month)
* Lead time for changes - how long it takes for code to be committed and reach production
* Time to restore service - how long it generally takes to restore service when a service incident or a defect that impacts users occurs (can be a rollback or the time to solve a specific bug)
* Change failure rate - what percentage of changes to production or released to users result in degraded service (generally requiring a rollback or hotfix/patch)

In the **future** the dashboards will be consolidated under the _analytics_ dashboard as follows:

We should provide a dashboard that shows these metrics and allows users to drill down to understand which specific issue caused the outage, the long delay to production, and so on. The dashboard shall be called `Business Metrics`. The dashboard should be a sub-dashboard under the existing metrics dashboard, but should also get a menu option under Operations which will open the previous sub-menu:

| | |
| ------ | ------ |
| ![image](/uploads/37ed2f4c14393514bc9946a99f3ea831/image.png) | ![image](/uploads/a5e7475bb6cf8cba246eac7614375acf/image.png) |

### GitHub future plans

![image](/uploads/dcd9818c93f5634f993dc5a6366ae228/image.png)

## UX

**:star: [`SEE PROTOTYPES ON FIGMA`](https://www.figma.com/file/DvT2jk9x5fcLKmrY8m2hlc/Measure-DORA-4-Metrics-in-GitLab?node-id=379%3A0) :star:**

### Proposal

<!-- How are we going to solve the problem? Try to include the user journey!
https://about.gitlab.com/handbook/journeys/#user-journey -->

We will place the new charts under: https://gitlab.com/gitlab-org/gitlab/pipelines/charts

### MVC

Under CI/CD metrics we will display:

* Deployment frequency (1st iteration)
* Lead Time for changes

The MVC includes showing these values at an instance level and in a graph view so that the trend can be observed. In the spirit of iteration we may start by building at the project level and group level, but our ultimate goal is to present at the instance level.

### Not in the MVC

* Ability to sort/filter by label or milestone/release
* Additional metrics
* Ability to add annotations
* Ability to select custom dates
* Time to restore service - how long it generally takes to restore service when a service incident or a defect that impacts users occurs (can be a rollback or the time to solve a specific bug)
* Change failure rate - what percentage of changes to production or released to users result in degraded service (generally requiring a rollback or hotfix/patch)

### Tiering Strategy

The MVC will start with everything in ~"GitLab Ultimate".

Future:

* ~"GitLab Core" would get the single-tile view metrics of DORA4 (which we aren't really working on at the moment)

| DORA 4 Widget for Core |
|---|
|![project_analytics_cicd](/uploads/26e56d43e9024920dd8e9c100652e96c/project_analytics_cicd.png)|

* ~"GitLab Premium" would get the project-level graphs and API

| DORA 4 Project View for Premium |
|---|
|![project_analytics_cicd_DORA4_TAB](/uploads/cc386f62123b741a85e39f6345792c14/project_analytics_cicd_DORA4_TAB.png)|

* ~"GitLab Ultimate" would get the group/instance-level graphs and API

| DORA 4 Group View for Ultimate |
|---|
|![group_analytics_cicd_DORA4_TAB](/uploads/418e6096d583d8720ea95f4774df5a92/group_analytics_cicd_DORA4_TAB.png)|

Note: Ultimate tier users see the full project-level DORA4 metrics visible to the Premium tier. In addition, Ultimate customers get DORA4 metrics at the group level.

### Permissions and Security

<!-- What permissions are required to perform the described actions? Are they consistent with the existing permissions as documented for users, groups, and projects as appropriate? Is the proposed behavior consistent between the UI, API, and other access methods (e.g. email replies)?-->

### Documentation

<!-- See the Feature Change Documentation Workflow https://docs.gitlab.com/ee/development/documentation/feature-change-workflow.html
Add all known Documentation Requirements here, per https://docs.gitlab.com/ee/development/documentation/feature-change-workflow.html#documentation-requirements
If this feature requires changing permissions, this document https://docs.gitlab.com/ee/user/permissions.html must be updated accordingly. -->

https://about.gitlab.com/handbook/marketing/product-marketing/devops-metrics/#step-1-dora-metrics

Draft documentation from @djensen:

<details>

### What is DORA?

DORA stands for "DevOps Research and Assessment". DORA is a research organization that has been studying and reporting on DevOps best practices since 2013. It was [acquired by Google in 2018](https://www.devops-research.com/dora-joins-google-cloud.html). Its current home is within [Google Cloud DevOps](https://cloud.google.com/devops/).

### What are DORA metrics?

The phrase "DORA metrics" refers to DORA's 4 "core metrics — commit-to-prod lead time, deployment frequency, change failure rate and mean time to restore" ([TheNewStack](https://thenewstack.io/dora-2019-devops-efforts-improving-but-not-done/)).

1. Throughput performance metrics
   1. Deployment Frequency (DF). Defines how often your organization deploys code to production. Elite performers deploy on demand, multiple times a day. [Measured with deployments to production]
   1. Lead Time for Changes / Mean Lead Time (MLT). Defines how long it takes for a code commit to be deployed to production.
      Elite performers have a lead time of less than an hour. [Measured with time from commit/merge to deployment to production]
1. Stability performance metrics
   1. Time to restore service / Mean Time To Recover (MTTR). Defines how long it takes to restore the service after a service incident occurred. Elite performers restore service in less than an hour. []
   1. Change Failure Rate (CFR). Defines what percentage of changes result either in degraded service or subsequently require remediation (e.g. leads to impairment or outage, requires hotfix, rollback, fix forward). Elite performers have a change failure rate between 0% and 15%.

Source: [CloudBees](https://www.cloudbees.com/blog/2018-accelerate-state-devops-report-identifies-elite-performers)

These metrics were popularized by the founder of DORA in her book *Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations* ([Amazon](https://www.amazon.com/gp/product/1942788339/ref=as_li_tl?ie=UTF8&tag=itrevpre-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1942788339&linkId=2a979f6ff63a8c7c5892e4b2858d8480)).

### Why care about DORA metrics?

"Many EDs mentioned DORA standards." - [12.5 Analytics Research](https://docs.google.com/presentation/d/13mr-6CLhQJicjcYjcxme89PtBR97rWZ4GVZMpJSIYHw/edit#slide=id.g6bb901c580_0_60)

"DORA metrics are a result of [at least] six years' worth of surveys conducted by the DevOps Research and Assessments (DORA) team ... These metrics [help] determine how successful a company is at DevOps."
[CloudBees](https://www.cloudbees.com/blog/2018-accelerate-state-devops-report-identifies-elite-performers)

"Compared to teams in the low-performing group, these [DORA metric] elite software teams:

- Execute 208 times as many code deployments
- Maintain lead times, from commit to deploy, that are 106 times faster
- Report change failure rates that are 7 times lower
- Recover from change failures 2,604 times faster

As important as these advantages are, here’s an even more impressive metric: elite performers are twice as likely as low-performing teams to meet or exceed their organizational performance goals." [NewRelic](https://blog.newrelic.com/technology/dora-accelerate-state-of-devops-2019/)

### How to measure DORA metrics in GitLab?

As noted by [ThoughtWorks](https://www.thoughtworks.com/radar/techniques/four-key-metrics), "A good place to start is to instrument the build pipelines so you can capture the four key metrics and make the software delivery value stream visible. GoCD pipelines, for example, provide the ability to measure these four key metrics as a first-class citizen of the GoCD analytics."

Here is how GitLab measures the 4 DORA core metrics:

1. Deployment Frequency (DF).
   1. Measure: Number of deployments to the production Environment per day.
1. Lead Time for Changes / Mean Lead Time (MLT).
   1. Measure: Average time between "commit added to default branch" and "commit deployed to production Environment".
   1. Question: Exclude commits "internal" to an MR? For example, if an MR has 3 commits, should we only consider the last commit (whether ordinary or merge commit)?
1. Time to restore service / Mean Time To Recover (MTTR).
   1. Measure: ?
1. Change Failure Rate (CFR).
   1. Measure: Number of "bug" tickets versus other Issues?
   1. Question: What about bugs that are fixed without an Issue being created?
   1. Question: How do we identify "bug" issues? Ask the user to submit a label?
   1. Question: Bugs generally lag features.
      Should we measure "this month's bugs" against "last month's non-bugs"?

</details>

### Testing

<!-- What risks does this change pose? How might it affect the quality of the product? What additional test coverage or changes to tests will be needed? Will it require cross-browser testing? See the test engineering process for further help: https://about.gitlab.com/handbook/engineering/quality/test-engineering/ -->

### What does success look like, and how can we measure that?

<!-- Define both the success metrics and acceptance criteria. Note that success metrics indicate the desired business outcomes, while acceptance criteria indicate when the solution is working correctly. If there is no way to measure success, link to an issue that will implement a way to measure this. -->

### What is the type of buyer?

<!-- Which leads to: in which enterprise tier should this feature go? See https://about.gitlab.com/handbook/product/pricing/#four-tiers -->

### Links / reference

- https://gitlab.my.salesforce.com/0016100000KvahJ
- https://gitlab.my.salesforce.com/00161000008ptPz
- https://gitlab.my.salesforce.com/001610000111bA3AAI
- https://github.com/github/roadmap/issues/127

YouTube: https://youtu.be/iwefADYMe_s

### Disclaimer

This page may contain information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on it for purchasing or planning purposes. As with all projects, the items mentioned on this page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
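### Appendix: illustrative metric computation

The measurement definitions discussed in this issue (deployments per day, merge-to-deploy lead time, incident restore time, failure rate) can be sketched in a few lines. This is a minimal illustration only, assuming hypothetical in-memory records; the field names (`merged_at`, `deployed_at`, `failed`) are invented for the example and do not correspond to any real GitLab schema or API:

```python
from datetime import datetime, timedelta

# Hypothetical deployment records (field names are illustrative only):
# merged_at = commit added to default branch, deployed_at = deployed to production.
deployments = [
    {"merged_at": datetime(2020, 6, 1, 9), "deployed_at": datetime(2020, 6, 1, 15), "failed": False},
    {"merged_at": datetime(2020, 6, 2, 10), "deployed_at": datetime(2020, 6, 2, 11), "failed": True},
    {"merged_at": datetime(2020, 6, 2, 12), "deployed_at": datetime(2020, 6, 3, 9), "failed": False},
]
# (opened_at, resolved_at) pairs for production incidents.
incidents = [(datetime(2020, 6, 2, 11, 30), datetime(2020, 6, 2, 12, 15))]
days_in_period = 30

# Deployment Frequency (DF): production deployments per day over the period.
deployment_frequency = len(deployments) / days_in_period

# Lead Time for Changes (MLT): mean time from default-branch commit to production deploy.
lead_times = [d["deployed_at"] - d["merged_at"] for d in deployments]
mean_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Time to Restore Service (MTTR): mean time from incident open to resolution.
restore_times = [resolved - opened for opened, resolved in incidents]
mttr = sum(restore_times, timedelta()) / len(restore_times)

# Change Failure Rate (CFR): share of production deployments that degraded service.
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)
```

The open questions above (which commits of an MR count toward lead time, how "failed" changes are identified) determine how the input records would be populated in practice.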