Skip to content

API support for Lead time for MRs to be deployed to production

Release notes

The user should be able to use the API to present the lead time data in an external tool.

This feature is GitLab Ultimate

Problem to solve

As an executive investing in DevOps, I want to see my ROI. I want to see an improvement of my dev team in the time takes to MRs to be deployed to production . This will most likely drive the KPIs of my team.

As a developer leader I want to see that the team is improving by measuring the lead time for MRs to be deployed to production and comparing it over time (over sprints)

I want to import the data from GitLab into an external monitoring tool, so I need API support to poll the data.

Intended users

User experience goal

The user should be able to use the API to present the lead time data in an external tool.

Proposal

We should show the median time for merged MRs to production

  • REST API and backend service to perform calculations
  • REST API documentation

Note: GraphQL equivalent will be handled separately in #323966 (closed)

We need to support GET API for this. As parameters there should be a configurable timestamp - the default time, if not set should be 30 days.

For this iteration the API should return the median time for merged MRs for the project scope, but in the future we will add group level and instance level support. So this should be taken into consideration when developing this API.

Also in the future we would want to add support for the environment that there was a merge for. In the true sense of DORA 4 - this is only meant for production, but there may be more than one prod environment and/or users may interested in non-prod environments

Further details

There is another consideration: some additional work to create a data structure (tables) to store the metrics, and I think this should be designed so it is flexible enough to:

  • commonly capture the data for all metrics
  • support a view at Project or any level of Group
  • support a minimal time slice which could then be aggregated

This is one metric out of 4 so there needs to be a common API structure to meet all needs.

Technical Proposal

This techncial proposal is to accomplish the DORA4 metrics related issues scheduled in 13.10 (Namely Deployment anaylytics and Lead Time for Changes). It provides a mid-term solution that allows any users/projects to use the dora4 metrics feature at scale regardless of project or group level.

This also partially resolves First class concept for production environments.

Current problems

  • In a mid-sized project (or above), the database query takes about 10 sec to complete (w/ cold cache), which violates Query performance guidelines.
  • The data aggregation happens in Ruby. This puts up additional 40-50 seconds. Ruby is not performant at compute-intensive process.
  • The data aggregation happens for each API request even though the same data is presented regardless of the user roles.
  • Given web nodes (puma) on production timeouts in 60 seconds, some API requests often result in a timeout error.

Approach

We should persist pre-aggreagated data into the database table and present it directly (for frontend and external systems). This is a similar approach with the daily coverage metrics feature.

Database table design

'environments' table                        - Existing table

id integer NOT NULL,
project_id integer NOT NULL,
name character varying NOT NULL,
created_at timestamp without time zone,
updated_at timestamp without time zone,
external_url character varying,
environment_type character varying,
state character varying DEFAULT 'available'::character varying NOT NULL,
slug character varying NOT NULL,
auto_stop_at timestamp with time zone,
auto_delete_at timestamp with time zone
tier (enum)                                  - New column. One of 'production', 'staging', 'review-app', etc
                                             - Sadly `environment_type` is already used for foldering feature, so we name it to `tier`.
'cd_daily_deployments_metrics' table          - New table

group_id (Foreign key, cascading deletion, not null)
project_id (Foreign key, cascading deletion, not null)
environment_tier (not null)
date (date, not null) ... e.g. 2021-01-01
deployments_count (integer)
total_lead_time_for_changes (integer - unit: second)

Unique index on (project_id, environment_tier, date)
Unique index on (group_id, environment_tier, date)
Unique index on (group_id, project_id)
'deployment_merge_requests' table             - Existing table

deployment_id integer NOT NULL,
merge_request_id integer NOT NULL,
environment_id integer,
lead_time_for_changes integer                 - New column. How long an merged MR took to be deployed

Pre-aggregation in the deployment life cycle

  1. A deployment succeeded.
  2. Deployments::UpdateEnvironmentWorker runs.
    1. environments.tier is automatically set if it's null.
  3. Deployments::LinkMergeRequestsService runs.
    1. deployment_merge_requests.lead_time_for_changes is set based on the merge_request.merged_at - deployment.deployed_at.
  4. Schedule Deployments::UpdateDailyDeploymentsMetricsWorker in 10 minutes. (This is to ensure that above async tasks have already finished)
    1. Upsert a row in cd_daily_deployments_metrics. It basically counts up the deployments_count and total_lead_time_for_changes.

Database queries

Project-level:

Deployment frequencies

SELECT deployments_count FROM cd_daily_deployments_metrics
WHERE project_id = 1 AND
  environment_tier = 1 AND
  date > '2021-01-01' AND
  date < '2021-02-23'
ORDER BY date ASC

Lead time for changes

SELECT total_lead_time_for_changes FROM cd_daily_deployments_metrics
WHERE project_id = 1 AND
  environment_tier = 1 AND
  date > '2021-01-01' AND
  date < '2021-02-23'
ORDER BY date ASC
Group-level:

Deployment frequencies

SELECT date, SUM(deployments_count) FROM cd_daily_deployments_metrics
WHERE group_id = 1 AND
  environment_tier = 1 AND
  date > '2021-01-01' AND
  date < '2021-02-23'
GROUP BY date
ORDER BY date ASC

Lead time for changes

SELECT date, SUM(total_lead_time_for_changes) FROM cd_daily_deployments_metrics
WHERE group_id = 1 AND
  environment_tier = 1 AND
  date > '2021-01-01' AND
  date < '2021-02-23'
GROUP BY date
ORDER BY date ASC

Permissions

  • At project-level, Developer role (or above) can access the data.
  • At group-level, Developer role (or above) can access the data. This is to ensure that the actor definitely has read permission to all subsequent projects.

Estimated weights

TBA (Should be achievable in 3 weeks)

Permissions and Security

Documentation

Availability & Testing

What does success look like, and how can we measure that?

What is the type of buyer?

Is this a cross-stage feature?

Links / references

Edited by Nicole Williams