Skip to content

Query Timeout causing 500 in Projects::MergeRequests::DiffsController#diffs_batch

Description

A customer reported intermittent 500s when viewing certain merge request diffs. Looking into this, the error is coming from: https://gitlab.com/gitlab-org/gitlab/-/blob/794618a9da801cc2739315c3ff45eb65a880cbfd/app/models/environment.rb#L245. The last_deployment association results in a query like:

SELECT
  "deployments".*
FROM "deployments"
WHERE "deployments"."environment_id" = $1
  AND "deployments"."status" = 2
ORDER BY deployments.id DESC
LIMIT 1;

Tracing that back to the customer, we can see that have some environments that have a large number of deployments, but only very few with a status of 2. For example, looking at this environment:

select deployments.status, count(*)
from deployments
where environment_id = 693272
group by 1 order by 2 desc;
 status | count
--------+-------
      0 | 73645
      5 | 13958
      4 | 11406
      2 |    32
      3 |    26

So using that in the query from last_deployment, we get this execution: https://explain.depesz.com/s/4ghw

Due to the very sparse number of records where status = 2, and not having an index covering that column as well, we have to scan backwards over the index until we find one, resulting in the quite large: Rows Removed by Filter: 70,672 (uncached query execution was taking around 28s on database-lab)

Sentry: https://sentry.gitlab.net/gitlab/gitlabcom/issues/2536476/?query=correlation_id%3A01F1CS2B2B3VJ34CMZWTPW7VAQ

Zendesk: https://gitlab.zendesk.com/agent/tickets/200958

Impact

ToDo: Describe the impacts of the code improvements, such as metrics or screenshots before and after the associated MR(s) were merged.

Edited by Chun Du