Query Timeout causing 500 in Projects::MergeRequests::DiffsController#diffs_batch
Description
A customer reported intermittent 500s when viewing certain merge request diffs. Looking into this, the error is coming from: https://gitlab.com/gitlab-org/gitlab/-/blob/794618a9da801cc2739315c3ff45eb65a880cbfd/app/models/environment.rb#L245. The last_deployment
association results in a query like:
SELECT
"deployments".*
FROM "deployments"
WHERE "deployments"."environment_id" = $1
AND "deployments"."status" = 2
ORDER BY deployments.id DESC
LIMIT 1;
Tracing that back to the customer, we can see that have some environments that have a large number of deployments, but only very few with a status of 2. For example, looking at this environment:
select deployments.status, count(*)
from deployments
where environment_id = 693272
group by 1 order by 2 desc;
status | count
--------+-------
0 | 73645
5 | 13958
4 | 11406
2 | 32
3 | 26
So using that in the query from last_deployment
, we get this execution: https://explain.depesz.com/s/4ghw
Due to the very sparse number of records where status = 2
, and not having an index covering that column as well, we have to scan backwards over the index until we find one, resulting in the quite large: Rows Removed by Filter: 70,672
(uncached query execution was taking around 28s on database-lab)
Zendesk: https://gitlab.zendesk.com/agent/tickets/200958
Impact
ToDo: Describe the impacts of the code improvements, such as metrics or screenshots before and after the associated MR(s) were merged.