[Breaking Change] Returns an error when `updated_at` is not sorted by `update_at` in Deployment API
If you found this issue because of breaking change
Starting from 16.0 updated_before/after
can not be used without also using order_by=updated_at
and wise versa. It was deprecated long time ago, but worked until 16.0.
How to fix: just add updated_before/order_by
parameters. You can use the current time as updated_before
. See more in https://docs.gitlab.com/ee/api/deployments.html#list-project-deployments
Problem
This issue was brought up in the performance optimization issue.
Problem details
This updated_before
filter was introduced in #33101 (closed). The issue describes that:
An Enterprise customer is pulling data from the "list project deployments" API endpoint to populate a custom-built dashboard. The customer only wants to display the latest changes. However, the customer has no way to restrict the API results to the latest changes. Instead, the customer retrieves all records, checks one-by-one and processes only the records updated after the latest
updated_at
value in the last batch retrieved. The customer has asked for the ability to query forupdated_at
values after a specified timestamp.
updated_at
filtering would make sense in Issues or MRs context, however, it doesn't make sense in deployments context. For example, looking at a query that customers are currently executing:
SELECT
"deployments".*
FROM
"deployments"
INNER JOIN "environments" ON "environments"."id" = "deployments"."environment_id"
WHERE
"deployments"."project_id" = $ 1
AND "deployments"."updated_at" <= $ 2
AND "environments"."name" = $ 3
ORDER BY
"deployments"."iid" DESC
LIMIT
$ 4 OFFSET $ 5
This does NOT fetch the latest changes to the environment that the original issue was born for. But this fetches mixed states of deployment records that includes:
- Deployments that actually deployed to the target environment. This is the "changes" in deployment context.
- Deployment jobs that still running to the target environment. (It should be excluded)
- Failed deployment jobs that didn't make into the target environment. (It should be excluded)
- Canceled deployment jobs that didn't make into the target environment. (It should be excluded)
- These mixed-states deployment rows are ordered by
iid
, which is creation order, not update order (See more #29884 (comment 549584527)) (i.e. this sorting is messing upupdated_at
filtering)
In other words, end-users are seeing too much filter params in the endpoint to construct the right parameter combination.
Here is the correct query/parameter combination to fetch latest changes/deployments to a specific environment.
SELECT
"deployments".*
FROM
"deployments"
WHERE
"deployments"."environment_id" = 1069567
AND "deployments"."status" = 2
AND "deployments"."finished_at" <= '2021-04-14T09:05:29Z'
ORDER BY
"deployments"."finished_at" DESC
LIMIT
100 OFFSET 100
This query can be executed in 40ms, which is 2000x faster than the query above.
Implementation Guide
Switch to the hard error:
diff --git a/app/finders/deployments_finder.rb b/app/finders/deployments_finder.rb
index 5b2139cb941..486b7bfabab 100644
--- a/app/finders/deployments_finder.rb
+++ b/app/finders/deployments_finder.rb
@@ -59,14 +59,8 @@ def validate!
raise InefficientQueryError, 'Both `updated_at` filter and `finished_at` filter can not be specified'
end
- # Currently, the inefficient parameters are allowed in order to avoid breaking changes in Deployment API.
- # We'll switch to a hard error in https://gitlab.com/gitlab-org/gitlab/-/issues/328500.
if (filter_by_updated_at? && !order_by_updated_at?) || (!filter_by_updated_at? && order_by_updated_at?)
- error = InefficientQueryError.new('`updated_at` filter and `updated_at` sorting must be paired')
-
- Gitlab::ErrorTracking.log_exception(error)
-
- raise error if raise_for_inefficient_updated_at_query?
+ raise InefficientQueryError, '`updated_at` filter and `updated_at` sorting must be paired'
end
if filter_by_finished_at? && !order_by_finished_at?
Please note that this change could break API automation of customers/users workflow. We should notify/guide them in advance to switch to the other parameter combination before fully rolling out the change.