ActiveRecord::QueryCanceled: PG::QueryCanceled: ERROR: canceling statement due to statement timeout
Description
Sidekiq/Security::Findings::DeleteByJobIdWorker
is failing due to Postgres queries timing out while querying security findings.
The Security::Finding model is historically prone to slow queries due to the mass of data contained within. ("~15M records created 90 days ago"). As a result queries against the security findings need to be optimised, potentially through the reduction of the batch size. Additionally it may be preferable to introduce jitter and/or delay to these jobs so that they occur over the course of the day or weekend, mitigating query spikes against the Postgres database.
Implementation plan
- Use keyset iterator to speed up iterating security_findings records here: https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/services/security/findings/cleanup_service.rb#L42
Example usage of keyset iterator
diff --git a/ee/app/services/security/findings/cleanup_service.rb b/ee/app/services/security/findings/cleanup_service.rb
index 25b01ec1d47..e98758d3840 100644
--- a/ee/app/services/security/findings/cleanup_service.rb
+++ b/ee/app/services/security/findings/cleanup_service.rb
@@ -39,10 +39,16 @@ def purge(scan_batch)
end
def delete_findings_for(scan_batch)
- Security::Finding.by_scan(scan_batch).each_batch(of: BATCH_DELETE_SIZE) do |finding_batch|
+ iterator_for(scan_batch).each_batch(of: BATCH_DELETE_SIZE) do |finding_batch|
finding_batch.delete_all
end
end
+
+ def iterator_for(scan_batch)
+ Security::Finding.by_scan(scan_batch).order(:scan_id, :id).then do |scope|
+ Gitlab::Pagination::Keyset::Iterator.new(scope: scope)
+ end
+ end
end
end
end
Sentry Error
https://sentry.gitlab.net/gitlab/gitlabcom/issues/3229361/?referrer=gitlab_plugin
PG::QueryCanceled: ERROR: canceling statement due to statement timeout
lib/gitlab/database/load_balancing/connection_proxy.rb:99:in `block in read_using_load_balancer'
connection.send(...)
lib/gitlab/database/load_balancing/load_balancer.rb:112:in `block in read_write'
yield connection
lib/gitlab/database/load_balancing/load_balancer.rb:179:in `retry_with_backoff'
return yield
lib/gitlab/database/load_balancing/load_balancer.rb:110:in `read_write'
retry_with_backoff do
lib/gitlab/database/load_balancing/connection_proxy.rb:98:in `read_using_load_balancer'
@load_balancer.read_write do |connection|
...
(147 additional frame(s) were not displayed)
ActiveRecord::QueryCanceled: PG::QueryCanceled: ERROR: canceling statement due to statement timeout
PG::QueryCanceled: ERROR: canceling statement due to statement timeout
Example of Failing Query
/*application:sidekiq,correlation_id:<redacted>,jid:<redacted>,endpoint_id:Security::Findings::DeleteByJobIdWorker,db_config_name:main*/ SELECT "security_findings"."id" FROM "security_findings" WHERE "security_findings"."scan_id" IN (SELECT "security_scans"."id" FROM "security_scans" WHERE "security_scans"."build_id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38, $39, $40, $41, $42, $43, $44, $45, $46, $47, $48, $49, $50, $51, $52, $53, $54, $55, $56, $57, $58, $59, $60, $61, $62, $63, $64, $65, $66, $67, $68, $69, $70, $71, $72, $73, $74, $75, $76, $77, $78, $79, $80, $81, $82, $83, $84, $85, $86, $87, $88, $89, $90, $91, $92, $93, $94, $95, $96, $97, $98, $99, $100) AND "security_scans"."id" = $101) ORDER BY "security_findings"."id" ASC LIMIT $102
Edited by Mehmet Emin INAC