Skip to content

ActiveRecord::QueryCanceled: PG::QueryCanceled: ERROR: canceling statement due to statement timeout

Description

Sidekiq/Security::Findings::DeleteByJobIdWorker is failing due to Postgres queries timing out while querying security findings.

The Security::Finding model is historically prone to slow queries due to the mass of data contained within. ("~15M records created 90 days ago"). As a result queries against the security findings need to be optimised, potentially through the reduction of the batch size. Additionally it may be preferable to introduce jitter and/or delay to these jobs so that they occur over the course of the day or weekend, mitigating query spikes against the Postgres database.

Implementation plan

Example usage of keyset iterator
diff --git a/ee/app/services/security/findings/cleanup_service.rb b/ee/app/services/security/findings/cleanup_service.rb
index 25b01ec1d47..e98758d3840 100644
--- a/ee/app/services/security/findings/cleanup_service.rb
+++ b/ee/app/services/security/findings/cleanup_service.rb
@@ -39,10 +39,16 @@ def purge(scan_batch)
       end

       def delete_findings_for(scan_batch)
-        Security::Finding.by_scan(scan_batch).each_batch(of: BATCH_DELETE_SIZE) do |finding_batch|
+        iterator_for(scan_batch).each_batch(of: BATCH_DELETE_SIZE) do |finding_batch|
           finding_batch.delete_all
         end
       end
+
+      def iterator_for(scan_batch)
+        Security::Finding.by_scan(scan_batch).order(:scan_id, :id).then do |scope|
+          Gitlab::Pagination::Keyset::Iterator.new(scope: scope)
+        end
+      end
     end
   end
 end

Sentry Error

https://sentry.gitlab.net/gitlab/gitlabcom/issues/3229361/?referrer=gitlab_plugin

PG::QueryCanceled: ERROR:  canceling statement due to statement timeout

  lib/gitlab/database/load_balancing/connection_proxy.rb:99:in `block in read_using_load_balancer'
    connection.send(...)
  lib/gitlab/database/load_balancing/load_balancer.rb:112:in `block in read_write'
    yield connection
  lib/gitlab/database/load_balancing/load_balancer.rb:179:in `retry_with_backoff'
    return yield
  lib/gitlab/database/load_balancing/load_balancer.rb:110:in `read_write'
    retry_with_backoff do
  lib/gitlab/database/load_balancing/connection_proxy.rb:98:in `read_using_load_balancer'
    @load_balancer.read_write do |connection|
...
(147 additional frame(s) were not displayed)

ActiveRecord::QueryCanceled: PG::QueryCanceled: ERROR:  canceling statement due to statement timeout

PG::QueryCanceled: ERROR:  canceling statement due to statement timeout

Example of Failing Query

/*application:sidekiq,correlation_id:<redacted>,jid:<redacted>,endpoint_id:Security::Findings::DeleteByJobIdWorker,db_config_name:main*/ SELECT "security_findings"."id" FROM "security_findings" WHERE "security_findings"."scan_id" IN (SELECT "security_scans"."id" FROM "security_scans" WHERE "security_scans"."build_id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38, $39, $40, $41, $42, $43, $44, $45, $46, $47, $48, $49, $50, $51, $52, $53, $54, $55, $56, $57, $58, $59, $60, $61, $62, $63, $64, $65, $66, $67, $68, $69, $70, $71, $72, $73, $74, $75, $76, $77, $78, $79, $80, $81, $82, $83, $84, $85, $86, $87, $88, $89, $90, $91, $92, $93, $94, $95, $96, $97, $98, $99, $100) AND "security_scans"."id" = $101) ORDER BY "security_findings"."id" ASC LIMIT $102 
Edited by Mehmet Emin INAC