Security approval rules does not work when same policy project is linked to both group and project belonging to the group

Why are we doing this work

The exception occurs when a same policy project is linked to both group and a project belonging to the group. So there will be a race-condition in the transaction in Security::ProcessScanResultPolicyWorker

Let's say a policy project P is linked to Group A and Project A. Group A has another project B then this are the steps after a scan result policy is created/updated:

graph TD
    A[PostMergeService] -->|Project A| B(Security::SyncScanPoliciesWorker)
    A[PostMergeService] -->|Group A| C(Security::SyncScanPoliciesWorker)
    B -->|Project A| D[Security::ProcessScanResultPolicyWorker]
    C -->|Project A| E[Security::ProcessScanResultPolicyWorker]
    C -->|Project B| F[Security::ProcessScanResultPolicyWorker]
    D --> G[Race Condition]
    E --> G[Race Condition]
  

Logs: Kibana

Exception

https://sentry.gitlab.net/gitlab/gitlabcom/issues/4043761/?referrer=gitlab_plugin

PG::TRDeadlockDetected: ERROR:  deadlock detected
DETAIL:  Process 4060327 waits for ShareLock on transaction 251411987; blocked by process 4021075.
Process 4021075 waits for ShareLock on transaction 251411792; blocked by process 4060327.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (645229,35) in relation "approval_merge_request_rules"

  lib/gitlab/database/load_balancing/connection_proxy.rb:120:in `block in write_using_load_balancer'
    connection.send(...)
  lib/gitlab/database/load_balancing/load_balancer.rb:129:in `block in read_write'
    yield connection
  lib/gitlab/database/load_balancing/load_balancer.rb:207:in `retry_with_backoff'
    return yield attempt # Yield the current attempt count
  lib/gitlab/database/load_balancing/load_balancer.rb:118:in `read_write'
    retry_with_backoff(attempts: attempts) do |attempt|
  lib/gitlab/database/load_balancing/connection_proxy.rb:119:in `write_using_load_balancer'
    @load_balancer.read_write do |connection|
...
(216 additional frame(s) were not displayed)

ActiveRecord::Deadlocked: PG::TRDeadlockDetected: ERROR:  deadlock detected
DETAIL:  Process 4060327 waits for ShareLock on transaction 251411987; blocked by process 4021075.
Process 4021075 waits for ShareLock on transaction 251411792; blocked by process 4060327.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (645229,35) in relation "approval_merge_request_rules"

PG::TRDeadlockDetected: ERROR:  deadlock detected
DETAIL:  Process 4060327 waits for ShareLock on transaction 251411987; blocked by process 4021075.
Process 4021075 waits for ShareLock on transaction 251411792; blocked by process 4060327.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (645229,35) in relation "approval_merge_request_rules"

Implementation plan

  • backend Update Security::ProcessScanResultPolicyWorker to use exclusive lease so that only one instance of worker is executed at a given point of time:
diff --git a/ee/app/workers/security/process_scan_result_policy_worker.rb b/ee/app/workers/security/process_scan_result_policy_worker.rb
index 7ea9b874e9ef..e9d4cd778b53 100644
--- a/ee/app/workers/security/process_scan_result_policy_worker.rb
+++ b/ee/app/workers/security/process_scan_result_policy_worker.rb
@@ -3,6 +3,10 @@
 module Security
   class ProcessScanResultPolicyWorker # rubocop:disable Scalability/IdempotentWorker
     include ApplicationWorker
+    include Gitlab::ExclusiveLeaseHelpers
+
+    LEASE_TTL = 5.minutes
+    LEASE_NAMESPACE = "process_scan_result_policy_worker"
 
     data_consistency :always
 
@@ -16,25 +20,31 @@ def perform(project_id, configuration_id)
 
       return unless project && configuration
 
-      configuration.transaction do
-        configuration.delete_scan_finding_rules_for_project(project_id)
+      in_lock(lease_key(project, configuration), ttl: LEASE_TTL) do
+        configuration.transaction do
+          configuration.delete_scan_finding_rules_for_project(project_id)
+
+          active_scan_result_policies = configuration.active_scan_result_policies
 
-        active_scan_result_policies = configuration.active_scan_result_policies
+          active_scan_result_policies.each_with_index do |policy, policy_index|
+            Security::SecurityOrchestrationPolicies::ProcessScanResultPolicyService
+              .new(project: project, policy_configuration: configuration, policy: policy, policy_index: policy_index)
+              .execute
+          end
 
-        active_scan_result_policies.each_with_index do |policy, policy_index|
-          Security::SecurityOrchestrationPolicies::ProcessScanResultPolicyService
-            .new(project: project, policy_configuration: configuration, policy: policy, policy_index: policy_index)
+          Security::SecurityOrchestrationPolicies::SyncOpenedMergeRequestsService
+            .new(project: project)
             .execute
         end
 
-        Security::SecurityOrchestrationPolicies::SyncOpenedMergeRequestsService
+        Security::SecurityOrchestrationPolicies::SyncOpenMergeRequestsHeadPipelineService
           .new(project: project)
           .execute
       end
+    end
 
-      Security::SecurityOrchestrationPolicies::SyncOpenMergeRequestsHeadPipelineService
-        .new(project: project)
-        .execute
+    def lease_key(project, configuration)
+      "#{LEASE_NAMESPACE}:#{project.id}:#{configuration.id}"
     end
   end
 end

Verification steps:

  • Create a policy project and link it to a project and to the project's group.
  • Create a scan result policy to the policy project
  • Create MR to the target project that violates the rule in policy and notice the logs in sidekiq
Edited by Sashi Kumar Kumaresan