Add dynamic concurrency limit for create pipeline worker
What does this MR do and why?
This MR is related to epic &13997 (closed).
Our goal is to provide a way for instance admins to manage the number of jobs executed on behalf of a scheduled scan execution policy, so that pipelines are distributed over time and do not overburden the runners.
This MR adds a custom concurrency limit for `CreatePipelineWorker` to improve on our previous solution, which used the `concurrency_limit` attribute alone.
Our concerns about using `concurrency_limit` alone for our case are:

- It limits the worker's concurrency, but we want to restrict the CI builds' concurrency. One worker creates one pipeline, but that pipeline can have multiple CI build jobs.
- Sidekiq jobs seem to be much faster than the pipeline jobs executed by the runners, so limiting the number of workers with `concurrency_limit` might not be enough to reduce the pressure on the runners.
This solution has some limitations, but it is an improvement over using the worker's `concurrency_limit` attribute alone.
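To make the approach concrete, here is a minimal sketch of the kind of check this introduces. The class and method names are illustrative assumptions, not the MR's exact code; the scopes are the ones used in the validation snippet at the end of this description, and the query in the Database query section below is of this shape (pipeline source `15` is `security_orchestration_policy`).

```ruby
# Illustrative sketch only: class and method names are assumptions, not the MR's exact code.
# Idea: before creating another pipeline for a scheduled scan execution policy, count the
# policy-created CI builds that are still alive and compare the count against the
# admin-configured maximum.
module Security
  module ScanExecutionPolicies
    class ConcurrencyCheck
      LOOKBACK_WINDOW = 1.hour

      def initialize(limit)
        @limit = limit
      end

      # True when it is safe to schedule another pipeline.
      def below_limit?
        return true if @limit.to_i <= 0 # treat 0/blank as "no limit"

        active_policy_builds.limit(@limit).count < @limit
      end

      private

      # Same scopes as the monitoring loop in the validation steps below.
      def active_policy_builds
        ::Ci::Build
          .with_pipeline_source_type('security_orchestration_policy')
          .with_status(*::Ci::HasStatus::ALIVE_STATUSES)
          .created_after(LOOKBACK_WINDOW.ago)
          .updated_after(LOOKBACK_WINDOW.ago)
      end
    end
  end
end
```

Capping the relation with `limit` before counting keeps the check cheap: the code only needs to know whether the count reaches the configured maximum, not the exact total.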
Database query
```sql
SELECT
"p_ci_builds"."status",
"p_ci_builds"."finished_at",
"p_ci_builds"."created_at",
"p_ci_builds"."updated_at",
"p_ci_builds"."started_at",
"p_ci_builds"."coverage",
"p_ci_builds"."name",
"p_ci_builds"."options",
"p_ci_builds"."allow_failure",
"p_ci_builds"."stage",
"p_ci_builds"."stage_idx",
"p_ci_builds"."tag",
"p_ci_builds"."ref",
"p_ci_builds"."type",
"p_ci_builds"."target_url",
"p_ci_builds"."description",
"p_ci_builds"."erased_at",
"p_ci_builds"."artifacts_expire_at",
"p_ci_builds"."environment",
"p_ci_builds"."when",
"p_ci_builds"."yaml_variables",
"p_ci_builds"."queued_at",
"p_ci_builds"."lock_version",
"p_ci_builds"."coverage_regex",
"p_ci_builds"."retried",
"p_ci_builds"."protected",
"p_ci_builds"."failure_reason",
"p_ci_builds"."scheduled_at",
"p_ci_builds"."token_encrypted",
"p_ci_builds"."resource_group_id",
"p_ci_builds"."waiting_for_resource_at",
"p_ci_builds"."processed",
"p_ci_builds"."scheduling_type",
"p_ci_builds"."id",
"p_ci_builds"."stage_id",
"p_ci_builds"."partition_id",
"p_ci_builds"."auto_canceled_by_partition_id",
"p_ci_builds"."auto_canceled_by_id",
"p_ci_builds"."commit_id",
"p_ci_builds"."erased_by_id",
"p_ci_builds"."project_id",
"p_ci_builds"."runner_id",
"p_ci_builds"."trigger_request_id",
"p_ci_builds"."upstream_pipeline_id",
"p_ci_builds"."user_id",
"p_ci_builds"."execution_config_id"
FROM
"p_ci_builds"
INNER JOIN "ci_pipelines" "pipeline" ON "pipeline"."partition_id" IS NOT NULL
AND "pipeline"."id" = "p_ci_builds"."commit_id"
AND "pipeline"."partition_id" = "p_ci_builds"."partition_id"
WHERE
"p_ci_builds"."type" = 'Ci::Build'
AND "pipeline"."source" = 15
AND ("p_ci_builds"."status" IN ('preparing', 'pending', 'running', 'waiting_for_callback', 'waiting_for_resource', 'canceling', 'created'))
AND "p_ci_builds"."created_at" > '2024-07-15 15:07:29.351683'
AND "p_ci_builds"."updated_at" > '2024-07-15 15:07:29.351826'
LIMIT 100
```
https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/29883/commands/92878
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
- Create a new group
- Create some projects using the script below:
```ruby
user = User.first
namespace_id = Group.last.id

5.times do
  project_params = {
    namespace_id: namespace_id,
    name: "Test-#{FFaker::Lorem.characters(15)}"
  }
  project = ::Projects::CreateService.new(user, project_params).execute
  project.save!

  project.repository.create_file(user, 'Gemfile.lock', '', branch_name: Gitlab::DefaultBranch.value,
    message: 'Add Gemfile.lock file')
  project.repository.create_file(user, 'test.rb', 'puts "hello world"', branch_name: Gitlab::DefaultBranch.value,
    message: 'Add test.rb file')

  5.times do
    branch_name = "branch-#{FFaker::Lorem.characters(15)}"
    ::Branches::CreateService.new(project, user).execute(branch_name, project.default_branch)
  end
end
```
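The script assumes a GDK environment: paste it into a Rails console (`gdk rails console`); `FFaker` is part of GitLab's development dependencies, so it should be available there.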
- Go to the Group page
- Go to Secure > Policies
- Click on New policy
- Select Scan Execution Policy
- Change to the .yaml mode
- Copy the policy content below
```yaml
type: scan_execution_policy
name: policy
description: ''
enabled: true
policy_scope:
  projects:
    excluding: []
rules:
  - type: schedule
    cadence: 0 0 * * *
    timezone: Etc/UTC
    branch_type: all
actions:
  - scan: secret_detection
  - scan: sast
  - scan: sast_iac
  - scan: container_scanning
  - scan: dependency_scanning
```
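For a sense of scale: the seed script above leaves you with 5 projects, each with roughly 6 branches (the default branch plus the 5 created ones), and the policy schedules 5 scans per branch (`branch_type: all`). That is on the order of 30 pipelines and at least ~150 policy-created CI jobs per scheduled run, comfortably above the concurrency value of 50 configured below, so the limit actually gets exercised.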
- Merge the policy
- Enable the feature flags:

```ruby
Feature.enable(:scan_execution_pipeline_worker)
Feature.enable(:scan_execution_pipeline_concurrency_control)
```
- Go to the Admin Area
- Go to Settings > CI/CD > Continuous Integration and Deployment
- Update the `Security policy scheduled scans maximum concurrency` value to 50 (or set it from the Rails console, as sketched below)
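A console alternative, assuming the setting is persisted on `ApplicationSetting` under an attribute named `security_policy_scheduled_scans_max_concurrency` (verify the exact column name against the MR's changes):

```ruby
# Assumption: the attribute name may differ; check the MR's ApplicationSetting changes.
ApplicationSetting.current.update!(security_policy_scheduled_scans_max_concurrency: 50)
```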
- Trigger the scheduled scans:
Get the schedule id in the Rails console:

```ruby
rule_schedule_id = Security::OrchestrationPolicyRuleSchedule.last.id
```

Update the schedule's `next_run_at` to a time in the past using `gdk psql`:

```sql
UPDATE security_orchestration_policy_rule_schedules SET next_run_at = '2024-05-28 00:15:00+00' WHERE id = <rule_schedule_id>;
```
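If you prefer to stay in the Rails console, `update_column` bypasses validations and callbacks (the schedule model may recalculate `next_run_at` on save, which is presumably why the walkthrough uses raw SQL), so it can serve as an alternative; this snippet is a suggestion, not part of the MR:

```ruby
# Suggested alternative to the SQL above; update_column skips callbacks and validations.
Security::OrchestrationPolicyRuleSchedule
  .find(rule_schedule_id)
  .update_column(:next_run_at, 1.day.ago)
```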
Trigger the schedule in the Rails console:

```ruby
Security::OrchestrationPolicyRuleScheduleNamespaceWorker.new.perform(rule_schedule_id)
```
- Verify the number of active `::Ci::Build` jobs in the Rails console. You can use the pause strategy query to check the number of active `::Ci::Build` jobs:
```ruby
while true
  puts ::Ci::Build.with_pipeline_source_type('security_orchestration_policy')
    .with_status(*::Ci::HasStatus::ALIVE_STATUSES)
    .created_after(1.hour.ago)
    .updated_after(1.hour.ago).count
  sleep 3
end
```
It might take some time, but you should see the CI job count stop increasing once the limit is reached.