Preload resource group assignment dependencies

What does this MR do and why?

Preload deployable associations used while assigning resource group processables. This avoids extra read queries per processable while the worker is processing jobs after reassignment to urgent.

The state-machine updates remain per build, but the surrounding reads should not grow with the number of free resources.

This optimization is intended to help bring the worker back under its SLO following its reassignment to urgent in !239668 (merged).

Because this worker is on the CI hot path, the optimization is gated by the short-lived resource_group_assignment_preloads de-risking feature flag. This allows operators to disable the fixed preload cost without reverting the MR.
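The gating described above could be sketched roughly as follows. This is a minimal illustration, not the code in this MR: `preloads_for` and `ASSUMED_PRELOADS` are hypothetical names, and the association list is only loosely based on the tables that appear in the SQL in this description.

```ruby
# Illustrative association names, loosely based on the tables queried in this
# description. The real preload list lives in the MR diff (assumption).
ASSUMED_PRELOADS = %i[user job_environment job_definition deployment project].freeze

# Hypothetical helper: returns the associations to preload for a given flag state.
# With the flag off, nothing is preloaded and callers fall back to lazy reads.
def preloads_for(flag_enabled:)
  flag_enabled ? ASSUMED_PRELOADS : []
end

puts preloads_for(flag_enabled: true).inspect
puts preloads_for(flag_enabled: false).inspect
```

Keeping the decision in one place means disabling the flag removes the fixed preload cost without touching the assignment logic itself.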

The implementation deliberately pays a fixed preload cost for the limited set of upcoming processables, including environment.last_deployment, to avoid per-processable reads from has_outdated_deployment?. The number of upcoming processables is bounded by the number of free resources.
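To illustrate the trade-off, here is a minimal plain-Ruby sketch that counts reads against an in-memory stand-in for the database. `FakeStore` and both helper methods are hypothetical and exist only for this example; they are not part of the MR or of Rails.

```ruby
# Hypothetical in-memory stand-in for the database that counts every read.
class FakeStore
  attr_reader :query_count

  def initialize(deployments_by_job_id)
    @deployments_by_job_id = deployments_by_job_id
    @query_count = 0
  end

  # One read per job: the per-processable (N+1) pattern.
  def deployment_for(job_id)
    @query_count += 1
    @deployments_by_job_id[job_id]
  end

  # One read for all jobs: the preload pattern.
  def deployments_for(job_ids)
    @query_count += 1
    @deployments_by_job_id.slice(*job_ids)
  end
end

# Reads each deployment lazily and returns the number of reads issued.
def reads_without_preload(store, job_ids)
  job_ids.each { |id| store.deployment_for(id) }
  store.query_count
end

# Loads all deployments once, then serves lookups from memory.
def reads_with_preload(store, job_ids)
  preloaded = store.deployments_for(job_ids)
  job_ids.each { |id| preloaded[id] } # no additional read
  store.query_count
end

data = { 203 => :deploy_a, 204 => :deploy_b, 205 => :deploy_c }
puts reads_without_preload(FakeStore.new(data), data.keys) # grows with the job count
puts reads_with_preload(FakeStore.new(data), data.keys)    # stays fixed
```

The preload path pays its single read even when there is only one job, which is the fixed cost the feature flag lets operators turn off.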

Raw SQL for preload queries

The examples below are from the enabled feature-flag path with two waiting deploy jobs. IDs are representative. The resource group has its default resource plus two additional free resources, which is why the first query uses LIMIT 3. In production, this limit is resource_group.resources.free.count.

Upcoming processables query:

SELECT "p_ci_builds".*
FROM "p_ci_builds"
WHERE "p_ci_builds"."type" IN ('Ci::Processable', 'Ci::Bridge', 'Ci::Build')
  AND "p_ci_builds"."resource_group_id" = 137
  AND "p_ci_builds"."status" = 'waiting_for_resource'
LIMIT 3;

User preload:

SELECT "users".*
FROM "users"
WHERE "users"."id" = 161;

Job environment preload:

SELECT "job_environments".*
FROM "job_environments"
WHERE "job_environments"."ci_job_id" IN (203, 204);

Job definition preload:

SELECT
  "p_ci_job_definition_instances"."job_id" AS t0_r0,
  "p_ci_job_definition_instances"."job_definition_id" AS t0_r1,
  "p_ci_job_definition_instances"."partition_id" AS t0_r2,
  "p_ci_job_definition_instances"."project_id" AS t0_r3,
  "p_ci_job_definitions"."id" AS t1_r0,
  "p_ci_job_definitions"."partition_id" AS t1_r1,
  "p_ci_job_definitions"."project_id" AS t1_r2,
  "p_ci_job_definitions"."created_at" AS t1_r3,
  "p_ci_job_definitions"."interruptible" AS t1_r4,
  "p_ci_job_definitions"."checksum" AS t1_r5,
  "p_ci_job_definitions"."config" AS t1_r6
FROM "p_ci_job_definition_instances"
LEFT OUTER JOIN "p_ci_job_definitions"
  ON "p_ci_job_definitions"."partition_id" IS NOT NULL
  AND "p_ci_job_definitions"."id" = "p_ci_job_definition_instances"."job_definition_id"
  AND "p_ci_job_definitions"."partition_id" = "p_ci_job_definition_instances"."partition_id"
WHERE "p_ci_job_definition_instances"."partition_id" = 100
  AND "p_ci_job_definitions"."partition_id" = 100
  AND "p_ci_job_definition_instances"."job_id" IN (203, 204);

Deployment preload:

SELECT "deployments".*
FROM "deployments"
WHERE "deployments"."deployable_type" = 'CommitStatus'
  AND "deployments"."deployable_id" IN (203, 204);

Project preload:

SELECT "projects".*
FROM "projects"
WHERE "projects"."id" = 144;

Environment preload:

SELECT "environments".*
FROM "environments"
WHERE "environments"."id" = 57;

Project CI/CD settings preload:

SELECT "project_ci_cd_settings".*
FROM "project_ci_cd_settings"
WHERE "project_ci_cd_settings"."project_id" = 144;

Environment last deployment preload:

SELECT "deployments".*
FROM (
  (
    SELECT "deployments".*
    FROM "deployments"
    WHERE "deployments"."environment_id" = 57
      AND "deployments"."status" = 2
    ORDER BY "deployments"."finished_at" DESC
    LIMIT 1
  )
) deployments;

For multiple environments, Preloaders::Environments::DeploymentPreloader builds this last-deployment query as a UNION of one limited query per environment.
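As a rough sketch of that query shape, the following plain-Ruby helper builds one limited subquery per environment and joins them with UNION. `last_deployment_sql` is a hypothetical name used for illustration; it is not the actual Preloaders::Environments::DeploymentPreloader implementation, and the status value 2 is simply copied from the example query above.

```ruby
# Hypothetical helper, not the real preloader: builds one LIMIT 1 subquery per
# environment and combines them with UNION, mirroring the query shape above.
def last_deployment_sql(environment_ids, status: 2)
  raise ArgumentError, "environment_ids must not be empty" if environment_ids.empty?

  subqueries = environment_ids.map do |id|
    # Integer() rejects non-numeric input before it is interpolated.
    <<~SQL.strip
      (SELECT "deployments".* FROM "deployments"
       WHERE "deployments"."environment_id" = #{Integer(id)}
         AND "deployments"."status" = #{Integer(status)}
       ORDER BY "deployments"."finished_at" DESC
       LIMIT 1)
    SQL
  end

  %(SELECT "deployments".* FROM (#{subqueries.join("\nUNION\n")}) deployments;)
end

puts last_deployment_sql([57, 58])
```

With a single environment ID the helper emits no UNION, which matches the one-environment query shown above.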

References

Related to !239668 (merged)

Rollout issue: #603640

Screenshots or screen recordings

Not applicable. Backend performance optimization only.

How to set up and validate locally

  1. Prove the fixed path does not grow query count with more deployable jobs:

    bundle exec rspec spec/services/ci/resource_groups/assign_resource_from_resource_group_service_spec.rb:219

  2. To manually prove the old N+1, temporarily disable the flag in that example. Add the following inside the "with multiple deployable builds" context (do not commit it), then rerun the command above:

    before do
      stub_feature_flags(resource_group_assignment_preloads: false)
    end

    The query-recorder assertion should fail and show repeated reads for associations used by has_outdated_deployment?, including deployments, environments, job definitions, and project CI/CD settings.

  3. Prove the disabled flag keeps the fallback path available:

    bundle exec rspec spec/services/ci/resource_groups/assign_resource_from_resource_group_service_spec.rb:273

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Hordur Freyr Yngvason
