Skip to content

Evaluate allowed_plans early in all possible cases

Tomasz Maczukin requested to merge 412915-handle-allowed-plans-early into master

What does this MR do and why?

Add allowed_plans matcher to DropNotRunnableBuildsService

As described in #412915, allowed_plans being handled late (at job scheduling event) may cause problems with a bigger backlog of jobs not matching that configuration and yet targetting the affected runner.

With this change, dropping a job not matching any available runner will be done early, just like it's done with CI/CD minutes quota.

Database changes

For the execution within Ci::RegisterJobService

Queries diff before and after adding the change
--- before      2023-11-22 02:45:19.377196673 +0100
+++ after       2023-11-22 02:45:19.373196731 +0100
@@ -4 +4,2 @@
-QueryRecorder SQL:  --> INSERT INTO "ci_runners" ("created_at", "updated_at", "description", "platform", "runner_type", "token_encrypted", "allowed_plans") VALUES ('datetime', 'datetime', 'My runner2', 'darwin', 1, 'token', '{free}') RETURNING "id" /*application:test,correlation_id:___,db_config_name:ci,line:<internal:kernel>:90:in `tap'*/
+QueryRecorder SQL:  --> SELECT "plans"."id" FROM "plans" WHERE "plans"."name" = 'premium' /*application:test,correlation_id:___,db_config_name:main,line:/ee/app/models/ee/ci/runner.rb:48:in `ensure_allowed_plan_ids'*/
+QueryRecorder SQL:  --> INSERT INTO "ci_runners" ("created_at", "updated_at", "description", "platform", "runner_type", "token_encrypted", "allowed_plans", "allowed_plan_ids") VALUES ('datetime', 'datetime', 'My runner2', 'darwin', 1, 'token', '{premium}', '{7}') RETURNING "id" /*application:test,correlation_id:___,db_config_name:ci,line:<internal:kernel>:90:in `tap'*/
@@ -9,6 +10 @@
-QueryRecorder SQL:  --> WITH "project_builds" AS MATERIALIZED (SELECT "ci_running_builds"."project_id", COUNT(*) AS running_builds FROM "ci_running_builds" WHERE "ci_running_builds"."runner_type" = 1 GROUP BY "ci_running_builds"."project_id") SELECT "ci_pending_builds"."build_id" FROM "ci_pending_builds" LEFT JOIN project_builds ON ci_pending_builds.project_id = project_builds.project_id WHERE "ci_pending_builds"."instance_runners_enabled" = TRUE AND "ci_pending_builds"."minutes_exceeded" = FALSE AND (ci_pending_builds.tag_ids = '{}') ORDER BY COALESCE(project_builds.running_builds, 0) ASC, ci_pending_builds.build_id ASC /*application:test,correlation_id:___,db_config_name:ci,line:/app/services/ci/queue/pending_builds_strategy.rb:46:in `build_ids'*/
-QueryRecorder SQL:  --> SELECT "p_ci_builds"."status", "p_ci_builds"."finished_at", "p_ci_builds"."created_at", "p_ci_builds"."updated_at", "p_ci_builds"."started_at", "p_ci_builds"."runner_id", "p_ci_builds"."coverage", "p_ci_builds"."commit_id", "p_ci_builds"."name", "p_ci_builds"."options", "p_ci_builds"."allow_failure", "p_ci_builds"."stage", "p_ci_builds"."trigger_request_id", "p_ci_builds"."stage_idx", "p_ci_builds"."tag", "p_ci_builds"."ref", "p_ci_builds"."user_id", "p_ci_builds"."type", "p_ci_builds"."target_url", "p_ci_builds"."description", "p_ci_builds"."project_id", "p_ci_builds"."erased_by_id", "p_ci_builds"."erased_at", "p_ci_builds"."artifacts_expire_at", "p_ci_builds"."environment", "p_ci_builds"."when", "p_ci_builds"."yaml_variables", "p_ci_builds"."queued_at", "p_ci_builds"."lock_version", "p_ci_builds"."coverage_regex", "p_ci_builds"."auto_canceled_by_id", "p_ci_builds"."retried", "p_ci_builds"."protected", "p_ci_builds"."failure_reason", "p_ci_builds"."scheduled_at", "p_ci_builds"."token_encrypted", "p_ci_builds"."upstream_pipeline_id", "p_ci_builds"."resource_group_id", "p_ci_builds"."waiting_for_resource_at", "p_ci_builds"."processed", "p_ci_builds"."scheduling_type", "p_ci_builds"."id", "p_ci_builds"."stage_id", "p_ci_builds"."partition_id", "p_ci_builds"."auto_canceled_by_partition_id" FROM "p_ci_builds" WHERE "p_ci_builds"."type" = 'Ci::Build' AND "p_ci_builds"."id" = 4 LIMIT 1 /*application:test,correlation_id:___,db_config_name:ci,line:/app/services/ci/register_job_service.rb:136:in `block in each_build'*/
-QueryRecorder SQL:  --> SELECT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."tag_id" WHERE "taggings"."taggable_id" = 4 AND "taggings"."taggable_type" = 'CommitStatus' AND (taggings.context = 'tags' AND taggings.tagger_id IS NULL) /*application:test,correlation_id:___,db_config_name:ci,line:/app/models/ci/build.rb:669:in `tag_list'*/
-QueryRecorder SQL:  --> SELECT "projects"."id", "projects"."name", "projects"."path", "projects"."description", "projects"."created_at", "projects"."updated_at", "projects"."creator_id", "projects"."namespace_id", "projects"."last_activity_at", "projects"."import_url", "projects"."visibility_level", "projects"."archived", "projects"."avatar", "projects"."merge_requests_template", "projects"."star_count", "projects"."merge_requests_rebase_enabled", "projects"."import_type", "projects"."import_source", "projects"."approvals_before_merge", "projects"."reset_approvals_on_push", "projects"."merge_requests_ff_only_enabled", "projects"."issues_template", "projects"."mirror", "projects"."mirror_last_update_at", "projects"."mirror_last_successful_update_at", "projects"."mirror_user_id", "projects"."shared_runners_enabled", "projects"."runners_token", "projects"."build_allow_git_fetch", "projects"."build_timeout", "projects"."mirror_trigger_builds", "projects"."pending_delete", "projects"."public_builds", "projects"."last_repository_check_failed", "projects"."last_repository_check_at", "projects"."only_allow_merge_if_pipeline_succeeds", "projects"."has_external_issue_tracker", "projects"."repository_storage", "projects"."repository_read_only", "projects"."request_access_enabled", "projects"."has_external_wiki", "projects"."ci_config_path", "projects"."lfs_enabled", "projects"."description_html", "projects"."only_allow_merge_if_all_discussions_are_resolved", "projects"."repository_size_limit", "projects"."printing_merge_request_link_enabled", "projects"."auto_cancel_pending_pipelines", "projects"."service_desk_enabled", "projects"."cached_markdown_version", "projects"."delete_error", "projects"."last_repository_updated_at", "projects"."disable_overriding_approvers_per_merge_request", "projects"."storage_version", "projects"."resolve_outdated_diff_discussions", "projects"."remote_mirror_available_overridden", "projects"."only_mirror_protected_branches", "projects"."pull_mirror_available_overridden", "projects"."jobs_cache_index", "projects"."external_authorization_classification_label", "projects"."mirror_overwrites_diverged_branches", "projects"."pages_https_only", "projects"."external_webhook_token", "projects"."packages_enabled", "projects"."merge_requests_author_approval", "projects"."pool_repository_id", "projects"."runners_token_encrypted", "projects"."bfg_object_map", "projects"."detected_repository_languages", "projects"."merge_requests_disable_committers_approval", "projects"."require_password_to_approve", "projects"."max_pages_size", "projects"."max_artifacts_size", "projects"."pull_mirror_branch_prefix", "projects"."remove_source_branch_after_merge", "projects"."marked_for_deletion_at", "projects"."marked_for_deletion_by_user_id", "projects"."autoclose_referenced_issues", "projects"."suggestion_commit_message", "projects"."project_namespace_id", "projects"."hidden", "projects"."organization_id" FROM "projects" WHERE "projects"."id" = 4 LIMIT 1 /*application:test,correlation_id:___,db_config_name:main,line:/app/models/ci/build.rb:372:in `block in build_matcher'*/
-QueryRecorder SQL:  --> SELECT "namespaces"."id", "namespaces"."name", "namespaces"."path", "namespaces"."owner_id", "namespaces"."created_at", "namespaces"."updated_at", "namespaces"."type", "namespaces"."description", "namespaces"."avatar", "namespaces"."membership_lock", "namespaces"."share_with_group_lock", "namespaces"."visibility_level", "namespaces"."request_access_enabled", "namespaces"."ldap_sync_status", "namespaces"."ldap_sync_error", "namespaces"."ldap_sync_last_update_at", "namespaces"."ldap_sync_last_successful_update_at", "namespaces"."ldap_sync_last_sync_at", "namespaces"."description_html", "namespaces"."lfs_enabled", "namespaces"."parent_id", "namespaces"."shared_runners_minutes_limit", "namespaces"."repository_size_limit", "namespaces"."require_two_factor_authentication", "namespaces"."two_factor_grace_period", "namespaces"."cached_markdown_version", "namespaces"."project_creation_level", "namespaces"."runners_token", "namespaces"."file_template_project_id", "namespaces"."saml_discovery_token", "namespaces"."runners_token_encrypted", "namespaces"."custom_project_templates_group_id", "namespaces"."auto_devops_enabled", "namespaces"."extra_shared_runners_minutes_limit", "namespaces"."last_ci_minutes_notification_at", "namespaces"."last_ci_minutes_usage_notification_level", "namespaces"."subgroup_creation_level", "namespaces"."emails_disabled", "namespaces"."max_pages_size", "namespaces"."max_artifacts_size", "namespaces"."mentions_disabled", "namespaces"."default_branch_protection", "namespaces"."max_personal_access_token_lifetime", "namespaces"."push_rule_id", "namespaces"."shared_runners_enabled", "namespaces"."allow_descendants_override_disabled_shared_runners", "namespaces"."traversal_ids", "namespaces"."organization_id" FROM "namespaces" INNER JOIN "projects" ON "namespaces"."id" = "projects"."namespace_id" WHERE "projects"."id" = 4 LIMIT 1 /*application:test,correlation_id:___,db_config_name:main,line:/ee/app/models/ee/ci/runner.rb:53:in `allowed_for_plans?'*/
-QueryRecorder SQL:  --> SELECT DISTINCT "plans".* FROM "plans" INNER JOIN "gitlab_subscriptions" ON "gitlab_subscriptions"."hosted_plan_id" = "plans"."id" WHERE "plans"."name" IN ('bronze', 'silver', 'premium', 'gold', 'ultimate', 'ultimate_trial', 'ultimate_trial_paid_customer', 'premium_trial', 'opensource') AND "gitlab_subscriptions"."namespace_id" = 7 /* allow_cross_joins_across_databases */ /*application:test,correlation_id:___,db_config_name:main,line:/ee/app/models/ee/ci/runner.rb:55:in `map'*/
+QueryRecorder SQL:  --> WITH "project_builds" AS MATERIALIZED (SELECT "ci_running_builds"."project_id", COUNT(*) AS running_builds FROM "ci_running_builds" WHERE "ci_running_builds"."runner_type" = 1 GROUP BY "ci_running_builds"."project_id") SELECT "ci_pending_builds"."build_id" FROM "ci_pending_builds" LEFT JOIN project_builds ON ci_pending_builds.project_id = project_builds.project_id WHERE "ci_pending_builds"."instance_runners_enabled" = TRUE AND "ci_pending_builds"."minutes_exceeded" = FALSE AND "ci_pending_builds"."plan_id" = 7 AND (ci_pending_builds.tag_ids = '{}') ORDER BY COALESCE(project_builds.running_builds, 0) ASC, ci_pending_builds.build_id ASC /*application:test,correlation_id:___,db_config_name:ci,line:/app/services/ci/queue/pending_builds_strategy.rb:46:in `build_ids'*/

The interesting thing is that the added change adds one query (the SELECT "plans"."id" FROM "plans" WHERE "plans"."name" = 'premium' query added in the "after" case), but removes 5 queries from the end of the "before" list!

New queries within the service execution:

  1. Plan selection

    SELECT
        "plans"."id"
    FROM
        "plans"
    WHERE
        "plans"."name" = 'premium'

    Explain execution

Changed queries within the service execution:

  1. Runner creation

    INSERT INTO "ci_runners" ("created_at", "updated_at", "description", "platform", "runner_type", "token_encrypted", "allowed_plans", "allowed_plan_ids")
        VALUES ('datetime', 'datetime', 'My runner2', 'darwin', 1, 'token', '{premium}', '{7}')
    RETURNING
        "id"

    The list of columns to be filled with values was updated with the new allowed_plan_ids column. BTW, this query itself is not executed within the service, normally. Queries were gathered through rspec tests execution, and here it caught the creation of ci_runners entity when preparing the test. That update will be however included in the API that is creating runner entity, as filling allowed_plan_ids basing on the current value of allowed_plans is now done automatically in the before_commit hook.

    Explain execution

  2. Selection a list of jobs for picking for the runner

    WITH "project_builds" AS MATERIALIZED (
        SELECT
            "ci_running_builds"."project_id",
            COUNT(
                *
    ) AS running_builds
        FROM
            "ci_running_builds"
        WHERE
            "ci_running_builds"."runner_type" = 1
        GROUP BY
            "ci_running_builds"."project_id"
    )
    SELECT
        "ci_pending_builds"."build_id"
    FROM
        "ci_pending_builds"
        LEFT JOIN project_builds ON ci_pending_builds.project_id = project_builds.project_id
    WHERE
        "ci_pending_builds"."instance_runners_enabled" = TRUE
        AND "ci_pending_builds"."minutes_exceeded" = FALSE
        AND "ci_pending_builds"."plan_id" = 7
        AND (ci_pending_builds.tag_ids = '{}')
    ORDER BY
        COALESCE(project_builds.running_builds, 0) ASC,
        ci_pending_builds.build_id ASC

    The list of query parameters was updated with addition of the AND "ci_pending_builds"."plan_id" = 7.

    This is probably the most significant query that was affected by this MR.

    Explain execution

For the execution within Ci::PipelineCreation::DropNotRunnableBuildsService

Changed queries within the service execution:

  1. a

    SELECT
        array_agg(ci_runners.id),
        "ci_runners"."runner_type",
        "ci_runners"."public_projects_minutes_cost_factor",
        "ci_runners"."private_projects_minutes_cost_factor",
        "ci_runners"."run_untagged",
        "ci_runners"."access_level",
        (
            SELECT
                COALESCE(array_agg(tags.name ORDER BY name), ARRAY[]::text[])
            FROM
                "taggings"
                INNER JOIN "tags" ON "tags"."id" = "taggings"."tag_id"
            WHERE (taggings.taggable_id = "ci_runners".id)
            AND "taggings"."context" = 'tags'
            AND "taggings"."taggable_type" = 'Ci::Runner'),
        "ci_runners"."allowed_plans",
        "ci_runners"."allowed_plan_ids"
    FROM ((
            SELECT
                "ci_runners".*
            FROM
                "ci_runners"
                INNER JOIN "ci_runner_projects" ON "ci_runners"."id" = "ci_runner_projects"."runner_id"
            WHERE
                "ci_runner_projects"."project_id" = 0)
        UNION (
            SELECT
                "ci_runners".*
            FROM
                "ci_runners"
                INNER JOIN "ci_runner_namespaces" ON "ci_runner_namespaces"."runner_id" = "ci_runners"."id"
            WHERE
                "ci_runner_namespaces"."namespace_id" = 0)
        UNION (
            SELECT
                "ci_runners".*
            FROM
                "ci_runners"
            WHERE
                "ci_runners"."runner_type" = 1)) ci_runners
    WHERE
        "ci_runners"."active" = TRUE
        AND "ci_runners"."contacted_at" > '2023-11-22 00:00'
    GROUP BY
        "ci_runners"."runner_type",
        "ci_runners"."public_projects_minutes_cost_factor",
        "ci_runners"."private_projects_minutes_cost_factor",
        "ci_runners"."run_untagged",
        "ci_runners"."access_level",
        (
            SELECT
                COALESCE(array_agg(tags.name ORDER BY name), ARRAY[]::text[])
            FROM
                "taggings"
                INNER JOIN "tags" ON "tags"."id" = "taggings"."tag_id"
            WHERE (taggings.taggable_id = "ci_runners".id)
            AND "taggings"."context" = 'tags'
            AND "taggings"."taggable_type" = 'Ci::Runner'),
        "ci_runners"."allowed_plans",
        "ci_runners"."allowed_plan_ids"

    This adds "ci_runners"."allowed_plans" and "ci_runners"."allowed_plan_ids" to the list of fields that are selected and used for GROUP BY section.

    Explain execution

For the execution within EE::Ci::RetryJobService

To be added

For the execution within EE::Ci::RetryPipelineService

To be added

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #412915

Edited by Tomasz Maczukin

Merge request reports