Use 'executing' build status to determine busy runners

What does this MR do and why?

This MR renames the Running badge in the runners list to Active. The badge now appears not only when a runner has builds in the :running state, but also when it has builds in other executing states, such as :canceling.

With #470034, we'll remove the RUNNING terminology.
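
The badge rule described above can be sketched in a few lines of Ruby. This is an illustrative sketch only, not the code from this MR: `EXECUTING_STATUSES` and `job_execution_status` are hypothetical names, the status list is copied from the new query under "Database query plans", and the `:idle` fallback is an assumption.

```ruby
# Hypothetical sketch; names are illustrative and not taken from the GitLab codebase.
# Status list copied from the IN (...) clause of the new query in this MR.
EXECUTING_STATUSES = %w[
  preparing pending running waiting_for_callback
  waiting_for_resource canceling created
].freeze

# Returns :active when at least one build is in an executing status.
# The :idle fallback is an assumption made for this sketch.
def job_execution_status(build_statuses)
  executing = build_statuses.any? { |status| EXECUTING_STATUSES.include?(status) }
  executing ? :active : :idle
end

job_execution_status(%w[success canceling]) # => :active
job_execution_status(%w[success failed])    # => :idle
```

Under the old behaviour, only a build in the `running` status would have produced the badge, so the first call would not have counted as busy.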

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

[Screenshots: Before / After]

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Through UI (more time-consuming)

  1. Ensure you have a runner registered
  2. Ensure you have a project with a CI job (ideally a long-running job, e.g. with a sleep 30 statement in the script)
  3. Run a job on that runner
  4. Visit http://gdk.test:3000/admin/runners and locate the runner. It should have the Active badge if the job is running, canceling, etc.

With GraphQL (easier)

  1. Go to the shell in your GDK gitlab directory and run bundle exec rake "gitlab:seed:runner_fleet". This will seed your GDK with some runners and jobs required for testing this MR.

  2. Find the most recent runner that is active but not in the running status, that is, the runner with the highest ID that has a build in the canceling status:

    SELECT ci_builds.runner_id, ci_builds.status
    FROM ci_builds
    WHERE ci_builds.runner_id IN (
        SELECT MAX(runner_id) AS runner_id
        FROM ci_builds
        WHERE status = 'canceling'
          AND type = 'Ci::Build')
    GROUP BY runner_id, status;

  3. In http://gdk.test:3000/-/graphql-explorer, run the following query:

    {
      runner(id: "gid://gitlab/Ci::Runner/9938") { # Replace with runner_id found in step 2
        id
        jobExecutionStatus
        managers {
          nodes {
            systemId
            jobExecutionStatus
          }
        }
      }
    }

The runner.jobExecutionStatus should be shown as ACTIVE, even though none of the builds associated with the runner is in the :running state.
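
If you prefer a script over the GraphQL explorer, the query from step 3 can also be sent from Ruby. This is a minimal sketch under a few assumptions: the helper names are hypothetical, the access token is read from a `GITLAB_TOKEN` environment variable (a name chosen for this example), and the GDK is reachable at the URL used in the steps above.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds the JSON request body for the query in step 3.
def runner_status_payload(runner_id)
  query = <<~GRAPHQL
    {
      runner(id: "gid://gitlab/Ci::Runner/#{runner_id}") {
        id
        jobExecutionStatus
        managers { nodes { systemId jobExecutionStatus } }
      }
    }
  GRAPHQL
  JSON.generate(query: query)
end

# Hypothetical helper: posts the query to the GDK GraphQL endpoint and
# returns the runner-level jobExecutionStatus (nil if the runner is not found).
def fetch_runner_status(runner_id, token, base_url = 'http://gdk.test:3000')
  headers = {
    'Content-Type' => 'application/json',
    'Authorization' => "Bearer #{token}"
  }
  response = Net::HTTP.post(URI("#{base_url}/api/graphql"), runner_status_payload(runner_id), headers)
  JSON.parse(response.body).dig('data', 'runner', 'jobExecutionStatus')
end

# Only contacts the GDK when a token is provided (GITLAB_TOKEN is an assumed name).
if ENV['GITLAB_TOKEN']
  # Replace 9938 with the runner_id found in step 2.
  puts fetch_runner_status(9938, ENV['GITLAB_TOKEN'])
end
```

The script prints only the runner-level status. The per-manager statuses are in the same response under `managers.nodes`.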

Database query plans

↳ app/graphql/types/ci/runner_type.rb:158:in `block in job_execution_status'

The only thing that changed here is that we're filtering for more statuses.
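
As a rough illustration of that change, the snippet below compares the two status filters. The status lists are copied from the old and new queries in this section. The helper `status_predicate` is a hypothetical name used only for this sketch and is not how the application builds the SQL.

```ruby
# Status lists copied from the old and new queries in this MR.
OLD_STATUSES = %w[running].freeze
NEW_STATUSES = %w[
  preparing pending running waiting_for_callback
  waiting_for_resource canceling created
].freeze

# Hypothetical helper: renders the status predicate as it appears in the SQL.
def status_predicate(statuses)
  quoted = statuses.map { |status| "'#{status}'" }.join(', ')
  %("p_ci_builds"."status" IN (#{quoted}))
end

# Statuses that the new filter matches and the old one did not.
added = NEW_STATUSES - OLD_STATUSES

puts status_predicate(OLD_STATUSES)
puts status_predicate(NEW_STATUSES)
puts "Newly matched statuses: #{added.join(', ')}"
```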

Old query:

SELECT "ci_runners".*
FROM "ci_runners"
WHERE "ci_runners"."id" IN (... <list of 20 runner ids>)
  AND (EXISTS (
      SELECT 1
      FROM "p_ci_builds"
      WHERE "p_ci_builds"."type" = 'Ci::Build'
        AND ("p_ci_builds"."status" IN ('running'))
        AND ("p_ci_builds".runner_id = "ci_runners".id)))

New query:

https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/29587/commands/91858

SELECT "ci_runners".*
FROM "ci_runners"
WHERE "ci_runners"."id" IN (1506020, 1506021, 11573930, 11573990, 11574038, 11574045, 11574068, 11574076, 11574084, 11574096, 11728715, 11728725, 11728729, 11728733, 11728737, 11728740, 11728747, 11728750, 12270807, 12270831)
  AND (EXISTS (
      SELECT 1
      FROM "p_ci_builds"
      WHERE "p_ci_builds"."type" = 'Ci::Build'
        AND ("p_ci_builds"."status" IN ('preparing', 'pending', 'running', 'waiting_for_callback', 'waiting_for_resource',
          'canceling', 'created'))
        AND ("p_ci_builds".runner_id = "ci_runners".id)))

 Nested Loop Semi Join  (cost=1.14..2088.76 rows=1 width=225) (actual time=166.772..336.202 rows=8 loops=1)
   Buffers: shared hit=1906 read=225 dirtied=90
   I/O Timings: read=325.317 write=0.000
   ->  Index Scan using ci_runners_pkey on public.ci_runners  (cost=0.43..68.71 rows=20 width=225) (actual time=5.538..46.429 rows=20 loops=1)
         Index Cond: (ci_runners.id = ANY ('{1506020,1506021,11573930,11573990,11574038,11574045,11574068,11574076,11574084,11574096,11728715,11728725,11728729,11728733,11728737,11728740,11728747,11728750,12270807,12270831}'::integer[]))
         Buffers: shared hit=57 read=27 dirtied=4
         I/O Timings: read=45.520 write=0.000
   ->  Append  (cost=0.71..90.42 rows=1058 width=8) (actual time=14.480..14.480 rows=0 loops=20)
         Buffers: shared hit=1849 read=198 dirtied=86
         I/O Timings: read=279.797 write=0.000
         ->  Index Only Scan using index_ci_builds_on_status_and_type_and_runner_id on public.ci_builds p_ci_builds_1  (cost=0.71..40.19 rows=746 width=8) (actual time=1.818..1.818 rows=0 loops=20)
               Index Cond: ((p_ci_builds_1.status = ANY ('{preparing,pending,running,waiting_for_callback,waiting_for_resource,canceling,created}'::text[])) AND (p_ci_builds_1.type = 'Ci::Build'::text) AND (p_ci_builds_1.runner_id = ci_runners.id))
               Heap Fetches: 0
               Buffers: shared hit=685 read=21
               I/O Timings: read=34.264 write=0.000
         ->  Index Only Scan using ci_builds_101_status_type_runner_id_idx on gitlab_partitions_dynamic.ci_builds_101 p_ci_builds_2  (cost=0.70..29.54 rows=282 width=8) (actual time=2.770..2.770 rows=0 loops=20)
               Index Cond: ((p_ci_builds_2.status = ANY ('{preparing,pending,running,waiting_for_callback,waiting_for_resource,canceling,created}'::text[])) AND (p_ci_builds_2.type = 'Ci::Build'::text) AND (p_ci_builds_2.runner_id = ci_runners.id))
               Heap Fetches: 0
               Buffers: shared hit=672 read=28
               I/O Timings: read=53.580 write=0.000
         ->  Index Only Scan using ci_builds_102_status_type_runner_id_idx on gitlab_partitions_dynamic.ci_builds_102 p_ci_builds_3  (cost=0.57..15.40 rows=30 width=8) (actual time=9.856..9.856 rows=0 loops=20)
               Index Cond: ((p_ci_builds_3.status = ANY ('{preparing,pending,running,waiting_for_callback,waiting_for_resource,canceling,created}'::text[])) AND (p_ci_builds_3.type = 'Ci::Build'::text) AND (p_ci_builds_3.runner_id = ci_runners.id))
               Heap Fetches: 112
               Buffers: shared hit=492 read=149 dirtied=84
               I/O Timings: read=191.953 write=0.000

↳ app/graphql/types/ci/runner_manager_type.rb:55:in `block in job_execution_status'

Old query:

SELECT "ci_runner_machines".*
FROM "ci_runner_machines"
WHERE "ci_runner_machines"."id" = 466
  AND (EXISTS (
      SELECT 1
      FROM "p_ci_builds"
        INNER JOIN "p_ci_runner_machine_builds" ON "p_ci_runner_machine_builds"."partition_id" IS NOT NULL
          AND "p_ci_runner_machine_builds"."build_id" = "p_ci_builds"."id"
          AND "p_ci_runner_machine_builds"."partition_id" = "p_ci_builds"."partition_id"
      WHERE "p_ci_builds"."type" = 'Ci::Build'
        AND ("p_ci_builds"."status" IN ('running'))
        AND ("p_ci_builds".runner_id = "ci_runner_machines".runner_id)
        AND ("p_ci_runner_machine_builds".runner_machine_id = "ci_runner_machines".id)
      LIMIT 1))

New query:

https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/29587/commands/91859

SELECT "ci_runner_machines".*
FROM "ci_runner_machines"
WHERE "ci_runner_machines"."id" = 12764080
  AND (EXISTS (
      SELECT 1
      FROM "p_ci_builds"
        INNER JOIN "p_ci_runner_machine_builds" ON "p_ci_runner_machine_builds"."partition_id" IS NOT NULL
          AND "p_ci_runner_machine_builds"."build_id" = "p_ci_builds"."id"
          AND "p_ci_runner_machine_builds"."partition_id" = "p_ci_builds"."partition_id"
      WHERE "p_ci_builds"."type" = 'Ci::Build'
        AND ("p_ci_builds"."status" IN ('preparing', 'pending', 'running', 'waiting_for_callback', 'waiting_for_resource',
          'canceling', 'created'))
        AND ("p_ci_builds".runner_id = "ci_runner_machines".runner_id)
        AND ("p_ci_runner_machine_builds".runner_machine_id = "ci_runner_machines".id)))

 Nested Loop Semi Join  (cost=1538.61..10679.89 rows=1 width=103) (actual time=25.817..25.821 rows=0 loops=1)
   Buffers: shared hit=98 read=15
   I/O Timings: read=25.352 write=0.000
   ->  Index Scan using ci_runner_machines_pkey on public.ci_runner_machines  (cost=0.42..3.44 rows=1 width=103) (actual time=5.118..5.120 rows=1 loops=1)
         Index Cond: (ci_runner_machines.id = 12764080)
         Buffers: shared hit=3 read=4
         I/O Timings: read=5.071 write=0.000
   ->  Hash Join  (cost=1538.18..10676.43 rows=1 width=16) (actual time=20.692..20.695 rows=0 loops=1)
         Hash Cond: ((p_ci_runner_machine_builds.build_id = p_ci_builds.id) AND (p_ci_runner_machine_builds.partition_id = p_ci_builds.partition_id))
         Buffers: shared hit=95 read=11
         I/O Timings: read=20.281 write=0.000
         ->  Append  (cost=0.57..9091.68 rows=6284 width=24) (actual time=12.740..12.740 rows=1 loops=1)
               Buffers: shared read=5
               I/O Timings: read=12.708 write=0.000
               ->  Index Scan using ci_runner_machine_builds_100_runner_machine_id_idx1 on gitlab_partitions_dynamic.ci_runner_machine_builds_100 p_ci_runner_machine_builds_1  (cost=0.57..3630.04 rows=2460 width=24) (actual time=12.739..12.739 rows=1 loops=1)
                     Index Cond: (p_ci_runner_machine_builds_1.runner_machine_id = 12764080)
                     Filter: (p_ci_runner_machine_builds_1.partition_id IS NOT NULL)
                     Rows Removed by Filter: 0
                     Buffers: shared read=5
                     I/O Timings: read=12.708 write=0.000
               ->  Index Scan using ci_runner_machine_builds_101_runner_machine_id_idx1 on gitlab_partitions_dynamic.ci_runner_machine_builds_101 p_ci_runner_machine_builds_2  (cost=0.57..4616.58 rows=3275 width=24) (actual time=0.000..0.000 rows=0 loops=0)
                     Index Cond: (p_ci_runner_machine_builds_2.runner_machine_id = 12764080)
                     Filter: (p_ci_runner_machine_builds_2.partition_id IS NOT NULL)
                     Rows Removed by Filter: 0
                     I/O Timings: read=0.000 write=0.000
               ->  Index Scan using ci_runner_machine_builds_102_runner_machine_id_idx1 on gitlab_partitions_dynamic.ci_runner_machine_builds_102 p_ci_runner_machine_builds_3  (cost=0.44..813.64 rows=549 width=24) (actual time=0.000..0.000 rows=0 loops=0)
                     Index Cond: (p_ci_runner_machine_builds_3.runner_machine_id = 12764080)
                     Filter: (p_ci_runner_machine_builds_3.partition_id IS NOT NULL)
                     Rows Removed by Filter: 0
                     I/O Timings: read=0.000 write=0.000
         ->  Hash  (cost=1521.74..1521.74 rows=1058 width=24) (actual time=7.921..7.923 rows=0 loops=1)
               Buckets: 2048  Batches: 1  Memory Usage: 16kB
               Buffers: shared hit=95 read=6
               I/O Timings: read=7.573 write=0.000
               ->  Append  (cost=0.71..1521.74 rows=1058 width=24) (actual time=7.920..7.921 rows=0 loops=1)
                     Buffers: shared hit=95 read=6
                     I/O Timings: read=7.573 write=0.000
                     ->  Index Scan using index_ci_builds_on_status_and_type_and_runner_id on public.ci_builds p_ci_builds_1  (cost=0.71..1062.76 rows=746 width=24) (actual time=1.660..1.660 rows=0 loops=1)
                           Index Cond: (((p_ci_builds_1.status)::text = ANY ('{preparing,pending,running,waiting_for_callback,waiting_for_resource,canceling,created}'::text[])) AND ((p_ci_builds_1.type)::text = 'Ci::Build'::text) AND (p_ci_builds_1.runner_id = ci_runner_machines.runner_id))
                           Buffers: shared hit=37 read=1
                           I/O Timings: read=1.532 write=0.000
                     ->  Index Scan using ci_builds_101_status_type_runner_id_idx on gitlab_partitions_dynamic.ci_builds_101 p_ci_builds_2  (cost=0.70..396.97 rows=282 width=24) (actual time=3.418..3.419 rows=0 loops=1)
                           Index Cond: (((p_ci_builds_2.status)::text = ANY ('{preparing,pending,running,waiting_for_callback,waiting_for_resource,canceling,created}'::text[])) AND ((p_ci_builds_2.type)::text = 'Ci::Build'::text) AND (p_ci_builds_2.runner_id = ci_runner_machines.runner_id))
                           Buffers: shared hit=33 read=2
                           I/O Timings: read=3.303 write=0.000
                     ->  Index Scan using ci_builds_102_status_type_runner_id_idx on gitlab_partitions_dynamic.ci_builds_102 p_ci_builds_3  (cost=0.57..56.72 rows=30 width=24) (actual time=2.836..2.837 rows=0 loops=1)
                           Index Cond: (((p_ci_builds_3.status)::text = ANY ('{preparing,pending,running,waiting_for_callback,waiting_for_resource,canceling,created}'::text[])) AND ((p_ci_builds_3.type)::text = 'Ci::Build'::text) AND (p_ci_builds_3.runner_id = ci_runner_machines.runner_id))
                           Buffers: shared hit=25 read=3
                           I/O Timings: read=2.738 write=0.000
