
CI/CD → Runners page showing stale runner version during job run after a recent runner version upgrade

Summary

When viewing the CI/CD → Runners page (/admin/runners) after upgrading the GitLab Runner version for registered runners, the Version value intermittently changes between the actual running version of GitLab Runner and the previous version that was last in use. The previously used GitLab Runner version populates this value as soon as the runner starts to process a job, and once the runner is idle again and requesting jobs to run, the Version value correctly shows the actual running version.

Steps to reproduce

Example video: gitlab-runner_15-7-3_to_15-8-0_on_gitlab_15-8-0_example

Example scenario:

  1. Register a new runner to a GitLab instance. In this example, GitLab Runner 15.7.3 has been installed via apt, and freshly registered to a GitLab instance running 15.8.0
  2. If you visit the /admin/runners page, you'll see the registered GitLab Runner showing correctly as Version 15.7.3.
  3. Upgrade GitLab Runner to a different version. In this example I've upgraded from 15.7.3 to 15.8.0.
  4. After the upgrade completes, you can see the correct version of the runner reflected within the /admin/runners page. It appears as 15.8.0 per the upgrade that was just performed.
  5. If you now retry an old job (or run a new pipeline and job entirely - the result is the same), then once the job is picked up by the runner, the Version value on the /admin/runners page will suddenly show the previous version of the runner that was in use - 15.7.3 in this example (see the small polling sketch after this list).
  6. Once the job completes and no other jobs are being processed by the runner, the runner's Version value on the /admin/runners page will reflect the correct version once again - 15.8.0 in this example.
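
For reference, here is a small polling sketch that makes the flip easier to observe outside of the UI. It is a hypothetical script - the GitLab URL, runner ID, and access token are placeholders - and it relies on the runner details returned by the Runners REST API (GET /api/v4/runners/:id), which include the version reported by the runner:

    # Hypothetical polling script; the URL, runner ID, and token are placeholders.
    # Fetch the runner's details from the Runners REST API and print the reported
    # version once per second, so the version flip can be observed while a job runs.
    require 'net/http'
    require 'json'

    GITLAB_URL = 'http://gitlab.example.com' # test instance URL (placeholder)
    RUNNER_ID  = 2                           # runner id as shown on /admin/runners
    TOKEN      = '<REDACTED>'                # admin personal access token (placeholder)

    uri = URI("#{GITLAB_URL}/api/v4/runners/#{RUNNER_ID}")

    30.times do
      request = Net::HTTP::Get.new(uri)
      request['PRIVATE-TOKEN'] = TOKEN
      response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
      puts "#{Time.now} version=#{JSON.parse(response.body)['version']}"
      sleep 1
    end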

What is the current bug behavior?

  • The currently used GitLab Runner version is not always correctly reflected on the /admin/runners page, at least not shortly after a GitLab Runner version upgrade has been performed. This can cause confusion for users, as it may seem as though something has gone wrong during the GitLab Runner upgrade process, even though that isn't the case.

What is the expected correct behavior?

  • The currently used GitLab Runner version should be reflected correctly and consistently after a GitLab Runner version upgrade is performed. The version displayed on the /admin/runners page, or on any other UI page describing runner details, should always show the actual GitLab Runner version in use, regardless of whether a job is currently running, and ideally from the moment a GitLab Runner version change is detected.

Additional troubleshooting details

Please note that a new/separate test environment was used in the output below, so the IP addressing and other details may differ slightly from what is shown in the example video, but the same methodology was followed to reproduce the problem.

  1. It was confirmed that the runner token/authentication has not been duplicated across more than one runner. Running gitlab-runner reset-token --all-runners was also tested just to double check, and the problem is still present after cycling the token.

  2. To try to rule out the possibility of the runner itself sending stale version data, tcpdump was used in the test environment on the instance running GitLab Runner to check the version values being sent in the JSON payloads over HTTP traffic destined for the GitLab instance. Here the only version value we can see being sent from the runner is the correct running version - 15.8.0:

    root@ip-172-31-19-34:~# tcpdump -A -s 0 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' | grep -w "version"
    
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
    {"info":{"name":"gitlab-runner","version":"15.8.0","revision":"12335144","platform":"linux","architecture":"amd64","executor":"docker","shell":"bash","features":{"variables":true,"image":true,"services":true,"artifacts":true,"cache":true,"shared":false,"upload_multiple_artifacts":true,"upload_raw_artifacts":true,"session":true,"terminal":true,"refspecs":true,"masking":true,"proxy":false,"raw_variables":true,"artifacts_exclude":true,"multi_build_steps":true,"trace_reset":true,"trace_checksum":true,"trace_size":true,"vault_secrets":true,"cancelable":true,"return_exit_code":true,"service_variables":true,"service_multiple_aliases":true},"config":{"gpus":""}},"token":"<REDACTED>","system_id":"s_37bf828021dd"}
    
  3. If we verify the runner version via the Rails console live while replicating the scenario described above, the version discrepancy is visible in the output: the reported version switches from 15.8.0 to 15.7.3 when a job is run, and back to 15.8.0 once the runner is idle again.

    irb(main):013:0> Rails.logger.level = Logger::DEBUG
    => 0
    irb(main):014:0> testmctest = true
    => true
    irb(main):015:1* loop do
    irb(main):016:1*   sleep(1)
    irb(main):017:1*   pp Ci::Runner.find_by_token('<REDACTED>').version
    irb(main):018:1*   break if !testmctest
    irb(main):019:0> end
    
      Ci::Runner Load (1.4ms)  /*application:console,db_config_name:main,console_hostname:ip-172-31-19-34,console_username:ubuntu*/ SELECT "ci_runners".* FROM "ci_runners" WHERE (token_expires_at IS NULL OR token_expires_at >= NOW()) AND "ci_runners"."token_encrypted" IN ('<REDACTED>', '<REDACTED>') LIMIT 1
    "15.8.0"
    
      Ci::Runner Load (0.7ms)  /*application:console,db_config_name:main,console_hostname:ip-172-31-19-34,console_username:ubuntu*/ SELECT "ci_runners".* FROM "ci_runners" WHERE (token_expires_at IS NULL OR token_expires_at >= NOW()) AND "ci_runners"."token_encrypted" IN ('<REDACTED>', '<REDACTED>') LIMIT 1
    "15.8.0"
    
      Ci::Runner Load (0.6ms)  /*application:console,db_config_name:main,console_hostname:ip-172-31-19-34,console_username:ubuntu*/ SELECT "ci_runners".* FROM "ci_runners" WHERE (token_expires_at IS NULL OR token_expires_at >= NOW()) AND "ci_runners"."token_encrypted" IN ('<REDACTED>', '<REDACTED>') LIMIT 1
    "15.7.3"
    
      Ci::Runner Load (0.8ms)  /*application:console,db_config_name:main,console_hostname:ip-172-31-19-34,console_username:ubuntu*/ SELECT "ci_runners".* FROM "ci_runners" WHERE (token_expires_at IS NULL OR token_expires_at >= NOW()) AND "ci_runners"."token_encrypted" IN ('<REDACTED>', '<REDACTED>') LIMIT 1
    "15.7.3"
    
      Ci::Runner Load (0.6ms)  /*application:console,db_config_name:main,console_hostname:ip-172-31-19-34,console_username:ubuntu*/ SELECT "ci_runners".* FROM "ci_runners" WHERE (token_expires_at IS NULL OR token_expires_at >= NOW()) AND "ci_runners"."token_encrypted" IN ('<REDACTED>', '<REDACTED>') LIMIT 1
    "15.8.0"
  4. When looking into the ci_runners table further, we can still see the old runner version 15.7.3 being stored:

    irb(main):023:0> pp Ci::Runner.all
    
      Ci::Runner Load (0.9ms)  /*application:console,db_config_name:main,console_hostname:ip-172-31-19-34,console_username:ubuntu*/ SELECT "ci_runners".* FROM "ci_runners"
    [#<Ci::Runner:0x00007f6297586b90
    
      id: 2,
      token: nil,
      created_at: Wed, 25 Jan 2023 07:25:06.362582000 UTC +00:00,
      updated_at: Wed, 25 Jan 2023 07:25:06.362582000 UTC +00:00,
      description: "[FILTERED]",
      contacted_at: Wed, 25 Jan 2023 07:25:15.067399000 UTC +00:00,
      active: true,
      name: "gitlab-runner",
      version: "15.7.3",
      revision: "914aa415",
      platform: "linux",
      architecture: "amd64",
      run_untagged: true,
      locked: true,
      access_level: "not_protected",
      ip_address: "172.31.19.34",
      maximum_timeout: nil,
      runner_type: "instance_type",
      token_encrypted: "<REDACTED>",
      public_projects_minutes_cost_factor: 0.0,
      private_projects_minutes_cost_factor: 1.0,
      config: {},
      executor_type: "docker",
      maintainer_note: nil,
      token_expires_at: nil,
      allowed_plans: [],
      registration_type: 0,
      creator_id: nil,
      tag_list: nil>]
    gitlabhq_production=> SELECT * FROM ci_runners;
    -[ RECORD 1 ]------------------------+-------------------------------------------------
    id                                   | 2
    token                                |
    created_at                           | 2023-01-25 07:25:06.362582
    updated_at                           | 2023-01-25 07:25:06.362582
    description                          | ip-172-31-19-34
    contacted_at                         | 2023-01-25 07:25:15.067399
    active                               | t
    name                                 | gitlab-runner
    version                              | 15.7.3
    revision                             | 914aa415
    platform                             | linux
    architecture                         | amd64
    run_untagged                         | t
    locked                               | t
    access_level                         | 0
    ip_address                           | 172.31.19.34
    maximum_timeout                      | 
    runner_type                          | 1
    token_encrypted                      | <REDACTED>
    public_projects_minutes_cost_factor  | 0
    private_projects_minutes_cost_factor | 1
    config                               | {}
    executor_type                        | 3
    maintainer_note                      | 
    token_expires_at                     | 
    allowed_plans                        | {}
    registration_type                    | 0
    creator_id                           |

    1. Is this old version value stored in PostgreSQL used to update the version shown on the /admin/runners page when a job is picked up by a runner? From a user experience perspective, the version shown when a runner contacts the GitLab instance to poll for jobs and the version shown when a runner takes a job should ideally always be consistent.

    2. There is some suspicion at the moment that the discrepancy is due to a difference between what is stored in the PostgreSQL database vs what is cached in Redis, although clearing the Redis cache via the rake task does not seem to resolve the problem (a quick comparison sketch follows below). The following items are of interest:
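
As a quick way to narrow this down, a minimal Rails console sketch could compare both values while a job is in progress. This is only a sketch, under the assumption that runner.version returns the value served to the UI (potentially from a cache), whereas read_attribute(:version) returns the ci_runners column as loaded from PostgreSQL:

    # Sketch, run from a Rails console while a job is active. Assumption:
    # runner.version is the value rendered in the UI (possibly cache-backed),
    # while read_attribute(:version) is the ci_runners column from PostgreSQL.
    runner = Ci::Runner.find_by_token('<REDACTED>')

    10.times do
      runner.reload # re-read the row from PostgreSQL
      pp db_column: runner.read_attribute(:version), reported_version: runner.version
      sleep 1
    end

If the two values diverge while the job is running, that would help pinpoint which layer is serving the stale value.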
