
Ignore SIGQUIT for duration of artifact upload

Julien Lecomte requested to merge (removed):master-sigquit-fix into master

What does this MR do?

It fixes, probably in an ugly way, issue "#4239 (closed)", but only for the artifact upload step. I have not checked whether the same core dump also happens during cache upload/download or artifact download.
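The idea of ignoring SIGQUIT only for the duration of one step can be illustrated with a small shell analogue. This is a sketch, not the change made in this MR (which lives in the runner's own code and is not reproduced here); the function name `upload_with_sigquit_ignored` is hypothetical, and the `sh -c` command merely stands in for the upload.

```shell
#!/bin/sh
# Sketch: ignore SIGQUIT while a critical command runs, then restore
# the default disposition. The function name is hypothetical.
upload_with_sigquit_ignored() {
    trap '' QUIT      # empty action: SIGQUIT is ignored from here on
    "$@"              # critical section; child processes inherit the ignore
    status=$?
    trap - QUIT       # back to the default action (terminate, dump core)
    return "$status"
}

# Stand-in for the upload: the child sends SIGQUIT to itself and survives.
upload_with_sigquit_ignored sh -c 'kill -QUIT $$; echo "upload finished"'
# prints: upload finished
```

Without the `trap '' QUIT` line, the child shell would be terminated by its own SIGQUIT before reaching the `echo`.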

Why was this MR needed?

Because it's labeled "Won't fix" (actually it's labeled "Accepting merge requests" /end passive aggressiveness)

More specifically, I haven't dug into gitlab-runner's internals, but I believe the signal handling could be done in one place, once and for all. In the meantime, this at least fixes my issue, which blocks 80 of my coworkers.

I don't mind keeping a fork of gitlab-runner if you choose never to merge this, but I'd prefer not to.

How to test

Docker image creation

The Docker helper image was built manually as follows, since the CI doesn't push an image by default.

# Get the binary from this MR's artifacts:
wget "https://gitlab.com/jlecomte/forks/gitlab-runner/-/jobs/463128801/artifacts/raw/out/helper-images/prebuilt-x86_64.tar.xz?inline=false" -O prebuilt-x86_64.tar.xz
tar -xf prebuilt-x86_64.tar.xz usr/bin/gitlab-runner-helper
 
docker build -t julienlecomte/gitlab-runner-helper:x86_64-latest .
docker push julienlecomte/gitlab-runner-helper:x86_64-latest

Dockerfile:

FROM gitlab/gitlab-runner-helper:x86_64-latest
RUN rm -v /usr/bin/gitlab-runner-helper
COPY usr/bin/gitlab-runner-helper  /usr/bin/gitlab-runner-helper

The .gitlab-ci.yml

sigquit-fix:
  image: alpine
  stage: build
  script:
    - dd if=/dev/zero of=5g count=$((5*1024)) bs=1048576
    - echo sleeping...
    - sleep 1m
    - echo starting...
  artifacts:
    paths:
      - 5g
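For reference, `dd` writes `count` blocks of `bs` bytes each, so the job above produces a file of 5 × 1024 × 1048576 bytes (5 GiB). A minimal sketch of that arithmetic, with a scaled-down run that can be checked locally; the helper names `dd_total_bytes` and `make_zero_file` are hypothetical:

```shell
#!/bin/sh
# Total bytes written by dd for a given count and block size.
dd_total_bytes() {
    echo $(( $1 * $2 ))
}

# Create a zero-filled file with the same dd options the job uses.
make_zero_file() {
    dd if=/dev/zero of="$1" count="$2" bs="$3" 2>/dev/null
}

# Size of the artifact created by the CI job, in bytes.
dd_total_bytes $((5*1024)) 1048576

# Scaled-down check: 5 blocks of 1024 bytes instead of 5 GiB.
tmp=$(mktemp)
make_zero_file "$tmp" 5 1024
echo $(( $(wc -c < "$tmp") ))
rm -f "$tmp"
```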

The Runner (part 1, the unfixed one)

concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "sigquit-fix"
  url = "https://gitlab.example.com/"
  token = "xyz"
  executor = "docker"
  environment = ["DOCKER_TLS_CERTDIR="]
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
  [runners.docker]
    tls_verify = false
    image = "alpine"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = true
    shm_size = 0

Part 1, The Crash & Core

  • Start the CI pipeline with that job and monitor the output of the job in the GitLab UI.

  • Once you see "sleeping...", run the following as root in a terminal on the runner host: while true; do pkill -QUIT gitlab-runner ; sleep 1s ; done

  • Wait less than a minute and watch it crash.

Part 2.a

  • Edit the runner's /etc/gitlab-runner/config.toml and add the line helper_image = "julienlecomte/gitlab-runner-helper:x86_64-latest" to the [runners.docker] section.

  • Restart the runner: gitlab-runner restart
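The config edit above could be scripted along these lines. This is a sketch under two assumptions: that `helper_image` belongs directly under the `[runners.docker]` table, and that the file does not already contain a `helper_image` line (the function would add a second one). The function name `add_helper_image` is hypothetical. If the file defines several runners, every `[runners.docker]` table gets the line.

```shell
#!/bin/sh
# Sketch: insert a helper_image line right after each [runners.docker] header.
# Hypothetical helper; run as root against the real config file.
add_helper_image() {
    config=$1
    image=$2
    tmp=$(mktemp) || return 1
    awk -v img="$image" '
        { print }
        /^[ \t]*\[runners\.docker\][ \t]*$/ {
            printf "    helper_image = \"%s\"\n", img
        }
    ' "$config" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$config"
    rm -f "$tmp"
}

# Usage (not executed here):
#   add_helper_image /etc/gitlab-runner/config.toml \
#       "julienlecomte/gitlab-runner-helper:x86_64-latest"
#   gitlab-runner restart
```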

Part 2.b No Crash & No Core

  • Start the CI pipeline with that job and monitor the output of the job in the GitLab UI.

  • Once you see "sleeping...", run the following as root in a terminal on the runner host: while true; do pkill -QUIT gitlab-runner ; sleep 1s ; done

  • Wait less than a minute and watch it upload the 5 GiB artifact to your server.

Does this MR meet the acceptance criteria?

  • Documentation created/updated
  • Added tests for this feature/bug
  • In case of conflicts with master - branch was rebased

What are the relevant issue numbers?

#4239 (closed)

Edited by Julien Lecomte
