HTTP 400 with no visibility when permissions on TMPDIR were modified

Summary

During incident gitlab-com/gl-infra/production#5194 (closed), we discovered that Workhorse was returning HTTP 400 responses, but nothing indicated what was wrong: there were no events in Sentry, and no logs explained why Workhorse was having a problem handling CI artifacts. Example log entry for one of the HTTP 400 errors:

| Key | Value |
| --- | --- |
| json.content_type | text/plain |
| json.correlation_id | 01FB39J76J620B47TPYQ89Q84T |
| json.duration_ms | 423 |
| json.host | gitlab.com |
| json.method | POST |
| json.route | ^/api/v4/jobs/[0-9]+/artifacts\z |
| json.status | 400 |
| json.ttfb_ms | 421 |
| json.type | api |
| json.uri | /api/v4/jobs/[REDACTED]/artifacts?artifact_format=zip&artifact_type=archive&expire_in=1+hour |
| json.user_agent | gitlab-runner 13.11.0 (13-11-stable; go1.13.8; linux/amd64) |
| json.written_bytes | 38 |
| kubernetes.container_image | dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-workhorse-ee:14-1-202107201616-ec13864a |
| kubernetes.container_name | gitlab-workhorse |

Steps to reproduce

This problem was introduced when the permissions on the Pod's TMPDIR were modified.

Upon startup of Workhorse, a script is run that creates the temporary directory with specific permissions: https://gitlab.com/gitlab-org/build/CNG/-/blob/47988fc90f97b5e1e9dfac54c6a5313d8af1cba3/gitlab-workhorse/scripts/start-workhorse#L9

The change that induced the outage created this directory ahead of time. Because the directory already exists at Pod startup, the startup script no longer sets its permissions:

gitlab-com/gl-infra/k8s-workloads/gitlab-com!1037 (merged)
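
The "directory already exists" behavior can be reproduced outside of Kubernetes. The sketch below assumes the startup script creates the directory with `mkdir -p -m <mode>` (the linked script is not quoted in this issue, so treat that as an assumption) and that GNU `stat` is available. `mode_after_second_mkdir` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pre-create a directory with one mode, run
# `mkdir -p -m` with a different mode, and print the mode that results.
mode_after_second_mkdir() {
    preexisting_mode=$1
    requested_mode=$2
    scratch=$(mktemp -d) || return 1
    target="$scratch/tmpdir"

    # Stand-in for the directory created ahead of time by the deployment change.
    mkdir -m "$preexisting_mode" "$target"

    # Stand-in for the startup script (assumed form): with -p, an existing
    # directory is accepted without error and -m is not applied to it.
    mkdir -p -m "$requested_mode" "$target"

    stat -c '%a' "$target"   # GNU stat: print the octal mode
    rm -rf "$scratch"
}

# Modes taken from the two listings in this issue (failing, then working).
mode_after_second_mkdir 2777 3770
```

If the printed mode is the pre-existing one rather than the requested one, the second `mkdir` left the directory untouched, which matches what was observed in the Pod.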


The failed permissions on the temporary directory are: drwxrwsrwx - root git

The working permissions on the temporary directory are: drwxrws--T - git git
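
To make the two listings easier to compare, the following sketch converts an `ls -l` style mode string to octal. `symbolic_to_octal` is a hypothetical helper written for this issue, not part of any GitLab tooling.

```shell
#!/bin/sh
# Convert a 10-character `ls -l` mode string (e.g. drwxrws--T) to octal.
symbolic_to_octal() {
    perms=${1#?}      # drop the leading file-type character
    special=0
    result=""
    # setuid=4, setgid=2, sticky=1 live in the user, group, other triplets.
    for weight in 4 2 1; do
        triplet=$(printf '%s' "$perms" | cut -c1-3)
        perms=${perms#???}
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac
        case $triplet in
            ??x)    digit=$((digit + 1)) ;;
            ??[st]) digit=$((digit + 1)); special=$((special + weight)) ;;
            ??[ST]) special=$((special + weight)) ;;   # special bit without execute
        esac
        result="$result$digit"
    done
    printf '%s%s\n' "$special" "$result"
}

symbolic_to_octal drwxrwsrwx   # prints 2777 (failing)
symbolic_to_octal drwxrws--T   # prints 3770 (working)
```

Read this way, the failing directory is world-writable with no sticky bit, while the working directory grants nothing to "other" and has the sticky bit set.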


Example Project

What is the current bug behavior?

HTTP 400 responses during CI artifact uploads, with no logging to tell us what actually leads to the failure

What is the expected correct behavior?

No errors during CI artifact uploads

Relevant logs and/or screenshots

See linked incident: gitlab-com/gl-infra/production#5194 (closed)

Possible fixes

It looks like Rails may be very particular about directory permissions for its configured temporary directory. See issue gitlab-org/charts/gitlab#1651 (closed).
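
One plausible reading, which this issue does not confirm, is that Ruby's standard-library `Dir.tmpdir` is involved: it skips a candidate directory that is not a directory, is not writable, or is world-writable without the sticky bit. The sketch below applies the same three rules in a pre-flight check that at least prints a reason when the directory would be rejected. `check_tmpdir` is a hypothetical name, GNU `stat` is assumed, and nothing like this is known to exist in the startup script today.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report why a temporary directory would be
# rejected under rules modeled on Ruby's Dir.tmpdir.
check_tmpdir() {
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir"; return 1; }
    [ -w "$dir" ] || { echo "not writable: $dir"; return 1; }

    # Pad to four digits so the special-bits digit is always present.
    mode=$(printf '%04d' "$(stat -c '%a' "$dir")")
    special=${mode%???}   # first digit: setuid/setgid/sticky
    other=${mode#???}     # last digit: permissions for "other"

    case $other in
        2|3|6|7)          # write bit set for "other"
            case $special in
                1|3|5|7) ;;   # sticky bit set: acceptable, like /tmp
                *) echo "world-writable without sticky bit: $dir"; return 1 ;;
            esac ;;
    esac
    echo "ok: $dir"
}
```

Calling `check_tmpdir "$TMPDIR"` before starting the service would have flagged the failing mode (2777) and accepted the working one (3770).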