System-default SSL_CERT_DIR is being used during remote Gitaly over TLS

Based on a customer issue (internal), it appears that gitlab-shell fails to pull certs from GitLab's trusted-certs directory and falls back to the system default instead, causing x509 errors.

A git pull when using a custom cert fails with:

remote: 
remote: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority"
remote: 

The CA cert is correctly placed in /etc/gitlab/trusted-certs, and gitlab:gitaly:check succeeds on both the GitLab host and the Gitaly host.

Checking GitLab Shell ...

GitLab Shell: ... GitLab Shell version >= 12.2.0 ? ... OK (12.2.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Gitaly check returns ok:

gitlab-rake gitlab:gitaly:check
Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Moving the gitlab-shell binary aside and replacing it with a wrapper script that exports SSL_CERT_DIR makes the clones succeed:

mv /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell{,-orig}
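cat <<'EOF' > /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell
#!/bin/bash
# Point the Go TLS stack at GitLab's embedded cert directory.
export SSL_CERT_DIR=/opt/gitlab/embedded/ssl/certs
# Hand over to the original binary, preserving arguments exactly.
exec /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell-orig "$@"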
EOF
chmod +x /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell

Prior to the wrapper script, tracing gitlab-shell shows it opening the standard certs directories:

strace.log
31844 openat(AT_FDCWD, "/etc/ssl/certs/ca-certificates.crt", O_RDONLY|O_CLOEXEC <unfinished ...>
31844 <... openat resumed>) = -1 ENOENT (No such file or directory)
31844 openat(AT_FDCWD, "/etc/pki/tls/certs/ca-bundle.crt", O_RDONLY|O_CLOEXEC <unfinished ...>
31844 <... openat resumed>) = 3
31849 openat(AT_FDCWD, "/etc/ssl/certs", O_RDONLY|O_CLOEXEC) = 3
31849 openat(AT_FDCWD, "/etc/ssl/certs/Makefile", O_RDONLY|O_CLOEXEC) = 3
31849 openat(AT_FDCWD, "/etc/ssl/certs/ca-bundle.crt", O_RDONLY|O_CLOEXEC) = 3
31841 openat(AT_FDCWD, "/etc/ssl/certs/ca-bundle.trust.crt", O_RDONLY|O_CLOEXEC) = 3
31841 openat(AT_FDCWD, "/etc/ssl/certs/make-dummy-cert", O_RDONLY|O_CLOEXEC) = 3
31841 openat(AT_FDCWD, "/etc/ssl/certs/renew-dummy-cert", O_RDONLY|O_CLOEXEC) = 3
31841 openat(AT_FDCWD, "/etc//localtime", O_RDONLY) = 3

With the wrapper in place and SSL_CERT_DIR exported:

strace.log
362 openat(AT_FDCWD, "/etc/ssl/certs/ca-certificates.crt", O_RDONLY|O_CLOEXEC <unfinished ...>
362 openat(AT_FDCWD, "/opt/gitlab/embedded/ssl/certs", O_RDONLY|O_CLOEXEC) = 3
362 openat(AT_FDCWD, "/opt/gitlab/embedded/ssl/certs/09789157.0", O_RDONLY|O_CLOEXEC <unfinished ...>
362 openat(AT_FDCWD, "/opt/gitlab/embedded/ssl/certs/1568f5bb.0", O_RDONLY|O_CLOEXEC) = 3
362 openat(AT_FDCWD, "/opt/gitlab/embedded/ssl/certs/1895e586.0", O_RDONLY|O_CLOEXEC) = 3
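
To compare the two traces more easily, the cert-related paths can be pulled out of each log. This is only a rough sketch: the function name cert_paths_from_strace is made up for illustration, and it assumes a log in the same openat format as the excerpts above.

```shell
#!/bin/bash
# Hypothetical helper: list the unique cert-related paths a traced
# process tried to open, given an strace log containing openat lines.
cert_paths_from_strace() {
  local log="$1"
  # Keep only the quoted path argument of each openat call,
  # then filter for paths under an ssl/ or pki/ directory.
  grep -o 'openat(AT_FDCWD, "[^"]*"' "$log" \
    | sed 's/^openat(AT_FDCWD, "//; s/"$//' \
    | grep -E '/(ssl|pki)/' \
    | sort -u
}
```

Running cert_paths_from_strace strace.log against each log should make it obvious whether /opt/gitlab/embedded/ssl/certs was ever consulted.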

I have observed this occurring on GitLab 12.10.3-ee; the customer reports 12.9.5-ee.

According to the crypto/x509 docs, the environment variables "SSL_CERT_FILE and SSL_CERT_DIR can be used to override the system default locations". I suspect that's at play somehow.
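
One way to test that suspicion is to check whether a running gitlab-shell process actually has either variable in its environment. A minimal sketch, assuming a Linux host where /proc/PID/environ is readable; the function name ssl_cert_overrides is hypothetical and it takes the path to an environ file as its argument.

```shell
#!/bin/bash
# Hypothetical helper: show SSL_CERT_FILE / SSL_CERT_DIR entries from a
# NUL-separated environment dump such as /proc/<pid>/environ (Linux only).
ssl_cert_overrides() {
  local environ_file="$1"
  # Entries in environ are separated by NUL bytes; turn them into lines.
  tr '\0' '\n' < "$environ_file" \
    | grep -E '^SSL_CERT_(FILE|DIR)=' \
    || echo "no SSL_CERT_FILE or SSL_CERT_DIR override set"
}
```

For example, ssl_cert_overrides /proc/PID/environ with the PID of a gitlab-shell process. Without the wrapper I would expect this to report that no override is set, which would be consistent with the system default locations seen in the first trace.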

Further, setting any of ca_file, ca_path, or self_signed_cert under the http_settings in /var/opt/gitlab/gitlab-shell/config.yml has no effect.

The workaround for now is to leave the bash wrapper script in place, which allows successful clones from a remote Gitaly instance.

Edited by Keven Hughes