Inaccurate deploy keys passthrough
Background
As part of &706 (closed), with !300 (merged) we've added a Go implementation for git-receive-pack
.
In 12.4, a part of the deploy keys functionality changed, seemingly out of the blue, reported in gitlab#35779 (closed), which also coincides with the removal of gitlab-shell-ruby in #173 (closed).
Summary
Deploy key checks are inconsistent ever since 12.4, here's a small example from testing / a script we used after a Large Premium Customer (ZD, SFDC - internal links) had an emergency after their 12.4.8 upgrade when they ran into this change. (few examples also detailed in the description of the related issue - gitlab#35779 (closed)).
Investigation details
Long, verbose description - expand for details
Started from the assumption that something handling the authorization broke in the 12.3->12.4 diff. Initial info is that api_json.log
has user_id
for a deploy key, effectively tying the deploy key to the owner, together with the push permissions.
Two setups, same project paths, users and keys, one on 12.3.9, one on 12.4.8.
Starting with a gitaly strace on both nodes, using strace-parser as well:
12.3:
╰─>$ grep "sendto.*127.0.0.1:8080.*POST.*internal/allowed" gitaly.txt -A 1
11785 19:05:22.240274 sendto(6<TCP:[127.0.0.1:36146->127.0.0.1:8080]>, "POST /api/v4/internal/allowed HTTP/1.1\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nUser-Agent: Ruby\r\nContent-Type: application/x-www-form-urlencoded\r\nHost: 127.0.0.1:8080\r\nContent-Length: 1050\r\n\r\n", 224, MSG_DONTWAIT, NULL, 0) = 224 <0.000542>
11785 19:05:22.240924 sendto(6<TCP:[127.0.0.1:36146->127.0.0.1:8080]>, "action=git-receive-pack&changes=10f18557a4414fa7076bbac2b13c1fdc861543c7+7db02ad3442ed94573b59a3af1f5b4f00197c0c1+refs%2Fheads%2Fmaster%0A&gl_repository=project-1&project=%2Fvar%2Fopt%2Fgitlab%2Fgit-data%2Frepositories%2F%40hashed%2F6b%2F86%2F6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git&protocol=ssh&env=%7B%22GIT_ALTERNATE_OBJECT_DIRECTORIES%22%3A%5B%22%2Fvar%2Fopt%2Fgitlab%2Fgit-data%2Frepositories%2F%40hashed%2F6b%2F86%2F6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git%2F.%2Fobjects%22%5D%2C%22GIT_ALTERNATE_OBJECT_DIRECTORIES_RELATIVE%22%3A%5B%22objects%22%5D%2C%22GIT_OBJECT_DIRECTORY%22%3A%22%2Fvar%2Fopt%2Fgitlab%2Fgit-data%2Frepositories%2F%40hashed%2F6b%2F86%2F6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git%2F.%2Fobjects%2Fincoming-EXauMa%22%2C%22GIT_OBJECT_DIRECTORY_RELATIVE%22%3A%22objects%2Fincoming-EXauMa%22%7D&key_id=1&secret_token=REDACTED"..., 1050, MSG_DONTWAIT, NULL, 0) = 1050 <0.004236>
╰─>$ strace-parser gitaly.txt p 11785 | grep Prog -A1
Program Executed: /opt/gitlab/embedded/bin/ruby
Args: ["/opt/gitlab/embedded/service/gitaly-ruby/git-hooks/../gitlab-shell/hooks/pre-receive"] 0x7ffe81ff17c8
12.4:
╰─>$ grep "sendto.*127.0.0.1:8080.*POST.*internal/allowed" gitaly.txt
16047 19:09:15.393904 sendto(6<TCP:[127.0.0.1:40326->127.0.0.1:8080]>, "POST /api/v4/internal/allowed HTTP/1.1\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nUser-Agent: Ruby\r\nContent-Type: application/x-www-form-urlencoded\r\nHost: 127.0.0.1:8080\r\nContent-Length: 1051\r\n\r\n", 224, MSG_DONTWAIT, NULL, 0) = 224 <0.000595>
16047 19:09:15.394606 sendto(6<TCP:[127.0.0.1:40326->127.0.0.1:8080]>, "action=git-receive-pack&changes=af229d1572c06e9af9dc565b142c9a25e399f70b+f048a6d4cae6a47c0d77daa3a034ff165f8c388d+refs%2Fheads%2Fmaster%0A&gl_repository=project-1&project=%2Fvar%2Fopt%2Fgitlab%2Fgit-data%2Frepositories%2F%40hashed%2F6b%2F86%2F6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git&protocol=ssh&env=%7B%22GIT_ALTERNATE_OBJECT_DIRECTORIES%22%3A%5B%22%2Fvar%2Fopt%2Fgitlab%2Fgit-data%2Frepositories%2F%40hashed%2F6b%2F86%2F6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git%2F.%2Fobjects%22%5D%2C%22GIT_ALTERNATE_OBJECT_DIRECTORIES_RELATIVE%22%3A%5B%22objects%22%5D%2C%22GIT_OBJECT_DIRECTORY%22%3A%22%2Fvar%2Fopt%2Fgitlab%2Fgit-data%2Frepositories%2F%40hashed%2F6b%2F86%2F6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git%2F.%2Fobjects%2Fincoming-QiWlm3%22%2C%22GIT_OBJECT_DIRECTORY_RELATIVE%22%3A%22objects%2Fincoming-QiWlm3%22%7D&user_id=1&secret_token=REDACTED"..., 1051, MSG_DONTWAIT, NULL, 0) = 1051 <0.002014>
╰─>$ strace-parser gitaly.txt p 16047 | grep Prog -A1
Program Executed: /opt/gitlab/embedded/bin/ruby
Args: ["/opt/gitlab/embedded/service/gitaly-ruby/gitlab-shell/hooks/pre-receive"] 0x7ffe9a77bfb8
First and most important thing to note, the 12.3 version passed key_id=1
while the 12.4 version passed user_id=1
.
The difference in args is also a bit odd, let's see what's the parent:
12.3:
╰─>$ strace-parser gitaly.txt p (strace-parser gitaly.txt p 11785 | grep Parent | awk '{print $3}') | grep Prog -A1
Program Executed: /opt/gitlab/embedded/bin/git
Args: ["-c" "core.hooksPath=/opt/gitlab/embedded/service/gitaly-ruby/git-hooks" "-c" "core.alternateRefsCommand=exit 0 #" "receive-pack" "/var/opt/gitlab/git-data/repositories/@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git"] 0xc000110480
12.4:
╰─>$ strace-parser gitaly.txt p (strace-parser gitaly.txt p 16047 | grep Parent | awk '{print $3}') | grep Prog -A1
Program Executed: /opt/gitlab/embedded/bin/gitaly-hooks
Args: ["pre-receive"] 0x55bb5fbfdd78
Oh, right. The new gitaly-hooks
binary (also introduced in 12.4, pretty suspect to begin with).
The pre-receive
hook is simply a symlink to the generic gitlab-shell-hook
:
root@cat-124:/# ll /opt/gitlab/embedded/service/gitaly-ruby/git-hooks/pre-receive
lrwxrwxrwx 1 root root 17 Jan 13 2020 /opt/gitlab/embedded/service/gitaly-ruby/git-hooks/pre-receive -> gitlab-shell-hook*
Confirming whether or not the new gitaly-hooks
binary is the cause here:
root@cat-124:/# sed -i.bak 's/^if\ .*/if false; then/' /opt/gitlab/embedded/service/gitaly-ruby/git-hooks/gitlab-shell-hook
Nothing changes when not using it:
root@cat-124:/# grep "\bkey_id=" /tmp/gitaly2.txt # new strace, after disabling
root@cat-124:/#
At least now we're comparing apples to apples, but it's still unclear why the difference.
Let's try stracing gitlab-shell. Note: do not trace sshd while connected to it, it'll crash your server/vm (not that it happened to me...
A simple approach would be to add a wrapper which we can attach to, and use -f
to also strace into the child processes.
root@cat-123:~# mv /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell{,.old}
root@cat-123:~# cat > /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell <<EOF
> #!/bin/bash
> sleep 20
> /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell.old "\$@"
> EOF
root@cat-123:~# chmod +x /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell
Run a push, attach to the gitlab-shell wrapper while it's in sleep with:
ps auwx | grep gitlab-shell | awk '{ print " -p " $2}' | xargs strace -tt -T -f -y -yy -s 1024 -o /tmp/gitlab-shell.txt
Let's see what happens now with our key-1
:
12.3:
root@cat-123:/# grep "key-1" /tmp/gitlab-shell.txt
6139 21:22:11.728727 execve("/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell.old", ["/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell.old", "key-1"], 0x55b9a992dc10 /* 15 vars */) = 0 <0.000226>
6139 21:22:11.747444 execve("/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell-ruby", ["/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell-ruby", "key-1"], 0xc0000d8200 /* 15 vars */ <unfinished ...>
12.4:
root@cat-124:/# grep "key-1" /tmp/gitlab-shell.txt
8671 21:28:08.159239 execve("/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell.old", ["/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell.old", "key-1"], 0x556c43014c10 /* 15 vars */) = 0 <0.000256>
Uh-oh, there's no more gitlab-shell-ruby because of !346 (merged). Comparing the IO with strace-parser, we see that the new go binary actually sends the gitaly gRPC with user-1
instead of key-1
on the 12.4 version:
8671 21:28:08.214993 write(3<UNIX:[802333->802334]>, "\0\0\0\4\1\0\0\0\0\0\0a\1\4\0\0\0\1\203\206E\231bc$tz_v\354{\213gs\20\254n\335\217iHSw.\261\223\257A\236\266\243y\270\307q\330\301\353K\23\0314\0166&2GG\246&2GG\245\320ru*\177_\213\35u\320b\r&=LMedz\212\232\312\310\264\307`+\211\245\301@\2te\206M\2035\5\261\37\0\0\224\0\0\0\0\0\1\0\0\0\0\217\nt\22\7default\32R@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git2\tproject-1B\nroot/test1\32\6user-1\"\tproject-1*\4root", 272) = 2* 72 <0.000028>
However... just a bit earlier, gitlab-shell does send a check with key-1
as intended:
8673 21:28:08.181602 write(3<TCP:[127.0.0.1:42488->127.0.0.1:8080]>, "POST /api/v4/internal/allowed HTTP/1.1\r\nHost: 127.0.0.1:8080\r\nUser-Agent: Go-http-client/1.1\r\nConnection: close\r\nContent-Length: 126\r\nContent-Type: application/json\r\nGitlab-Shared-Secret: REDACTED\r\nAccept-Encoding: gzip\r\n\r\n{\"action\":\"git-receive-pack\",\"project\":\"root/test1.git\",\"changes\":\"_any\",\"protocol\":\"ssh\",\"key_id\":\"1\",\"check_ip\":\"127.0.0.1\"}", 518 <unfinished ...>
Whats going on? OH! Is it possible we're using the response from the internal/allowed
check in our request to gitaly (which then does another allowed check, but with user_id instead of key_id)?
Yes, certainly seems like, we seem to be reusing the response from the allowed check for the gitaly request.
Are we sure that's not a difference caused by a possible API difference between versions? Yes:
12.3:
root@cat-123:/# printf "POST /api/v4/internal/allowed HTTP/1.1\r\nHost: 127.0.0.1:8080\r\nUser-Agent: Go-http-client/1.1\r\nConnection: close\r\nContent-Length: 126\r\nContent-Type: application/json\r\nGitlab-Shared-Secret: $token\r\nAccept-Encoding: gzip\r\n\r\n{\"action\":\"git-receive-pack\",\"project\":\"root/test1.git\",\"changes\":\"_any\",\"protocol\":\"ssh\",\"key_id\":\"1\",\"check_ip\":\"127.0.0.1\"}" | nc 127.0.0.1 8080 | tail -1 | jq .gl_id
"user-1"
12.4:
root@cat-124:/# printf "POST /api/v4/internal/allowed HTTP/1.1\r\nHost: 127.0.0.1:8080\r\nUser-Agent: Go-http-client/1.1\r\nConnection: close\r\nContent-Length: 126\r\nContent-Type: application/json\r\nGitlab-Shared-Secret: $token\r\nAccept-Encoding: gzip\r\n\r\n{\"action\":\"git-receive-pack\",\"project\":\"root/test1.git\",\"changes\":\"_any\",\"protocol\":\"ssh\",\"key_id\":\"1\",\"check_ip\":\"127.0.0.1\"}" | nc 127.0.0.1 8080 | tail -1 | jq .gl_id
"user-1"
Since 12.4 we reuse the response from the internal API like mentioned above, which is our problem, in 12.3, the behavior was a bit different, only if the authorized key type was user-*
we'd use the response, otherwise we'd set directly the gl_id
.
We do seem to have a parse
that seems like it should do the job, but there's a plot twist, it doesn't matter - we pass response.UserId
in the gitaly call in the receive-pack. upload-packs are not affected, because we don't send any details there.
Problem
The current flow for an upload-pack
request is (diagram inspired from the gitaly-ssh readme):
sequenceDiagram
participant SSHClient as User's SSH Client
participant SSHD as GitLab SSHD
participant GitLabShell as gitlab-shell
participant InternalAPI as Internal API
participant GitalyServer as Gitaly
participant GitalyGit as git upload-pack
SSHClient ->> SSHD: SSH session
SSHD ->> GitLabShell: spawns gitlab-shell
GitLabShell ->> InternalAPI: permission check for key-<ID> or user-<ID>
InternalAPI ->> GitLabShell: response with user-<ID>
GitLabShell ->> GitalyServer: gRPC SSHUploadPack using user-<ID>
GitalyServer ->> InternalAPI: permission check for user-<ID>
GitalyServer ->> GitalyGit: spawns git upload-pack
The problem is twofold, the initial part is generic for both upload-pack
and receive-pack
flows.
In gitlab.com/gitlab-org/gitlab-shell/internal/command/{receivepack,uploadpack}.(*Command).Execute
we call (*Command).verifyAccess
, and use the response to call (*Command).performGitalyCall
.
The verification function does set the user-id or key-id accordingly, and it parses the response at the end, setting the response.Who
field accordingly, if passed a key id, it uses the key-id, otherwise it uses the user-id.
What differs is, the performGitalyCall
of those two flows.
-
For
receive-pack
s, the problem is that we're passingresponse.UserId
to theSSHReceivePackRequest
, which means it'll always be a user, so gitaly will not know it came through a deploy key, and run the user- permissions checks (when it calls the internal API in its turn). That causes gitlab#35779 (closed), as the check is now effectively operating on the user that is associated with the key in the database (the one who created it, mainly). -
For
upload-pack
s, we don't pass either of thekey-id
oruser-id
, so the change didn't make a difference here.