Skip to content

Inaccurate deploy keys passthrough

Background

As part of &706 (closed), with !300 (merged) we've added a Go implementation for git-receive-pack.

In 12.4, a part of the deploy keys functionality changed, seemingly out of the blue, reported in gitlab#35779 (closed), which also coincides with the removal of gitlab-shell-ruby in #173 (closed).

Summary

Deploy key checks are inconsistent ever since 12.4, here's a small example from testing / a script we used after a Large Premium Customer (ZD, SFDC - internal links) had an emergency after their 12.4.8 upgrade when they ran into this change. (few examples also detailed in the description of the related issue - gitlab#35779 (closed)).

Investigation details

Long, verbose description - expand for details

Started from the assumption that something handling the authorization broke in the 12.3->12.4 diff. Initial info is that api_json.log has user_id for a deploy key, effectively tying the deploy key to the owner, together with the push permissions.

Two setups, same project paths, users and keys, one on 12.3.9, one on 12.4.8.

Starting with a gitaly strace on both nodes, using strace-parser as well:

12.3:

╰─>$ grep "sendto.*127.0.0.1:8080.*POST.*internal/allowed" gitaly.txt -A 1
11785 19:05:22.240274 sendto(6<TCP:[127.0.0.1:36146->127.0.0.1:8080]>, "POST /api/v4/internal/allowed HTTP/1.1\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nUser-Agent: Ruby\r\nContent-Type: application/x-www-form-urlencoded\r\nHost: 127.0.0.1:8080\r\nContent-Length: 1050\r\n\r\n", 224, MSG_DONTWAIT, NULL, 0) = 224 <0.000542>
11785 19:05:22.240924 sendto(6<TCP:[127.0.0.1:36146->127.0.0.1:8080]>, "action=git-receive-pack&changes=10f18557a4414fa7076bbac2b13c1fdc861543c7+7db02ad3442ed94573b59a3af1f5b4f00197c0c1+refs%2Fheads%2Fmaster%0A&gl_repository=project-1&project=%2Fvar%2Fopt%2Fgitlab%2Fgit-data%2Frepositories%2F%40hashed%2F6b%2F86%2F6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git&protocol=ssh&env=%7B%22GIT_ALTERNATE_OBJECT_DIRECTORIES%22%3A%5B%22%2Fvar%2Fopt%2Fgitlab%2Fgit-data%2Frepositories%2F%40hashed%2F6b%2F86%2F6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git%2F.%2Fobjects%22%5D%2C%22GIT_ALTERNATE_OBJECT_DIRECTORIES_RELATIVE%22%3A%5B%22objects%22%5D%2C%22GIT_OBJECT_DIRECTORY%22%3A%22%2Fvar%2Fopt%2Fgitlab%2Fgit-data%2Frepositories%2F%40hashed%2F6b%2F86%2F6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git%2F.%2Fobjects%2Fincoming-EXauMa%22%2C%22GIT_OBJECT_DIRECTORY_RELATIVE%22%3A%22objects%2Fincoming-EXauMa%22%7D&key_id=1&secret_token=REDACTED"..., 1050, MSG_DONTWAIT, NULL, 0) = 1050 <0.004236>


╰─>$ strace-parser gitaly.txt p 11785 | grep Prog -A1
  Program Executed: /opt/gitlab/embedded/bin/ruby
  Args: ["/opt/gitlab/embedded/service/gitaly-ruby/git-hooks/../gitlab-shell/hooks/pre-receive"] 0x7ffe81ff17c8

12.4:

╰─>$ grep "sendto.*127.0.0.1:8080.*POST.*internal/allowed" gitaly.txt
16047 19:09:15.393904 sendto(6<TCP:[127.0.0.1:40326->127.0.0.1:8080]>, "POST /api/v4/internal/allowed HTTP/1.1\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nUser-Agent: Ruby\r\nContent-Type: application/x-www-form-urlencoded\r\nHost: 127.0.0.1:8080\r\nContent-Length: 1051\r\n\r\n", 224, MSG_DONTWAIT, NULL, 0) = 224 <0.000595>
16047 19:09:15.394606 sendto(6<TCP:[127.0.0.1:40326->127.0.0.1:8080]>, "action=git-receive-pack&changes=af229d1572c06e9af9dc565b142c9a25e399f70b+f048a6d4cae6a47c0d77daa3a034ff165f8c388d+refs%2Fheads%2Fmaster%0A&gl_repository=project-1&project=%2Fvar%2Fopt%2Fgitlab%2Fgit-data%2Frepositories%2F%40hashed%2F6b%2F86%2F6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git&protocol=ssh&env=%7B%22GIT_ALTERNATE_OBJECT_DIRECTORIES%22%3A%5B%22%2Fvar%2Fopt%2Fgitlab%2Fgit-data%2Frepositories%2F%40hashed%2F6b%2F86%2F6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git%2F.%2Fobjects%22%5D%2C%22GIT_ALTERNATE_OBJECT_DIRECTORIES_RELATIVE%22%3A%5B%22objects%22%5D%2C%22GIT_OBJECT_DIRECTORY%22%3A%22%2Fvar%2Fopt%2Fgitlab%2Fgit-data%2Frepositories%2F%40hashed%2F6b%2F86%2F6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git%2F.%2Fobjects%2Fincoming-QiWlm3%22%2C%22GIT_OBJECT_DIRECTORY_RELATIVE%22%3A%22objects%2Fincoming-QiWlm3%22%7D&user_id=1&secret_token=REDACTED"..., 1051, MSG_DONTWAIT, NULL, 0) = 1051 <0.002014>


╰─>$ strace-parser gitaly.txt p 16047 | grep Prog -A1
  Program Executed: /opt/gitlab/embedded/bin/ruby
  Args: ["/opt/gitlab/embedded/service/gitaly-ruby/gitlab-shell/hooks/pre-receive"] 0x7ffe9a77bfb8

First and most important thing to note, the 12.3 version passed key_id=1 while the 12.4 version passed user_id=1.

The difference in args is also a bit odd, let's see what's the parent:

12.3:

╰─>$ strace-parser gitaly.txt p (strace-parser gitaly.txt p 11785 | grep Parent | awk '{print $3}') | grep Prog -A1
  Program Executed: /opt/gitlab/embedded/bin/git
  Args: ["-c" "core.hooksPath=/opt/gitlab/embedded/service/gitaly-ruby/git-hooks" "-c" "core.alternateRefsCommand=exit 0 #" "receive-pack" "/var/opt/gitlab/git-data/repositories/@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git"] 0xc000110480

12.4:

╰─>$ strace-parser gitaly.txt p (strace-parser gitaly.txt p 16047 | grep Parent | awk '{print $3}') | grep Prog -A1
  Program Executed: /opt/gitlab/embedded/bin/gitaly-hooks
  Args: ["pre-receive"] 0x55bb5fbfdd78

Oh, right. The new gitaly-hooks binary (also introduced in 12.4, pretty suspect to begin with).

The pre-receive hook is simply a symlink to the generic gitlab-shell-hook:

root@cat-124:/# ll /opt/gitlab/embedded/service/gitaly-ruby/git-hooks/pre-receive
lrwxrwxrwx 1 root root 17 Jan 13  2020 /opt/gitlab/embedded/service/gitaly-ruby/git-hooks/pre-receive -> gitlab-shell-hook*

Confirming whether or not the new gitaly-hooks binary is the cause here:

root@cat-124:/# sed -i.bak 's/^if\ .*/if false; then/' /opt/gitlab/embedded/service/gitaly-ruby/git-hooks/gitlab-shell-hook

Nothing changes when not using it:

root@cat-124:/# grep "\bkey_id=" /tmp/gitaly2.txt # new strace, after disabling
root@cat-124:/#

At least now we're comparing apples to apples, but it's still unclear why the difference.

Let's try stracing gitlab-shell. Note: do not trace sshd while connected to it, it'll crash your server/vm (not that it happened to me... 😆)

A simple approach would be to add a wrapper which we can attach to, and use -f to also strace into the child processes.

root@cat-123:~# mv /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell{,.old}
root@cat-123:~# cat > /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell <<EOF
> #!/bin/bash
> sleep 20
> /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell.old "\$@"
> EOF
root@cat-123:~# chmod +x /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell

Run a push, attach to the gitlab-shell wrapper while it's in sleep with:

ps auwx | grep gitlab-shell | awk '{ print " -p " $2}' | xargs strace -tt -T -f -y -yy -s 1024 -o /tmp/gitlab-shell.txt

Let's see what happens now with our key-1:

12.3:

root@cat-123:/# grep "key-1" /tmp/gitlab-shell.txt
6139  21:22:11.728727 execve("/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell.old", ["/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell.old", "key-1"], 0x55b9a992dc10 /* 15 vars */) = 0 <0.000226>
6139  21:22:11.747444 execve("/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell-ruby", ["/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell-ruby", "key-1"], 0xc0000d8200 /* 15 vars */ <unfinished ...>

12.4:

root@cat-124:/# grep "key-1" /tmp/gitlab-shell.txt
8671  21:28:08.159239 execve("/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell.old", ["/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell.old", "key-1"], 0x556c43014c10 /* 15 vars */) = 0 <0.000256>

Uh-oh, there's no more gitlab-shell-ruby because of !346 (merged). Comparing the IO with strace-parser, we see that the new go binary actually sends the gitaly gRPC with user-1 instead of key-1 on the 12.4 version:

8671  21:28:08.214993 write(3<UNIX:[802333->802334]>, "\0\0\0\4\1\0\0\0\0\0\0a\1\4\0\0\0\1\203\206E\231bc$tz_v\354{\213gs\20\254n\335\217iHSw.\261\223\257A\236\266\243y\270\307q\330\301\353K\23\0314\0166&2GG\246&2GG\245\320ru*\177_\213\35u\320b\r&=LMedz\212\232\312\310\264\307`+\211\245\301@\2te\206M\2035\5\261\37\0\0\224\0\0\0\0\0\1\0\0\0\0\217\nt\22\7default\32R@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git2\tproject-1B\nroot/test1\32\6user-1\"\tproject-1*\4root", 272) = 2* 72 <0.000028>

However... just a bit earlier, gitlab-shell does send a check with key-1 as intended:

8673  21:28:08.181602 write(3<TCP:[127.0.0.1:42488->127.0.0.1:8080]>, "POST /api/v4/internal/allowed HTTP/1.1\r\nHost: 127.0.0.1:8080\r\nUser-Agent: Go-http-client/1.1\r\nConnection: close\r\nContent-Length: 126\r\nContent-Type: application/json\r\nGitlab-Shared-Secret: REDACTED\r\nAccept-Encoding: gzip\r\n\r\n{\"action\":\"git-receive-pack\",\"project\":\"root/test1.git\",\"changes\":\"_any\",\"protocol\":\"ssh\",\"key_id\":\"1\",\"check_ip\":\"127.0.0.1\"}", 518 <unfinished ...>

Whats going on? OH! Is it possible we're using the response from the internal/allowed check in our request to gitaly (which then does another allowed check, but with user_id instead of key_id)?

Yes, certainly seems like, we seem to be reusing the response from the allowed check for the gitaly request.

Are we sure that's not a difference caused by a possible API difference between versions? Yes:

12.3:

root@cat-123:/# printf "POST /api/v4/internal/allowed HTTP/1.1\r\nHost: 127.0.0.1:8080\r\nUser-Agent: Go-http-client/1.1\r\nConnection: close\r\nContent-Length: 126\r\nContent-Type: application/json\r\nGitlab-Shared-Secret: $token\r\nAccept-Encoding: gzip\r\n\r\n{\"action\":\"git-receive-pack\",\"project\":\"root/test1.git\",\"changes\":\"_any\",\"protocol\":\"ssh\",\"key_id\":\"1\",\"check_ip\":\"127.0.0.1\"}" | nc 127.0.0.1 8080 | tail -1 | jq .gl_id
"user-1"

12.4:

root@cat-124:/# printf "POST /api/v4/internal/allowed HTTP/1.1\r\nHost: 127.0.0.1:8080\r\nUser-Agent: Go-http-client/1.1\r\nConnection: close\r\nContent-Length: 126\r\nContent-Type: application/json\r\nGitlab-Shared-Secret: $token\r\nAccept-Encoding: gzip\r\n\r\n{\"action\":\"git-receive-pack\",\"project\":\"root/test1.git\",\"changes\":\"_any\",\"protocol\":\"ssh\",\"key_id\":\"1\",\"check_ip\":\"127.0.0.1\"}" | nc 127.0.0.1 8080 | tail -1 | jq .gl_id
"user-1"

Since 12.4 we reuse the response from the internal API like mentioned above, which is our problem, in 12.3, the behavior was a bit different, only if the authorized key type was user-* we'd use the response, otherwise we'd set directly the gl_id.

We do seem to have a parse that seems like it should do the job, but there's a plot twist, it doesn't matter - we pass response.UserId in the gitaly call in the receive-pack. upload-packs are not affected, because we don't send any details there.

Problem

The current flow for an upload-pack request is (diagram inspired from the gitaly-ssh readme):

sequenceDiagram
  participant SSHClient as User's SSH Client
  participant SSHD as GitLab SSHD
  participant GitLabShell as gitlab-shell
  participant InternalAPI as Internal API
  participant GitalyServer as Gitaly
  participant GitalyGit as git upload-pack

  SSHClient ->> SSHD: SSH session

  SSHD ->> GitLabShell: spawns gitlab-shell
  GitLabShell ->> InternalAPI: permission check for key-<ID> or user-<ID>
  InternalAPI ->> GitLabShell: response with user-<ID>
  GitLabShell ->> GitalyServer: gRPC SSHUploadPack using user-<ID>
  GitalyServer ->> InternalAPI: permission check for user-<ID>

  GitalyServer ->> GitalyGit: spawns git upload-pack

The problem is twofold, the initial part is generic for both upload-pack and receive-pack flows.

In gitlab.com/gitlab-org/gitlab-shell/internal/command/{receivepack,uploadpack}.(*Command).Execute we call (*Command).verifyAccess, and use the response to call (*Command).performGitalyCall.

The verification function does set the user-id or key-id accordingly, and it parses the response at the end, setting the response.Who field accordingly, if passed a key id, it uses the key-id, otherwise it uses the user-id.

What differs is, the performGitalyCall of those two flows.

  1. For receive-packs, the problem is that we're passing response.UserId to the SSHReceivePackRequest, which means it'll always be a user, so gitaly will not know it came through a deploy key, and run the user- permissions checks (when it calls the internal API in its turn). That causes gitlab#35779 (closed), as the check is now effectively operating on the user that is associated with the key in the database (the one who created it, mainly).

  2. For upload-packs, we don't pass either of the key-id or user-id, so the change didn't make a difference here.

Edited by Catalin Irimie