Geo: Perform Git pull over SSH from secondary as an HTTP request to primary
Before
When a user performs a Git pull/clone over SSH against a secondary node, GitLab Shell sends requests to the secondary's Workhorse & Rails, which in turn perform the Git over HTTP requests to the primary.
sequenceDiagram
participant C as Git on client
participant S as GitLab Shell
participant I as Workhorse & Rails
participant P as Workhorse & Rails
Note left of C: git pull/clone
Note over S,I: Secondary site
Note over P: Primary site
C->>+S: ssh git upload-pack request
S->>I: SSH key validation (api/v4/internal/authorized_keys?key=..)
I-->>S: HTTP/1.1 300 (custom action status) with {endpoint, msg, primary_repo}
S->>I: POST /api/v4/geo/proxy_git_ssh/info_refs_upload_pack
I->>P: GET $PRIMARY/foo/bar.git/info/refs/?service=git-upload-pack
P-->>I: HTTP/1.1 200 OK
I-->>S: <response>
S-->>C: return Git response from primary
C-->>S: stream Git data to pull
S->>I: POST /api/v4/geo/proxy_git_ssh/upload_pack
I->>P: POST $PRIMARY/foo/bar.git/git-upload-pack
P-->>I: HTTP/1.1 200 OK
I-->>S: <response>
S-->>-C: gitlab-shell upload-pack response
After
When a user performs a Git pull/clone over SSH against a secondary node, GitLab Shell performs the Git over HTTP requests directly against the primary's Workhorse & Rails.
sequenceDiagram
participant C as Git on client
participant S as GitLab Shell
participant I as Workhorse & Rails
participant P as Workhorse & Rails
Note left of C: git pull/clone
Note over S,I: Secondary site
Note over P: Primary site
C->>+S: ssh git upload-pack request
S->>I: SSH key validation (api/v4/internal/authorized_keys?key=..)
I-->>S: HTTP/1.1 300 (custom action status) with {endpoint, msg, primary_repo, authorization headers}
S->>P: GET $PRIMARY/foo/bar.git/info/refs/?service=git-upload-pack
P-->>S: HTTP/1.1 200 OK
P-->>S: <response>
S-->>C: return Git response from primary
C-->>S: stream Git data to pull
S->>P: POST $PRIMARY/foo/bar.git/git-upload-pack
P-->>S: HTTP/1.1 200 OK
P-->>S: <response>
S-->>-C: gitlab-shell upload-pack response
Problem
As agreed in Geo: Proxy Git push over SSH via Workhorse, not... (gitlab#387568 - closed), it makes sense to move the Git over HTTP requests to the primary into GitLab Shell, so that large blobs of data are no longer proxied through GitLab Rails. Rails already provides the authorization headers (Geo: Send authorization headers to Gitlab Shell... (gitlab#390101 - closed)). What remains is for GitLab Shell to perform the Git pull/clone over HTTP request itself and send the response to the user.
Proposal
This issue is very similar to Geo: Perform Git over HTTP requests as a custom... (#614 - closed). That issue optimized Git push, while this one aims to apply the same approach to Git pull.
Instead of proxying the HTTP requests through the secondary, let's perform the Git over HTTP requests straight against the primary node. GitLab Rails may return a feature flag; let's take it into account so that we can switch between the two behaviors.
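The switch between the two behaviors could look something like the sketch below. It assumes, hypothetically, that the flag arrives as a boolean in the custom action payload; the JSON key geo_proxy_fetch_direct_to_primary, the customActionData type, and the strategy names are all made up for illustration and are not taken from the Rails API.

```go
package main

import "encoding/json"

// customActionData is a hypothetical subset of the custom action payload.
type customActionData struct {
	PrimaryRepo string `json:"primary_repo"`
	// DirectToPrimary mirrors the feature flag returned by Rails.
	// The JSON key is an assumption made for this sketch.
	DirectToPrimary bool `json:"geo_proxy_fetch_direct_to_primary"`
}

type pullStrategy string

const (
	// proxyViaSecondary keeps the old behavior: POST to the secondary's
	// /api/v4/geo/proxy_git_ssh/* endpoints.
	proxyViaSecondary pullStrategy = "proxy-via-secondary"
	// directToPrimary performs Git over HTTP against the primary.
	directToPrimary pullStrategy = "direct-to-primary"
)

// chooseStrategy falls back to the old behavior unless the flag is set and
// a primary repository URL is present.
func chooseStrategy(payload []byte) (pullStrategy, error) {
	var data customActionData
	if err := json.Unmarshal(payload, &data); err != nil {
		return "", err
	}
	if data.DirectToPrimary && data.PrimaryRepo != "" {
		return directToPrimary, nil
	}
	return proxyViaSecondary, nil
}
```

Defaulting to the proxied path when the flag is absent keeps a newer GitLab Shell compatible with a Rails version that does not send the flag yet.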
Solution
The solution is similar to the one provided in Perform Git over HTTP request to primary repo (!716 - merged).
- We need to modify internal/command/uploadpack/uploadpack.go and call a command similar to this one. It probably should be controlled by a different feature flag.
- A separate PullCommand should be implemented, similar to PushCommand. The info-refs and git-upload-pack responses should be converted according to this logic. It's similar to the following transformation (Rails logic -> GitLab Shell logic)
- The HTTP client for info-refs can be reused; an additional one for upload-pack should be implemented, similar to the one for receive-pack
Availability and Testing
To test the solution locally, Geo can be set up in GDK: https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/72a913f8fbdda624c8f0c9572e3c61d0e14b9e19/doc/howto/geo.md
For regression testing, please ensure the associated MR is labelled with ~"pipeline:run-all-e2e" and that the e2e:package-and-test job is passing.