
Geo: Perform Git pull over SSH from secondary as an HTTP request to primary

Before

When a user performs a Git pull/clone over SSH against a secondary node, GitLab Shell sends requests to the secondary's Workhorse & Rails, which in turn perform Git over HTTP requests to the primary.

sequenceDiagram
    participant C as Git on client
    participant S as GitLab Shell
    participant I as Workhorse & Rails
    participant P as Workhorse & Rails

    Note left of C: git pull/clone
    Note over S,I: Secondary site
    Note over P: Primary site
    C->>+S: ssh git upload-pack request
    S->>I: SSH key validation (api/v4/internal/authorized_keys?key=..)
    I-->>S: HTTP/1.1 300 (custom action status) with {endpoint, msg, primary_repo}
    S->>I: POST /api/v4/geo/proxy_git_ssh/info_refs_upload_pack
    I->>P: GET $PRIMARY/foo/bar.git/info/refs?service=git-upload-pack
    P-->>I: HTTP/1.1 200 OK
    I-->>S: <response>
    S-->>C: return Git response from primary
    C-->>S: stream Git data to pull
    S->>I: POST /api/v4/geo/proxy_git_ssh/upload_pack
    I->>P: POST $PRIMARY/foo/bar.git/git-upload-pack
    P-->>I: HTTP/1.1 200 OK
    I-->>S: <response>
    S-->>-C: gitlab-shell upload-pack response
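In the current flow, GitLab Shell learns where to send the request from the custom action response (status 300) that Rails returns after SSH key validation. Below is a minimal Go sketch of how that response could be recognised and decoded, assuming the payload is JSON with `endpoint`, `msg` and `primary_repo` keys. `customActionPayload` and `parseCustomAction` are hypothetical names; the real gitlab-shell structures may be shaped differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// customActionStatus is the status the diagram shows Rails returning
// ("HTTP/1.1 300 (custom action status)").
const customActionStatus = http.StatusMultipleChoices

// customActionPayload is a hypothetical shape for {endpoint, msg, primary_repo};
// the real JSON keys and nesting may differ.
type customActionPayload struct {
	Endpoint    string `json:"endpoint"`
	Message     string `json:"msg"`
	PrimaryRepo string `json:"primary_repo"`
}

// parseCustomAction decodes the payload only when the status marks a custom action.
func parseCustomAction(status int, body []byte) (*customActionPayload, error) {
	if status != customActionStatus {
		return nil, fmt.Errorf("not a custom action: status %d", status)
	}
	var p customActionPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return nil, fmt.Errorf("decode custom action: %w", err)
	}
	return &p, nil
}

func main() {
	body := []byte(`{"endpoint":"/api/v4/geo/proxy_git_ssh/info_refs_upload_pack",` +
		`"msg":"","primary_repo":"https://primary.example.com/foo/bar.git"}`)
	p, err := parseCustomAction(customActionStatus, body)
	if err != nil {
		panic(err)
	}
	// Next step in the Before flow: POST to p.Endpoint on the secondary.
	fmt.Println(p.Endpoint, p.PrimaryRepo)
}
```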

After

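When a user performs a Git pull/clone over SSH against a secondary node, GitLab Shell still validates the SSH key with the secondary's Workhorse & Rails, but then performs the Git over HTTP requests directly against the primary's Workhorse & Rails.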

sequenceDiagram
    participant C as Git on client
    participant S as GitLab Shell
    participant I as Workhorse & Rails
    participant P as Workhorse & Rails

    Note left of C: git pull/clone
    Note over S,I: Secondary site
    Note over P: Primary site
    C->>+S: ssh git upload-pack request
    S->>I: SSH key validation (api/v4/internal/authorized_keys?key=..)
    I-->>S: HTTP/1.1 300 (custom action status) with {endpoint, msg, primary_repo, authorization headers}
    S->>P: GET $PRIMARY/foo/bar.git/info/refs?service=git-upload-pack
    P-->>S: HTTP/1.1 200 OK
    P-->>S: <response>
    S-->>C: return Git response from primary
    C-->>S: stream Git data to pull
    S->>P: POST $PRIMARY/foo/bar.git/git-upload-pack
    P-->>S: HTTP/1.1 200 OK
    P-->>S: <response>
    S-->>-C: gitlab-shell upload-pack response

Problem

As agreed in Geo: Proxy Git push over SSH via Workhorse, not... (gitlab#387568 - closed), it makes sense to move the Git over HTTP requests to the primary into GitLab Shell, so that large blobs of data are no longer proxied through GitLab Rails. Rails provides the authorization headers (Geo: Send authorization headers to Gitlab Shell... (gitlab#390101 - closed)); GitLab Shell needs to use them to perform the Git pull/clone over HTTP request and send the response to the user.

Proposal

This issue is very similar to Geo: Perform Git over HTTP requests as a custom... (#614 - closed). That issue optimizes Git push; this issue applies the same approach to Git pull.

Instead of proxying the HTTP requests through the secondary, let's perform Git over HTTP(S) requests straight to the primary node. GitLab Rails may return a feature flag; let's take it into account so that we can switch between the two behaviors.
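One way to honour the feature flag is to keep the existing proxy path as the default and only switch to direct requests when the flag is on. A sketch in Go, assuming Rails exposes the flag as a boolean in the custom action data; the `direct_to_primary` and `request_headers` keys and every type name below are placeholders, not the actual flag or payload.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pullStrategy distinguishes the two behaviours described in this issue.
type pullStrategy int

const (
	proxyViaSecondary pullStrategy = iota // Before: secondary Rails proxies to the primary
	directToPrimary                       // After: GitLab Shell calls the primary itself
)

func (s pullStrategy) String() string {
	if s == directToPrimary {
		return "direct-to-primary"
	}
	return "proxy-via-secondary"
}

// geoPayload is a hypothetical subset of the custom action data.
// The JSON keys are placeholders, not the real flag or field names.
type geoPayload struct {
	PrimaryRepo     string            `json:"primary_repo"`
	Headers         map[string]string `json:"request_headers"`
	DirectToPrimary bool              `json:"direct_to_primary"`
}

// chooseStrategy keeps the existing proxy path unless the flag is on and
// Rails supplied everything the direct path needs.
func chooseStrategy(p geoPayload) pullStrategy {
	if p.DirectToPrimary && p.PrimaryRepo != "" && len(p.Headers) > 0 {
		return directToPrimary
	}
	return proxyViaSecondary
}

func main() {
	raw := []byte(`{"primary_repo":"https://primary.example.com/foo/bar.git",` +
		`"request_headers":{"Authorization":"example-token"},"direct_to_primary":true}`)
	var p geoPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		panic(err)
	}
	fmt.Println(chooseStrategy(p))
}
```

Falling back to the proxy path whenever data is missing means an older Rails that never sends the flag keeps working unchanged.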

Solution

The solution is similar to the one provided in Perform Git over HTTP request to primary repo (!716 - merged).

Availability and Testing

To test the solution locally, Geo can be set up in GDK: https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/72a913f8fbdda624c8f0c9572e3c61d0e14b9e19/doc/howto/geo.md

For regression testing, please ensure the associated MR is labelled with ~"pipeline:run-all-e2e" and the e2e:package-and-test job passes.

Edited by Igor Drozdov