Puma performance for git ssh and git https

Summary

During the git-https and git-ssh migration to Kubernetes, we spent some time looking at how these two similar workloads interact with Puma.

The call flow for git https and git ssh is described in https://docs.gitlab.com/ee/development/architecture.html#web-request-80443

For a git fetch over ssh:

  • /api/v4/internal/allowed ~ 700 req/sec
  • /api/v4/internal/authorized_keys ~ 1k req/sec

For a git fetch over https:

  • Repositories::GitHttpController#info_refs ~ 700 req/sec
  • Repositories::GitHttpController#git_upload_pack ~ 150 req/sec

The rate of git_upload_pack is much lower because this method is only used when data is sent to the client. It also performs a database write that updates project statistics.

Based on json.duration_s in the logs, the calls to Rails for git https take significantly longer: https://log.gprd.gitlab.net/goto/cbcce4bd44c332ffc584d18a0d650b30

It's clear that git-ssh puts less load on the Puma service than git-https, even though each of these services makes two calls to Puma for a git fetch.

The latency numbers below are 50th percentiles on production, to be read alongside the requests/second figures above.

  • Git SSH

    • Authorized keys check: ~8ms
    • internal/allowed: ~45ms
  • Git HTTPS

    • Info-refs: 80-90ms
    • Upload-pack: 60ms
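
To make the load comparison concrete, here is a rough sketch that multiplies each endpoint's request rate by its median latency to estimate how many seconds of Puma time are generated per wall-clock second. The rates and latencies are the figures quoted in this issue. Treating the median as the per-request cost, and taking 85ms as the midpoint for info_refs, are assumptions, so the result is a ballpark estimate only.

```python
# Rough estimate of Puma busy time generated per wall-clock second.
# Rates and latencies are the figures quoted in this issue; using the
# median latency as the per-request cost is an assumption.

def busy_seconds(endpoints):
    """Sum rate (req/s) * latency (s) over (name, rate, latency) tuples."""
    return sum(rate * latency for _name, rate, latency in endpoints)

GIT_SSH = [
    ("internal/authorized_keys", 1000, 0.008),
    ("internal/allowed", 700, 0.045),
]
GIT_HTTPS = [
    ("info_refs", 700, 0.085),        # assumed midpoint of the 80-90ms range
    ("git_upload_pack", 150, 0.060),
]

ssh_load = busy_seconds(GIT_SSH)
https_load = busy_seconds(GIT_HTTPS)
print(f"git-ssh:   ~{ssh_load:.1f} busy seconds per second")
print(f"git-https: ~{https_load:.1f} busy seconds per second")
```

With these inputs the estimate comes out at roughly 39.5 for git-ssh versus 68.5 for git-https, which is consistent with git-https being the heavier consumer of Puma capacity.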

One thing we were considering is flipping the feature flag to update project statistics:

https://gitlab.com/gitlab-org/gitlab/blob/83e89f4fdc7eccb7d4f9c0acadcd7c6eca4ff9cf/app/controllers/repositories/git_http_controller.rb#L83

though it's not clear whether this will have a big impact, since the database writes happen only in git_upload_pack, which is a small part of the traffic. Based on the logs, info_refs, which does not write to the db, takes approximately the same amount of time: https://log.gprd.gitlab.net/goto/ceceb78f030631eb0363e921c28e20f7
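
As a back-of-the-envelope check on how much the flag could matter, the sketch below computes the share of git-https Puma time attributable to git_upload_pack, using the rates and median latencies quoted in this issue. It assumes, optimistically, that the entire git_upload_pack request cost would vanish, so the result is an upper bound and not a prediction. The 85ms value for info_refs is an assumed midpoint of the quoted range.

```python
# Upper bound on the Puma time saved by skipping the statistics write,
# assuming (optimistically) that the whole git_upload_pack cost vanished.
INFO_REFS = (700, 0.085)    # req/s, median latency in s (assumed midpoint)
UPLOAD_PACK = (150, 0.060)  # req/s, median latency in s

def cost(rate, latency):
    """Approximate Puma seconds consumed per wall-clock second."""
    return rate * latency

total = cost(*INFO_REFS) + cost(*UPLOAD_PACK)
max_saving = cost(*UPLOAD_PACK) / total
print(f"git_upload_pack share of git-https Puma time: {max_saving:.0%}")
```

Under these assumptions git_upload_pack accounts for about 13% of the git-https time spent in Puma, so the gain from the flag alone is capped at roughly that figure.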