    34f435ae
    Use Go-Git for the FindCommit RPC · 34f435ae
    Zeger-Jan van de Weg authored
    FindCommit is called so often that routing all of these requests
    through Gitaly is currently a problem. This led the Gitaly team to
    invest a lot of time in fixing client-side N+1 problems. Even though
    this was fruitful, the optimizations weren't enough to bring the number
    of RPC/s down to a level where FindCommit could be served by Gitaly
    100% of the time.
    
    The current way of obtaining the commit information is by shelling out
    to the git binary, using `git log -z` with extensive use of format
    options. Shelling out has a runtime cost (a new git process is spawned
    for each call), which a native Go implementation of git avoids. The
    parent commit introduced src-d/go-git as a dependency.
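The shell-out path has to turn git's text output back into structured commit data. The Go sketch below shows that parsing step. The exact format string Gitaly passes to `git log` is not given here. The `%H%x1f%P...` format, the `Commit` struct and `parseLogOutput` are all hypothetical. The sketch assumes that `-z` separates commits with NUL bytes and that `%x1f` placeholders separate the fields within each commit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Commit holds the subset of commit metadata this sketch extracts.
type Commit struct {
	ID          string
	ParentIDs   []string
	AuthorName  string
	AuthorEmail string
	AuthorTime  int64 // Unix seconds (%at)
	Subject     string
}

const (
	recordSep = "\x00" // `git log -z` separates commits with NUL bytes
	fieldSep  = "\x1f" // produced by the %x1f placeholders in the assumed format
	numFields = 6
)

// logFormat is a hypothetical --format value; Gitaly's real one may differ.
const logFormat = "%H%x1f%P%x1f%an%x1f%ae%x1f%at%x1f%s"

// parseLogOutput parses the output of `git log -z --format=<logFormat>`.
func parseLogOutput(out string) ([]Commit, error) {
	var commits []Commit
	for _, record := range strings.Split(out, recordSep) {
		record = strings.Trim(record, "\n") // tolerate a stray trailing newline
		if record == "" {
			continue
		}
		fields := strings.Split(record, fieldSep)
		if len(fields) != numFields {
			return nil, fmt.Errorf("expected %d fields, got %d in %q", numFields, len(fields), record)
		}
		authorTime, err := strconv.ParseInt(fields[4], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing author time %q: %w", fields[4], err)
		}
		commits = append(commits, Commit{
			ID:          fields[0],
			ParentIDs:   strings.Fields(fields[1]), // %P is space-separated, empty for a root commit
			AuthorName:  fields[2],
			AuthorEmail: fields[3],
			AuthorTime:  authorTime,
			Subject:     fields[5],
		})
	}
	return commits, nil
}

func main() {
	// In a real shell-out this string would be the stdout of something like
	// exec.Command("git", "log", "-z", "--format="+logFormat, "-1", revision).
	sample := "1111\x1f2222 3333\x1fJane Doe\x1fjane@example.com\x1f1525000000\x1fMerge branch\x00"
	commits, err := parseLogOutput(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range commits {
		fmt.Printf("%s parents=%v author=%s <%s> subject=%q\n",
			c.ID, c.ParentIDs, c.AuthorName, c.AuthorEmail, c.Subject)
	}
}
```

A native implementation such as go-git reads the commit object directly. That skips both the process spawn and this round trip through text.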
    
    The intent is to swap out the git implementation without requiring any
    proto or client-side changes, while remaining fully compatible with the
    shell-out implementation. Things to check before these commits can be
    merged to master include:
    1. When shelling out, Gitaly sets `GIT_OBJECT_DIRECTORY` and
    `GIT_ALTERNATE_OBJECT_DIRECTORIES` in the command's environment. To
    what extent FindCommit requires these values, and how to set them when
    using go-git, are open questions at the moment.
    2. The full test suites of GitLab-CE and GitLab-EE should pass with
    minimal changes to those codebases.
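For the shell-out side of open question 1, this is roughly what passing those two variables to a git child process looks like in Go. It is a minimal sketch. `gitEnv` and the example paths are hypothetical, and the function only builds the environment entries. Git reads `GIT_ALTERNATE_OBJECT_DIRECTORIES` as a `:`-separated list (`;` on Windows), which matches Go's `filepath.ListSeparator`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// gitEnv is a hypothetical helper that builds the extra environment entries
// a shelled-out git command needs to use an alternative object directory
// and a set of alternates. Empty values are omitted so git uses its defaults.
func gitEnv(objectDir string, alternates []string) []string {
	var env []string
	if objectDir != "" {
		env = append(env, "GIT_OBJECT_DIRECTORY="+objectDir)
	}
	if len(alternates) > 0 {
		// git expects a list separated by ':' (';' on Windows).
		env = append(env, "GIT_ALTERNATE_OBJECT_DIRECTORIES="+
			strings.Join(alternates, string(filepath.ListSeparator)))
	}
	return env
}

func main() {
	env := gitEnv("/srv/repo.git/objects/tmp", []string{"/srv/pool.git/objects"})
	for _, kv := range env {
		fmt.Println(kv)
	}
	// With os/exec these would be appended to the inherited environment:
	//   cmd.Env = append(os.Environ(), env...)
}
```

No child process exists with go-git, so these variables have nowhere to go. Whether go-git can be pointed at the same directories in another way is exactly what question 1 leaves open.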
    
    The main reason to swap out implementations is performance, so
    gitaly-bench was updated to be able to benchmark the FindCommit RPC in
    gitlab-org/gitaly-bench!5.
    
    Both Gitaly instances were started with the same configuration, apart
    from the port: the shell-out implementation listened on :9999, the
    go-git implementation on :19999. Output is truncated, but the commands
    are given in full for reproducibility.
    
    ```
    $ go version
    go version go1.10.1 darwin/amd64
    
    $ go run gitaly-bench.go -iterations 100 -repo \
        gitlab-org/gitlab-test.git -host tcp://localhost:9999 find-commit
    Stats:
     Average: 0.000000
     Total requests: 1000
     Elapsed Time (sec): 17.3454
     Average QPS: 57.65
     Errors: 0
     Percent errors: 0.00
    
    $ go run gitaly-bench.go -iterations 100 -repo \
        gitlab-org/gitlab-test.git -host tcp://localhost:19999 find-commit
    Stats:
     Average: 0.000000
     Total requests: 1000
     Elapsed Time (sec): 0.9546
     Average QPS: 1047.55
     Errors: 0
     Percent errors: 0.00
    
    $ go run gitaly-bench.go -iterations 100 -repo \
        gitlab-org/gitlab-test.git -host tcp://localhost:9999 find-commit \
        -revision "4a24d82dbca5c11c61556f3b35ca472b7463187e"
    Stats:
     Average: 0.000000
     Total requests: 1000
     Elapsed Time (sec): 17.7700
     Average QPS: 56.27
     Errors: 0
     Percent errors: 0.00
    
    $ go run gitaly-bench.go -iterations 100 -repo \
        gitlab-org/gitlab-test.git -host tcp://localhost:19999 find-commit \
        -revision "4a24d82dbca5c11c61556f3b35ca472b7463187e"
    Stats:
     Average: 0.000000
     Total requests: 1000
     Elapsed Time (sec): 1.3640
     Average QPS: 733.12
     Errors: 0
     Percent errors: 0.00
    
    $ go run gitaly-bench.go -iterations 100 -repo \
        gitlab-org/gitlab-test.git -host tcp://localhost:9999 find-commit \
        -revision "HEAD~25"
    Stats:
     Average: 0.000000
     Total requests: 1000
     Elapsed Time (sec): 17.5492
     Average QPS: 56.98
     Errors: 0
     Percent errors: 0.00
    
    $ go run gitaly-bench.go -iterations 100 -repo \
        gitlab-org/gitlab-test.git -host tcp://localhost:19999 find-commit \
        -revision "HEAD~25"
    Stats:
     Average: 0.000000
     Total requests: 1000
     Elapsed Time (sec): 3.2684
     Average QPS: 305.96
     Errors: 0
     Percent errors: 0.00
    
    $ go run gitaly-bench.go -iterations 100 -repo \
        gitlab-org/gitlab-test.git -host tcp://localhost:9999 find-commit \
        -revision "feature_conflict"
    Stats:
     Average: 0.000000
     Total requests: 1000
     Elapsed Time (sec): 18.8795
     Average QPS: 52.97
     Errors: 0
     Percent errors: 0.00
    
    $ go run gitaly-bench.go -iterations 100 -repo \
        gitlab-org/gitlab-test.git -host tcp://localhost:19999 find-commit \
        -revision "feature_conflict"
    Stats:
     Average: 0.000000
     Total requests: 1000
     Elapsed Time (sec): 1.4138
     Average QPS: 707.33
     Errors: 0
     Percent errors: 0.00
    ```
    
    The data shows that the go-git implementation is faster than shelling
    out to the git binary across the board. The size of the speedup varies
    strongly, however: from about 18 times faster (no revision given) to
    'just' 5 times faster (HEAD~25). The difference seems to depend on the
    type of revision passed as argument to the RPC. The relative revision,
    HEAD~25, is slowest; my assumption is that go-git's implementation of
    walking the history is not optimal. The shell-out implementation is
    highly consistent in its timings, at roughly 53 to 58 QPS for every
    revision.
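The speedups quoted here follow directly from the elapsed times in the benchmark output. The small Go program below recomputes them. The `run` struct and `speedup` function are illustrative names only. Each figure is the shell-out elapsed time divided by the go-git elapsed time for the same 1000 requests.

```go
package main

import "fmt"

// run holds the elapsed times gitaly-bench reported for one revision.
type run struct {
	revision     string
	shellOutSecs float64 // Gitaly on :9999, shelling out to git
	goGitSecs    float64 // Gitaly on :19999, using go-git
}

// speedup is how many times faster go-git served the same number of requests.
func speedup(r run) float64 { return r.shellOutSecs / r.goGitSecs }

func main() {
	runs := []run{
		{"(no -revision)", 17.3454, 0.9546},
		{"4a24d82dbca5c11c61556f3b35ca472b7463187e", 17.7700, 1.3640},
		{"HEAD~25", 17.5492, 3.2684},
		{"feature_conflict", 18.8795, 1.4138},
	}
	for _, r := range runs {
		fmt.Printf("%-42s %6.2fx\n", r.revision, speedup(r))
	}
}
```

Running it prints speedups of 18.17x, 13.03x, 5.37x and 13.35x. That matches the "18 times" to "5 times" range, with the full SHA and branch-name lookups in between.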
    
    This change has one notable side effect: logging is greatly reduced,
    because less shelling out takes place. The internal wrappers around
    shelling out log heavily, which improves visibility, and go-git calls
    do not pass through them.