Teach gitlab-workhorse to serve requests to get raw blobs
At Nexedi we deploy all our services via SlapOS, which means a lot of software is constantly rebuilt on a lot of machines. The software to build is defined just by a URL, so the Git server constantly receives many requests for raw blobs, e.g. like this:
https://lab.nexedi.com/nexedi/slapos/raw/master/software/wendelin/software.cfg
Currently GitLab serves requests for raw blobs via Ruby-on-Rails code and Unicorn. Because RoR/Unicorn is relatively heavyweight, in an environment with a lot of simultaneous raw blob requests this is very slow and the server is constantly overloaded.
On the other hand, to get raw blob content we do not need anything from the RoR framework - we only need access to the project's git repository on the filesystem, and to know whether access to that data should be granted or not. That means it is possible to adjust the Nginx frontend to route '.../raw/....' requests to a more lightweight and performant program which does only this particular task, and that will be a net win.
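As a minimal sketch of the first step such a program needs, the snippet below splits a '.../raw/....' URL path into the project and the remaining "ref/path" part. The names rawBlobRe and splitRawPath are hypothetical and are not taken from the attached patches; the sketch also assumes a project is always exactly namespace/project. Since a git ref may itself contain slashes, the "ref/path" part is left unsplit here and would have to be resolved against the repository.

```go
package main

import (
	"fmt"
	"regexp"
)

// rawBlobRe matches URL paths of the form /<namespace>/<project>/raw/<ref-and-path>.
// Assumption for this sketch: a project is exactly two path segments.
var rawBlobRe = regexp.MustCompile(`^/([^/]+/[^/]+)/raw/(.+)$`)

// splitRawPath returns the project and the combined "ref/path" part of a
// raw blob URL path. ok is false when the path is not a raw blob request.
func splitRawPath(urlPath string) (project, refPath string, ok bool) {
	m := rawBlobRe.FindStringSubmatch(urlPath)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	project, refPath, ok := splitRawPath("/nexedi/slapos/raw/master/software/wendelin/software.cfg")
	fmt.Println(project, refPath, ok)
}
```

Requests whose path does not match would simply keep going to the existing RoR/Unicorn backend.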
As gitlab-workhorse is written in Go, and Go has good concurrency/parallelism support and is generally much faster than Ruby, adding the raw blob serving task to it makes sense.
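The benchmark with the patch applied is run "with auth caching". The sketch below illustrates one possible shape of such a cache and is not the code from the attached patches: authCache, newAuthCache and the check callback are hypothetical names, and the time-to-live policy is an assumption. A real cache key would also have to cover the credentials of the requester, not only the project.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// authEntry is one remembered access decision.
type authEntry struct {
	allowed bool
	expires time.Time
}

// authCache remembers recent access decisions so that repeated requests
// for the same key do not each go to the slow authorization backend.
type authCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]authEntry
	check   func(key string) bool // hypothetical call to the auth backend
}

func newAuthCache(ttl time.Duration, check func(string) bool) *authCache {
	return &authCache{ttl: ttl, entries: make(map[string]authEntry), check: check}
}

// Allowed returns the cached decision while it is fresh and asks the
// backend otherwise. For simplicity the lock is held during the backend
// call, which serializes concurrent misses.
func (c *authCache) Allowed(key string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[key]; ok && time.Now().Before(e.expires) {
		return e.allowed
	}
	allowed := c.check(key)
	c.entries[key] = authEntry{allowed: allowed, expires: time.Now().Add(c.ttl)}
	return allowed
}

func main() {
	calls := 0
	cache := newAuthCache(30*time.Second, func(project string) bool {
		calls++ // stands in for a round trip to RoR/Unicorn
		return project == "nexedi/slapos"
	})
	for i := 0; i < 1000; i++ {
		cache.Allowed("nexedi/slapos")
	}
	fmt.Println("backend calls:", calls)
}
```

With a cache of this kind only the first request within the time-to-live window pays for the round trip to the backend; the trade-off is that a permission change takes up to one time-to-live period to become visible.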
Please find patches to do that attached.
The performance changes as follows:
(on an 8-CPU i7-3770S with 16GB of RAM, 2001:67c:1254:e:89::fa34 is on localhost)
# without patch: request eventually goes to unicorn (9 unicorn workers)
$ ./wrk -c40 -d10 -t1 --latency http://[2001:67c:1254:e:89::fa34]:7777/root/slapos/raw/master/software/wendelin/software.cfg
Running 10s test @ http://[2001:67c:1254:e:89::fa34]:7777/root/slapos/raw/master/software/wendelin/software.cfg
1 threads and 40 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 609.34ms 156.92ms 1.18s 79.60%
Req/Sec 64.22 19.90 120.00 67.00%
Latency Distribution
50% 596.50ms
75% 678.23ms
90% 805.72ms
99% 1.04s
642 requests in 10.01s, 1.24MB read
Requests/sec: 64.16
Transfer/sec: 127.00KB
# request goes to gitlab-workhorse with auth caching
$ ./wrk -c40 -d10 -t1 --latency http://[2001:67c:1254:e:89::fa34]:7777/root/slapos/raw/master/software/wendelin/software.cfg
Running 10s test @ http://[2001:67c:1254:e:89::fa34]:7777/root/slapos/raw/master/software/wendelin/software.cfg
1 threads and 40 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 36.62ms 25.39ms 351.14ms 72.02%
Req/Sec 1.16k 93.73 1.36k 77.00%
Latency Distribution
50% 36.30ms
75% 47.02ms
90% 66.36ms
99% 122.46ms
11580 requests in 10.01s, 20.56MB read
Requests/sec: 1156.85
Transfer/sec: 2.05MB
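In other words, it is a ~17x improvement in average latency (from 609.34ms to 36.62ms) and a ~18x improvement in throughput (from 64.16 to 1156.85 requests/sec).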
Thanks in advance,
Kirill