Add parallel cache download with FF_USE_PARALLEL_CACHE_DOWNLOAD feature flag

What does this MR do?

Addresses #4643

This MR adds parallel HTTP Range-based cache downloads to cache-extractor, gated behind FF_USE_PARALLEL_CACHE_DOWNLOAD.

When enabled, cache archives are downloaded using multiple concurrent HTTP Range requests instead of a single GET stream. Because this reuses the pre-signed URL that is already generated, the MR requires no credential changes and no new SDK dependencies, and, most importantly, it works with all storage backends!

It falls back to a single stream if the server does not support Range requests or the file is smaller than 10 MB. There is zero behavior change when the FF is off.

Concurrency is configurable via --download-concurrency flag or CACHE_DOWNLOAD_CONCURRENCY env var (default: 8).

Why was this MR needed?

When using S3 as the cache backend on 100 Gbit/s multi-core machines for dependency caches (Maven, Gradle, node...), S3 throttles single-stream GET throughput to ~80-100 MB/s per connection. (At least I've seen those numbers myself, and someone allegedly from the AWS team confirmed it a few years ago: https://news.ycombinator.com/item?id=26766607)

The current code is relatively simplistic, using a plain http.Get(), so it doesn't utilise the available network bandwidth at all (and leaves developers waiting for the cache download...).

I got some inspiration for this change from a comment by @ajwalker in !5246 (merged), which was definitely true:

> I think pretty much all cloud storage providers support range requests, so it's likely we could come up with a solution that works with any signed URL.

I think this is the cleanest attempt to solve this, at least for downloads. In my use case we pull the cache in every MR anyway and push only on the protected branch.

This MR solves it at the basic HTTP layer instead, using Range requests against the existing pre-signed URL.

Benchmark results

A 2.15 GB cache archive downloaded from S3 (eu-central-1) to a bare-metal machine in NL. It would probably be even faster between Frankfurt and eu-central-1 S3.

10 Gbps NIC host:

| Method | Time | Throughput | Speedup |
|---|---|---|---|
| Single stream | 28.3s | 86.8 MB/s | 1.0x |
| Parallel (8 streams) | 7.4s | 556.8 MB/s | 3.8x |

100 Gbps NIC:

Method Time Peak Avg Speedup
Single stream 24.0s 103.1 MB/s 100.4 MB/s 1.0x
Parallel (8 streams) 6.5s 772.6 MB/s 555.0 MB/s 3.7x
Parallel (32 streams) 7.0s 922.6 MB/s 515.0 MB/s 3.4x

My runs:

URL="https://my-2gb-zip-file.s3.eu-central-1.amazonaws.com/cache.zip"

# Baseline: single stream
rm -f /tmp/bench-cache.zip
time ( gitlab-runner-test cache-extractor \
      --url "$URL" \
      --transfer-meter-frequency="1s" \
      --file /tmp/bench-cache.zip 2>&1)

# Parallel: 32 streams
rm -f /tmp/bench-cache.zip
time ( FF_USE_PARALLEL_CACHE_DOWNLOAD=true \
CACHE_DOWNLOAD_CONCURRENCY=32 \
    gitlab-runner-test cache-extractor \
      --url "$URL" \
      --transfer-meter-frequency="1s" \
      --file /tmp/bench-cache.zip 2>&1)

The single-stream download appears to be locked at almost exactly ~100 MB/s, which seems to be the per-connection S3 throttle nowadays. Parallel streams use the bandwidth much better, but the 2.15 GB test file finishes too quickly for 32 streams to show a sustained improvement over 8 (or to come anywhere near saturating my bandwidth, but that is just the physics of a relatively small file).

I can also try with larger files, but these 2 GB are realistic for my workloads and 5 GB is the gitlab-runner cache limit anyway. (I can also try on EC2 in eu-central-1 to show the maximum possible speed increase.)

Summary

I simply wanted to change this with minimal changes, with zero security concerns, and without being AWS-only 😃 I hope the MR gets accepted quickly! cc @ajwalker @stanhu

Edited by Emir Beganović
