cache: shard S3 object paths by first two chars of hash when FF_HASH_CACHE_KEYS is enabled
What does this MR do?
When FF_HASH_CACHE_KEYS is enabled, all cache objects for a given project
share the same S3 object prefix (e.g. runner/<token>/project/<id>/). S3
partitions throughput limits by prefix, so with many parallel jobs this can
cause 503 Slow Down responses.
This MR inserts the first two hex characters of the SHA-256 cache key as a shard prefix in the distributed cache object path:
[path/][runner/<token>/]project/<id>/<shard>/<hash>/cache.zip

For example:

runner/abc123/project/42/d0/d03a852ba491ba611e907b1ef60ad5c4516a05b8f3aae6abb77f42bc60325aed/cache.zip

This spreads objects across 256 distinct prefixes per project, eliminating the single hot-partition problem described in the AWS S3 performance guidelines.
GetAdapter gains a sharded bool parameter; callers pass
build.IsFeatureFlagOn(featureflags.HashCacheKeys) so the sharding is tied
exactly to when keys are guaranteed to be 64-char hex strings — no runtime
detection needed.
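The gating can be sketched as follows; objectPath is a hypothetical stand-in for the path builder behind GetAdapter, and the bool mirrors the sharded parameter described above (the real signatures in gitlab-runner may differ):

```go
package main

import "fmt"

// objectPath builds the cache object path, inserting the shard
// segment only when sharding is enabled. The sharded flag is expected
// to come from build.IsFeatureFlagOn(featureflags.HashCacheKeys), so
// it is only true when hash is guaranteed to be a 64-char hex string.
func objectPath(base, hash string, sharded bool) string {
	if sharded {
		return base + "/" + hash[:2] + "/" + hash + "/cache.zip"
	}
	return base + "/" + hash + "/cache.zip"
}

func main() {
	hash := "d03a852ba491ba611e907b1ef60ad5c4516a05b8f3aae6abb77f42bc60325aed"
	fmt.Println(objectPath("runner/abc123/project/42", hash, true))
	fmt.Println(objectPath("runner/abc123/project/42", hash, false))
}
```

Passing the flag in explicitly, rather than re-detecting hex keys at runtime, keeps the sharded and legacy layouts unambiguous.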
Why was this MR needed?
High-throughput GitLab Runner deployments with S3 caching and FF_HASH_CACHE_KEYS
enabled can hit S3's per-prefix request rate limits (3,500 PUT/s, 5,500 GET/s),
resulting in 503 Slow Down responses. Since all cache objects for a project share
the same prefix, a single busy project is enough to trigger throttling.
Breaking change
This is a breaking path change for users already running with
FF_HASH_CACHE_KEYS=true. The shard prefix changes the object path for all
cache artifacts in distributed storage. Existing objects stored at the old path
(.../<hash>/cache.zip) become unreachable after upgrading. Expect cache misses
and a cold-cache rebuild on the first job run after upgrading to this version.
What's the best way to test this MR?
- Run go test ./cache/... -run TestGenerateObjectName to verify the sharded path structure.
- Enable FF_HASH_CACHE_KEYS=true on a runner with S3 caching and confirm that new cache objects land under the <shard>/<hash> path.
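For the second check, a small Go helper can validate object keys listed from the bucket; the regex is an assumption based on the path format described in this MR, not code from the repository:

```go
package main

import (
	"fmt"
	"regexp"
)

// shardedKey matches the new layout: project/<id>/<shard>/<hash>/cache.zip,
// where <shard> is two hex chars and <hash> is a 64-char hex string.
var shardedKey = regexp.MustCompile(`project/\d+/([0-9a-f]{2})/([0-9a-f]{64})/cache\.zip$`)

// isSharded reports whether an object key follows the sharded layout,
// including that the shard segment equals the hash's first two chars.
func isSharded(key string) bool {
	m := shardedKey.FindStringSubmatch(key)
	return m != nil && m[2][:2] == m[1]
}

func main() {
	key := "runner/abc123/project/42/d0/d03a852ba491ba611e907b1ef60ad5c4516a05b8f3aae6abb77f42bc60325aed/cache.zip"
	fmt.Println(isSharded(key))
}
```

Old-layout keys (without the shard segment) fail the match, so this can also be used to spot objects that became unreachable after the upgrade.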