Introduce Info-Ref cache features to production
The info-ref cache (#1631 (closed)) is a delicate feature that must be deployed with caution. Its two major components, the cache invalidator and the info-ref caching, need to be turned on and validated in coordination.
Prometheus Counters
The following counters will be viewable on the dedicated dashboard for Gitaly InfoRef cache.
- Disk cache - shows us the effectiveness of the disk cache. Similar to the info-ref caching counters, but generic across all of Gitaly.
  - gitaly_diskcache_requests_total
  - gitaly_diskcache_miss_total
  - gitaly_diskcache_bytes_stored_total - actual byte count for data stored by disk cache
  - gitaly_diskcache_bytes_fetched_total - actual byte count for data returned by disk cache
  - gitaly_diskcache_errors_total - what types of errors is the diskcache encountering?
    - ErrMissingLeaseFile - see discussion !1305 (comment 188930204)
    - ErrPendingExists - see discussion !1305 (comment 186569550)
  - Object remover
    - gitaly_diskcache_walker_check_total - how many files the walker inspects
    - gitaly_diskcache_walker_removal_total - how many files the walker actually successfully removes
- Cache invalidation
  - gitaly_cacheinvalidator_rpc_total - aggregation of all labels
  - gitaly_cacheinvalidator_optype_total - includes the following labels for RPC operation types encountered:
    - accessor
    - mutator
    - unknown - we do not expect to see any RPCs with unknown operation type
  - gitaly_cacheinvalidator_error_total - unexpected errors during invalidation. We hope this number will be zero.
- Info-ref caching
  - gitaly_inforef_cache_attempt_total - how many times the cache was attempted during an InfoRefUploadPack RPC call
  - gitaly_inforef_cache_hit_miss_total - how the cache attempt ended. Includes the following labels:
    - hit - the entry was in the cache
    - miss - the entry was not in the cache
    - err - an unexpected error (we hope this will be zero)
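As a rough illustration of how the hit, miss, and err labels might be read together, the sketch below derives a hit ratio from two samples of gitaly_inforef_cache_hit_miss_total. It is plain Python rather than a dashboard query; the function name and the sample numbers are hypothetical, and it assumes every cache attempt ends in exactly one of the three labels.

```python
def hit_ratio(before, after):
    """Return the fraction of cache attempts that were hits between two samples.

    `before` and `after` map the labels of gitaly_inforef_cache_hit_miss_total
    (hit, miss, err) to cumulative counter values. Counters only grow, so the
    difference between two samples is the activity within that window.
    """
    deltas = {
        label: after.get(label, 0) - before.get(label, 0)
        for label in ("hit", "miss", "err")
    }
    if any(delta < 0 for delta in deltas.values()):
        # A counter that went backwards usually means the process restarted.
        raise ValueError("counter reset between samples")
    total = sum(deltas.values())
    if total == 0:
        return None  # no attempts in the window, so the ratio is undefined
    return deltas["hit"] / total


# Hypothetical sample values, for illustration only.
earlier = {"hit": 10, "miss": 10, "err": 0}
later = {"hit": 40, "miss": 20, "err": 0}
print(hit_ratio(earlier, later))
```

The same subtraction could be applied to gitaly_diskcache_miss_total and gitaly_diskcache_requests_total to estimate the generic disk cache miss rate.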
Feature Flags
- cache_invalidator - enables the cache invalidator
- inforef_uploadpack_cache - enables the SmartHTTP service InfoRefUploadPack caching (blocked until merged: !1366 (merged))
Deployment Plan
Cache Invalidator
- Manually test in GDK
  - Verify feature flag gitaly_cache_invalidator works
    - In rails console: Feature.enable(:gitaly_cache_invalidator)
  - Verify via local prometheus instance that counters increment for cache invalidation and disk cache
- Turn on the cache invalidator feature flag in staging via ChatOps command in Slack: /chatops run feature set gitaly_cache_invalidator 100 --staging
  - Watch the prometheus counters for the following errors:
    - We should see no unknown RPCs
    - We should see no unexpected errors (except maybe /grpc.health.v1.Health/Check)
    - We should see no significant degradation of service
- Once we are happy with results in staging, we should gradually deploy to gitlab.com. Gradually ramp up from 1%-5%-10%-25%-50%-100% of requests.
  - Between each ramp up, check the following dashboards:
    - We should see no unknown RPCs:
    - We should see no unexpected errors:
    - We should see no significant degradation of service:
- Once we are confident that the cache invalidator is not causing any service degradations in prod, we can remove the feature flag from both the stream interceptor and the unary interceptor.
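The ramp-up steps above can be sketched as a small gating loop. This is only an illustration under stated assumptions: `sample` and `set_percentage` are hypothetical callbacks (one reads the current counter values, the other applies a rollout percentage by whatever means the operator uses), and the dictionary keys "unknown" and "error" stand in for the unknown label of gitaly_cacheinvalidator_optype_total and for gitaly_cacheinvalidator_error_total. The "no significant degradation of service" check and the possible /grpc.health.v1.Health/Check exception are not modeled here.

```python
RAMP_STAGES = [1, 5, 10, 25, 50, 100]  # percentages taken from the plan above


def stage_is_healthy(before, after):
    """Return True if neither counter grew between the two samples."""
    return (
        after["unknown"] - before["unknown"] == 0
        and after["error"] - before["error"] == 0
    )


def ramp(sample, set_percentage):
    """Walk the ramp stages and stop at the first unhealthy one.

    Returns the last percentage that passed its checks (0 if none did).
    Rolling back after a failed stage is left to the operator.
    """
    reached = 0
    for pct in RAMP_STAGES:
        before = sample()
        set_percentage(pct)
        # In practice there would be an observation period before re-sampling.
        after = sample()
        if not stage_is_healthy(before, after):
            return reached
        reached = pct
    return reached
```

For example, passing a `sample` that always returns `{"unknown": 0, "error": 0}` walks all six stages and returns 100.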
InfoRef Cache
Blocked by #1764 (closed)
- Manually test in GDK
  - Verify cached response is created when fetching info ref pack: curl http://root:5iveL\!fe@localhost:3000/gitlab-org/gitlab-test.git/info/refs?service=git-upload-pack
    - Look for new file in cache directory for storage location
    - Verify the newest file contains the same contents as the received response
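The last two checks could be scripted along these lines. This is a minimal sketch, assuming the cache stores each entry as the raw bytes of the response and that the newest file by modification time is the one just written; the function names are hypothetical, and the cache directory is passed in rather than assumed.

```python
import os


def newest_file(root):
    """Return the path of the most recently modified file under root, or None."""
    newest, newest_mtime = None, -1.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if mtime > newest_mtime:
                newest, newest_mtime = path, mtime
    return newest


def cache_matches_response(cache_root, response_body):
    """Return True if the newest cache file holds exactly the response bytes."""
    path = newest_file(cache_root)
    if path is None:
        return False  # nothing was cached at all
    with open(path, "rb") as cached:
        return cached.read() == response_body
```

Here `response_body` would be the bytes saved from the curl request above, and `cache_root` the cache directory of the storage under test.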
Edited by Paul Okstad (ex-GitLab)