Investigate usage of Redis from CI
Look for scenarios where we might be storing too much data in Redis. An important lesson from the JUnit case is that it looked like a reasonable use of Redis until we encountered the monstrous use cases some users were hitting, such as a 90 MB diff.
### Related issues

- https://gitlab.com/gitlab-org/gitlab-ce/issues/64035 - Do not cache huge JUnit test artifacts in Redis cache
- gitlab-com/gl-infra/production#928 (closed) - Degraded performance on GitLab.com
### Progress

Start by checking the usage of `ReactiveCaching` across CI and the rest of the application:

```shell
$ ack with_reactive_cache app lib
```
### CI-specific usage

- `app/models/project_services/teamcity_service.rb` - SAFE - stores build status, minimal data usage
- `app/models/project_services/drone_ci_service.rb` - SAFE - stores build status, minimal data usage
- `app/models/project_services/buildkite_service.rb` - SAFE - stores build status, minimal data usage
- `app/models/project_services/bamboo_service.rb` - SAFE - stores build status, minimal data usage
- `app/models/merge_request.rb` - MONITOR - usage was reduced, but we need to verify whether it is acceptable now
### Everything else

- `app/models/environment.rb` - TO CHECK - caches a list of Kubernetes pods; it depends on how big the list can be
- `app/models/error_tracking/project_error_tracking_setting.rb` - SAFE - stores at most 20 Sentry issues
- `app/models/clusters/cluster.rb` - SAFE - stores only a symbol for the current status
- `app/models/ssh_host_key.rb` - SAFE - stores a list of known SSH host keys
- `app/models/concerns/prometheus_adapter.rb` - TO CHECK - can store data for any Prometheus metric; we should check all `*Query` classes
- `app/finders/clusters/knative_services_finder.rb` - TO CHECK - depends on the number of pods the request can return
- `app/services/prometheus/proxy_service.rb` - TO CHECK - we cache the HTTP response body from Prometheus, so the size of the data might not be predictable
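One mitigation worth considering for the TO CHECK entries is to refuse to cache payloads above a size threshold, so a single oversized response cannot repeat the 90 MB JUnit problem. A hypothetical sketch (the `safe_to_cache?` helper and the 1 MiB limit are assumptions for illustration, not existing GitLab code):

```ruby
require 'json'

# Hypothetical guard: skip writing to the Redis cache when the serialized
# payload exceeds a threshold. Both the helper name and the 1 MiB limit are
# assumptions, not an existing GitLab setting.
MAX_REACTIVE_CACHE_BYTES = 1 * 1024 * 1024 # 1 MiB

def safe_to_cache?(payload)
  payload.to_json.bytesize <= MAX_REACTIVE_CACHE_BYTES
end

safe_to_cache?(status: 'success')      # true: a few bytes
safe_to_cache?(diff: 'x' * 90_000_000) # false: the 90 MB case
```

A guard like this turns an unpredictable payload size into a bounded one: callers that exceed the limit would fall back to recomputing instead of degrading Redis for everyone.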
cc @shampton