Investigate usage of Redis from CI

Look for scenarios where we might be storing too much data in Redis. An important lesson from the JUnit case is that it seemed like a reasonable use of Redis until we encountered the monstrous use cases some users were hitting, where the diff being cached was 90 MB.

Related issues

https://gitlab.com/gitlab-org/gitlab-ce/issues/64035 - Do not cache huge junit test artifacts in Redis cache

gitlab-com/gl-infra/production#928 (closed) - Degraded performance on GitLab.com

Progress

Start by checking the usage of ReactiveCaching across CI and the rest of the application.

$ ack with_reactive_cache app lib 
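For context, this is the pattern the audit is looking at: whatever `calculate_reactive_cache` returns is serialized and written to the Redis-backed Rails cache, so a large return value becomes a large Redis entry. The sketch below is illustrative only; the class and method names other than the ReactiveCaching hooks are hypothetical.

```ruby
# Illustrative ReactiveCaching usage (ExampleService and
# fetch_status_from_external_api are hypothetical names).
class ExampleService < ApplicationRecord
  include ReactiveCaching

  self.reactive_cache_key = ->(record) { [record.class.model_name.singular, record.id] }
  self.reactive_cache_lifetime = 10.minutes

  # The return value of this method is what ends up in Redis,
  # so its size is what this audit needs to keep an eye on.
  def calculate_reactive_cache(*args)
    { status: fetch_status_from_external_api(*args) }
  end

  def build_status(*args)
    # Yields the cached data when present; returns nil and schedules
    # a background refresh otherwise.
    with_reactive_cache(*args) { |data| data[:status] }
  end
end
```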

CI-specific usage

  • app/models/project_services/teamcity_service.rb - SAFE - stores build status, minimal data usage
  • app/models/project_services/drone_ci_service.rb - SAFE - stores build status, minimal data usage
  • app/models/project_services/buildkite_service.rb - SAFE - stores build status, minimal data usage
  • app/models/project_services/bamboo_service.rb - SAFE - stores build status, minimal data usage
  • app/models/merge_request.rb - MONITOR - usage was reduced, but we still need to verify whether the cached payload size is acceptable now (see the sketch below)
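One rough way to verify the MONITOR item above is to measure the payload before it reaches Redis, from a Rails console. This is only a sketch: the identifier and arguments passed to `calculate_reactive_cache` are assumptions about what merge_request.rb currently wires up, and Marshal is only an approximation of how Rails.cache serializes entries.

```ruby
# Sketch (Rails console): approximate the size of the data a merge request
# would cache. merge_request_id is a placeholder; the identifier/arguments
# are assumptions, use whatever the model actually passes to with_reactive_cache.
mr = MergeRequest.find(merge_request_id)
payload = mr.calculate_reactive_cache('Ci::CompareTestReportsService', nil)

puts "#{(Marshal.dump(payload).bytesize / 1024.0).round(1)} KiB"
```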

Everything else

  • app/models/environment.rb - TO CHECK - caches a list of Kubernetes pods; it depends on how large that list can get
  • app/models/error_tracking/project_error_tracking_setting.rb - SAFE - Stores at most 20 Sentry issues
  • app/models/clusters/cluster.rb - SAFE - Stores only a symbol for the current status
  • app/models/ssh_host_key.rb - SAFE - Stores list of known ssh hosts
  • app/models/concerns/prometheus_adapter.rb - TO CHECK - Can store data for any Prometheus metric. We should check all *Query classes
  • app/finders/clusters/knative_services_finder.rb - TO CHECK - it depends on the number of pods that the request can return
  • app/services/prometheus/proxy_service.rb - TO CHECK - we cache the HTTP response body from Prometheus; the size of the data might not be predictable (see the sketch after this list)
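For the TO CHECK entries the payload size is data-dependent, so it may be easier to inspect what already sits in Redis on a real instance than to reason about it statically. The sketch below lists the largest string entries in the Redis-backed Rails cache; the 'cache:gitlab:*' key pattern is an assumption about the cache namespace, and scanning the whole keyspace (while holding every key and size in memory) should only be done on a test instance or with care.

```ruby
# Sketch: list the largest string entries in the Redis-backed Rails cache.
# The MATCH pattern is an assumption about the cache key namespace.
sizes = []

Gitlab::Redis::Cache.with do |redis|
  redis.scan_each(match: 'cache:gitlab:*', count: 1000) do |key|
    begin
      sizes << [key, redis.strlen(key)]
    rescue Redis::CommandError
      # Non-string keys (sets, hashes, ...) are skipped in this rough pass.
    end
  end
end

sizes.max_by(20) { |_, bytes| bytes }.each do |key, bytes|
  puts "#{bytes / 1024} KiB  #{key}"
end
```

Outliers like the 90 MB JUnit payloads mentioned above would stand out at the top of that list.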

cc @shampton
