Race condition when creating multiple prometheus alerts
We are seeing issues when creating multiple prometheus alerts in rapid succession.
The Helm upgrade fails with the Kubernetes error: object is being deleted: pods "install-prometheus" already exists
Reproduce:
- Check out !20585 (closed)
- Have a chart with multiple metrics
- Create alerts for multiple metrics
- The cluster application enters the update_errored status. (Note: We don't handle this state in the UI. See #196231)
- This occurs almost every time after a fresh Prometheus install. To reproduce otherwise, try clearing last_update_started_at
Another way to reproduce this is to run the following in a Rails console:
10.times { ::Clusters::Applications::ScheduleUpdateService.new(application, project).execute }
where application is an instance of Clusters::Applications::Prometheus and project is the project that the cluster belongs to. Most of the time, running this will cause the application to go into the update_errored state.
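For illustration only, here is a minimal, sequential Ruby sketch of one plausible reading of the error message: every scheduled update tries to create a pod with the same fixed name while the previous one still exists. FakeCluster, FakeApplication, schedule_unguarded and schedule_guarded are hypothetical stand-ins, not GitLab code, and the root cause shown here is an assumption rather than a confirmed diagnosis.

```ruby
# Hypothetical stand-in for a cluster namespace: pod names must be unique.
class FakeCluster
  class AlreadyExists < StandardError; end

  def initialize
    @pods = {}
  end

  def create_pod(name)
    raise AlreadyExists, %(pods "#{name}" already exists) if @pods.key?(name)

    @pods[name] = :running
  end
end

# Hypothetical application record holding only the status this sketch needs.
FakeApplication = Struct.new(:status)

# Assumed behaviour: each request creates the same named pod without
# checking whether an earlier update is still in flight.
def schedule_unguarded(cluster, app)
  cluster.create_pod("install-prometheus")
  app.status = :updating
rescue FakeCluster::AlreadyExists
  app.status = :update_errored
end

# Possible mitigation: skip the request while an update is already running.
def schedule_guarded(cluster, app)
  return if app.status == :updating

  cluster.create_pod("install-prometheus")
  app.status = :updating
end

# Fires ten back-to-back requests, mirroring the 10.times reproduction,
# and returns the status the application ends up in.
def final_status(scheduler)
  cluster = FakeCluster.new
  app = FakeApplication.new(:installed)
  10.times { send(scheduler, cluster, app) }
  app.status
end

unguarded_status = final_status(:schedule_unguarded)
guarded_status = final_status(:schedule_guarded)

puts "unguarded: #{unguarded_status}"
puts "guarded:   #{guarded_status}"
```

In this sketch the unguarded variant ends in update_errored, while the guarded one stays in updating. With truly concurrent workers, the status check and the pod creation would also need to be atomic (for example behind a database lock), which this single-threaded simulation does not model.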
Edited by Reuben Pereira