Investigate ContainerRegistryPrimaryDatabaseCPUSaturation alerts firing every other day

Alert Details

The ContainerRegistryPrimaryDatabaseCPUSaturation alert has been firing every other day in production (gprd). It indicates that CPU usage on the patroni-registry primary database node is more than three standard deviations above its average.
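To make the "three standard deviations above average" condition concrete, here is a minimal sketch of that kind of check. It assumes a plain mean and population standard deviation over a window of historical CPU samples. The real alert rule lives in the monitoring configuration and may use a different window, aggregation, or baseline, so the function name and inputs here are hypothetical.

```python
from statistics import mean, pstdev


def exceeds_three_sigma(history: list[float], current: float, k: float = 3.0) -> bool:
    """Return True if `current` is more than `k` standard deviations above the
    mean of `history`.

    Hypothetical simplification of the alert condition: `history` is a window
    of past CPU usage samples (e.g. fractions of total CPU) for the primary node.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation of the window
    return current > mu + k * sigma


# Example with made-up samples: mean is 0.40 and sigma is about 0.014,
# so the threshold is roughly 0.442.
window = [0.40, 0.42, 0.38, 0.41, 0.39]
print(exceeds_three_sigma(window, 0.55))  # True
print(exceeds_three_sigma(window, 0.43))  # False
```

Because the threshold is relative to recent history, a node with a stable baseline can trip it on a moderate spike, which is worth keeping in mind when judging how serious each firing is.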

Timeline

Investigation Notes

The previous incident raised a possible link to storage usage calculation queries, which are known to be slow and to time out for large namespaces. These queries should now be routed to the replicas rather than the primary, so they should not contribute to CPU load on the primary. However, this still needs to be verified.

Next Steps

  1. Analyze query patterns on the primary versus the replicas to identify what is causing the CPU saturation
  2. Verify that storage usage calculation queries are properly routed to the replicas
  3. Determine whether the recent addition of a new replica correlates with the change in alert frequency
  4. Determine whether additional tuning or query optimization is needed
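Steps 1 and 2 could start from per-node query statistics. The sketch below assumes rows have already been collected from `pg_stat_statements` on each node (the `query`, `calls`, and `total_exec_time` columns, using PostgreSQL 13+ naming). It ranks queries by total execution time and computes what share of that time matches a given pattern. Collecting the rows is not shown, and the `storage_usage` marker in the example is hypothetical: the real storage usage queries would need to be identified by their actual text.

```python
import re
from dataclasses import dataclass


@dataclass
class QueryStat:
    # Subset of pg_stat_statements columns for one normalized query.
    query: str
    calls: int
    total_exec_time: float  # milliseconds


def top_by_exec_time(stats: list[QueryStat], n: int = 5) -> list[QueryStat]:
    """Return the n queries with the highest cumulative execution time."""
    return sorted(stats, key=lambda s: s.total_exec_time, reverse=True)[:n]


def share_of_exec_time(stats: list[QueryStat], pattern: str) -> float:
    """Fraction of total execution time spent in queries whose text matches `pattern`."""
    total = sum(s.total_exec_time for s in stats)
    if total == 0:
        return 0.0
    rx = re.compile(pattern)
    matched = sum(s.total_exec_time for s in stats if rx.search(s.query))
    return matched / total


# Made-up rows standing in for the primary's pg_stat_statements output.
primary = [
    QueryStat("SELECT /* storage_usage */ ...", calls=10, total_exec_time=300.0),
    QueryStat("INSERT INTO example_table ...", calls=1000, total_exec_time=700.0),
]
print(top_by_exec_time(primary, 1)[0].query)
print(share_of_exec_time(primary, r"storage_usage"))  # 0.3
```

A non-zero share on the primary for a pattern that should only appear on the replicas would suggest the routing in step 2 is not working as intended. Running the same comparison on each replica gives the baseline for step 1.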