ScheduleSettingChangedUpdateWorker performance issue
The newly introduced ScheduleSettingChangedUpdateWorker is exhibiting performance issues, as shown in these logs. These issues are suspected of contributing to the inc-2131-primary-db-saturation-causing-sidekiq-backlogging incident in production.
To mitigate this, we should:
- Reduce the input size for ScheduleSettingChangedUpdateWorker. Currently, it processes all project IDs from SetGroupSecretPushProtectionService, which can include thousands of entries. In most cases, a much smaller subset is likely sufficient.
- Add a scheduling delay to stagger execution. This may help avoid repeated updates to the same namespace-level records, particularly for top-level groups, when counter propagation takes place.
- Use defer_on_database_health_signal to defer execution when database health signals indicate problems. This could help avoid putting additional load on a database that is already under stress.
- Wrap the worker logic in a dedicated feature flag so the worker can be brought back gradually while its performance impact is monitored.
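
As a rough illustration of the first, second, and fourth points, here is a minimal plain-Ruby sketch of batching project IDs, staggering the batches, and gating the scheduling behind a flag. It is not taken from the GitLab codebase: `staggered_batches`, `schedule_all`, the `feature_enabled:` argument, the batch size, and the interval are all hypothetical names and values chosen for the example.

```ruby
# Hypothetical values for illustration only; real limits would need tuning.
BATCH_SIZE = 100
STAGGER_INTERVAL_SECONDS = 30

# Deduplicates the project IDs, splits them into batches, and assigns each
# batch an increasing delay so the work is spread over time instead of being
# enqueued all at once.
def staggered_batches(project_ids, batch_size: BATCH_SIZE, interval: STAGGER_INTERVAL_SECONDS)
  project_ids.uniq.each_slice(batch_size).each_with_index.map do |batch, index|
    { delay_seconds: index * interval, project_ids: batch }
  end
end

# Passes each batch to the given block together with its delay. The block is a
# stand-in for the real enqueue call (for a Sidekiq worker, something along
# the lines of perform_in(delay, ids)). Nothing is scheduled while the
# hypothetical flag is off.
def schedule_all(project_ids, feature_enabled:, &enqueue)
  return [] unless feature_enabled

  staggered_batches(project_ids).each do |batch|
    enqueue.call(batch[:delay_seconds], batch[:project_ids])
  end
end
```

With these assumed defaults, 250 project IDs would be split into three batches with delays of 0, 30, and 60 seconds. Whether a fixed interval is enough to avoid repeated updates to the same namespace-level records would need to be checked against production behavior.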
Edited by Gal Katz