Fix Broadcaster goroutine leak causing 35-minute test timeout
Problem
Tests that call handlers.NewApp() or newTestEnv() never shut down the Broadcaster created by configureEvents(). Each leaked Broadcaster.run() goroutine blocks on a select{} for the lifetime of the test binary. These accumulate across tests and eventually cause the 35-minute CI timeout.
The flaky test name (database-api-load-balancing-discovery-pg_curr_version-1_25-valkey::TestMain) is misleading. The timeout fires while unrelated tests are running because the goroutine leak is cumulative across the entire test binary.
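The cumulative effect can be illustrated with a minimal sketch. The `broadcaster` type, its fields, and the `running` counter below are hypothetical stand-ins written for this example; they are not the registry's actual Broadcaster implementation. The sketch only assumes that each instance starts one goroutine that exits when the instance is closed.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// running counts run loops that have started and not yet exited.
// It exists only to make the leak observable in this sketch.
var running atomic.Int64

// broadcaster is a hypothetical stand-in for the real Broadcaster.
type broadcaster struct {
	events chan string
	quit   chan struct{}
	exited chan struct{}
}

func newBroadcaster() *broadcaster {
	b := &broadcaster{
		events: make(chan string),
		quit:   make(chan struct{}),
		exited: make(chan struct{}),
	}
	running.Add(1)
	go b.run()
	return b
}

// run blocks in a select until Close is called. If nobody calls Close,
// the goroutine lives for the rest of the process.
func (b *broadcaster) run() {
	defer close(b.exited)
	defer running.Add(-1) // runs before close(b.exited) (defers are LIFO)
	for {
		select {
		case <-b.events:
			// Delivery to sinks is omitted in this sketch.
		case <-b.quit:
			return
		}
	}
}

// Close stops the run loop and waits for it to exit.
func (b *broadcaster) Close() {
	close(b.quit)
	<-b.exited
}

func main() {
	// Simulate three tests that each create a broadcaster without cleanup.
	var leaked []*broadcaster
	for i := 0; i < 3; i++ {
		leaked = append(leaked, newBroadcaster())
	}
	fmt.Println("run loops alive without cleanup:", running.Load())

	for _, b := range leaked {
		b.Close()
	}
	fmt.Println("run loops alive after Close:", running.Load())
}
```

Every instance that is never closed adds one goroutine that stays alive until the process exits, which is why the cost shows up in whichever test happens to be running when the timeout fires.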
Root cause
- NewApp did not clean up the event sink if an error occurred after configureEvents().
- Integration tests that create an App (via NewApp or newTestEnv) did not call GracefulShutdown on cleanup.
Fix
- Close Broadcaster on NewApp error: add a deferred cleanup in NewApp so that if any step after configureEvents() fails, GracefulShutdown closes the event sink.
- Refactor NewApp error cleanup: replace the eventsSinkConfigured bool flag with a named error return and a defer that calls GracefulShutdown on failure, avoiding duplicated sink-close logic.
- Add GracefulShutdown to test cleanup: all app_integration_test.go tests that successfully create an App now register GracefulShutdown via t.Cleanup.
- Add missing Cleanup calls: several tests in api_integration_test.go create a second newTestEnv (typically with auth options) without calling Cleanup, leaking a Broadcaster. These now call env.Cleanup(t).
- Use skipDatabaseNotEnabled helper: lockfile tests replaced manual REGISTRY_DATABASE_ENABLED env var checks with the existing helper for consistency.
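The named-error-return cleanup described above can be sketched roughly as follows. This is a simplified illustration, not the actual NewApp code: the `app` type, the `openSinks` counter, the `failLater` parameter, and the argument-free `GracefulShutdown` signature are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// openSinks tracks how many event sinks are currently open.
// It is a hypothetical counter used only to observe cleanup.
var openSinks int

// app is a hypothetical stand-in for the real App.
type app struct{}

// configureEvents stands in for the step that starts the Broadcaster.
func (a *app) configureEvents() { openSinks++ }

// GracefulShutdown closes the event sink. The real method's signature
// may differ; this simplified form is an assumption.
func (a *app) GracefulShutdown() error {
	openSinks--
	return nil
}

// newApp uses a named error return so that a single deferred function
// can release the sink on any failure after configureEvents.
// failLater simulates a later initialization step failing.
func newApp(failLater bool) (a *app, err error) {
	a = &app{}
	a.configureEvents()
	defer func() {
		if err != nil {
			_ = a.GracefulShutdown() // best-effort cleanup on failure
			a = nil
		}
	}()

	if failLater {
		return a, errors.New("later initialization step failed")
	}
	return a, nil
}

func main() {
	_, err := newApp(true)
	fmt.Println("error:", err, "open sinks:", openSinks)

	a, _ := newApp(false)
	fmt.Println("open sinks after success:", openSinks)
	_ = a.GracefulShutdown()
	fmt.Println("open sinks after shutdown:", openSinks)
}
```

Because the deferred function inspects the named `err`, every early return after `configureEvents` is covered without a separate flag. On the success path the caller still owns the shutdown, which in tests would typically be registered with `t.Cleanup`.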
Edited by Hayley Swimelar