MonitorLockedTables doesn't report correct information anymore
GitLab.com has been running with multiple databases, main and ci, for almost 3 years, and we recently introduced a new sec database. All three databases still share the same schema. To prevent the application from writing to the wrong database, for example to users on ci or sec, or to ci_pipelines on main or sec, we locked those tables. We refer to them as legacy tables, because they are leftovers from the single-database architecture.
To know more about locking tables, see here
To make sure the legacy tables always stay locked, we have a regular cron job, MonitorLockedTables, that runs every 3 days. See details here
MonitorLockedTables works by calling TablesLocker in dry_run mode. TablesLocker in turn calls LockWritesManager.
Recently we got false alarms reporting that many tables needed to be locked or unlocked. The cause is that we removed the checks for table and trigger existence in MR !188276 (diffs). The goal of that change was to make the operation faster. No incidents happened.
Therefore, we disabled the lock_tables_in_monitoring feature flag on both Staging and Production until this is resolved:
- https://gitlab.com/gitlab-com/gl-infra/feature-flag-log/-/issues/43099
- https://gitlab.com/gitlab-com/gl-infra/feature-flag-log/-/issues/43100
Suggested corrective action:
- Bring back the old checks of whether a table or trigger exists in LockWritesManager by reverting !188276 (diffs), but allow them to be skipped by introducing a force mode that skips the checks and makes the operation faster.
- The rake tasks gitlab:db:lock_writes and gitlab:db:unlock_writes should eventually call LockWritesManager in force mode, but MonitorLockedTables should pass force set to false.
- Once we get correct logs in Kibana showing, as expected, that no tables need to be locked or unlocked, we can re-enable the feature flag lock_tables_in_monitoring on both staging and production.
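
The proposed force mode could look roughly like the sketch below. This is a simplified, hypothetical stand-in and not the actual LockWritesManager: the class name SketchLockWritesManager, the existing_tables and locked_tables sets (which replace the real catalog queries), and the result hash are all assumptions made for illustration. It only shows why a dry run needs force set to false to report accurately.

```ruby
require 'set'

# Hypothetical, simplified stand-in for LockWritesManager (not GitLab's code).
# existing_tables and locked_tables are in-memory sets that stand in for the
# real "does the table exist?" and "does the trigger exist?" database checks.
class SketchLockWritesManager
  def initialize(table_name:, existing_tables:, locked_tables:, dry_run: false, force: false)
    @table_name = table_name
    @existing_tables = existing_tables
    @locked_tables = locked_tables
    @dry_run = dry_run
    @force = force
  end

  def lock_writes
    # force: true skips the existence checks to save time.
    unless @force
      return result('skipped', 'table does not exist') unless @existing_tables.include?(@table_name)
      return result('skipped', 'already locked') if @locked_tables.include?(@table_name)
    end

    # A dry run only reports what would be done.
    return result('needs_lock', 'dry run') if @dry_run

    @locked_tables << @table_name
    result('locked', nil)
  end

  private

  def result(action, reason)
    { table: @table_name, action: action, reason: reason }
  end
end

existing = Set['users', 'ci_pipelines']
locked = Set['ci_pipelines']

# Monitoring style call: dry run with the checks enabled.
monitor = SketchLockWritesManager.new(
  table_name: 'ci_pipelines', existing_tables: existing, locked_tables: locked,
  dry_run: true, force: false
)
puts monitor.lock_writes[:action] # prints "skipped"

# The same dry run without the checks reports a lock that is not needed.
unchecked = SketchLockWritesManager.new(
  table_name: 'ci_pipelines', existing_tables: existing, locked_tables: locked,
  dry_run: true, force: true
)
puts unchecked.lock_writes[:action] # prints "needs_lock"
```

In this sketch the second call reproduces the kind of false alarm described above: with the checks skipped, an already locked table is still reported as needing a lock. That is why the monitoring path would pass force set to false, while the rake tasks could accept the faster unchecked path.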