Handling locked partitions when truncating legacy tables
What does this MR do and why?
Addressing the issue: #386759 (closed)
Explaining the problem briefly
We have two rake tasks that can be used to truncate legacy tables. These two rake tasks were initially introduced in this MR
rake gitlab:db:truncate_legacy_tables:mainrake gitlab:db:truncate_legacy_tables:ci
When we ran the rake tasks on GSTG and GPRD to truncate the legacy tables (gitlab_ci tables on main database and gitlab_main tables on ci database) we found a problem that was first discovered on Staging.
Some attached partition security_findings_1 happened to be locked for writes as well, and couldn't be truncated. This is because this table partition used to be a table and then converted into a partition. But the lock-writes triggered were kept on the table on the ci database.
Why did this problem happen?
When we truncate legacy tables, we disable the lock-writes triggers temporarily (within the current transaction) on the partitioned tables, but not on the partitions themselves.
How does this MR solve the problem?
With this MR, we are also disabling the lock-writes triggers temporarily on the table partitions as well.
Why do we lock the writes on the legacy tables anyway?
As part of the database decomposition project we made a copy of the database on a new cluster. We used the first database cluster, called main for all the database tables that belong to the gitlab_main gitlab schema. And used the new database cluster ci for all the tables that belong to the gitlab_schema gitlab_ci. While the gitlab_shared tables are still used on both clusters
After the decomposition we use gitlab_main and gitlab_shared tables on the database cluster main. And we use the gitlab_ci and gitlab_shared tables on the database cluster ci. But the database schema (not to be confused with gitlab_schema) is still the same on both database clusters. We are keeping it now like this for simplicity. That means we still have some legacy tables, that we have been truncated recently, that the application in theory still can write/read from, but it should not. These tables should be ignored. We decided to lock_writes to them, to detect with errors early on that the application is writing to the wrong database.
For example, the application should not write to the users table on the ci database, even though it exists. And the same applies to the ci_builds table on the main database. We should get an error instead.
How to set up and validate locally
Prerequisites
- Make sure you are running gitlab locally in decomposed mode: #354065 (closed)
- Make sure you have the legacy tables locked for writes by running
rake gitlab:db:lock_writes
Instructions
- Create a
gitlab_maintable partition on thecidatabase by running:
$ gdk psql -d gitlabhq_development_ci -c 'CREATE TABLE gitlab_partitions_dynamic.security_findings_test_partition PARTITION OF security_findings FOR VALUES IN (50000)'
- Lock the new partition for writes. We don't usually lock partitions, but we need to simulate a write-locked partition (see the MR description)
$ ./bin/rails runner Gitlab::Database::LockWritesManager.new(table_name: "gitlab_partitions_dynamic.security_findings_test_partition", connection: Ci::ApplicationRecord.connection, database_name: "ci", with_retries: false).lock_writes'
- Truncate all the legacy main tables on the
cidatabase. It might take a while, but it should not error.
$ rake 'gitlab:db:truncate_legacy_tables:ci[100]'
Clean up
$ rake gitlab:db:unlock_writes
$ gdk psql -d gitlabhq_development_ci -c 'drop table gitlab_partitions_dynamic.security_findings_test_partition'
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #386759 (closed)