Use optimistic locking when updating Terraform state
What does this MR do and why?
Swaps from pessimistic locking to optimistic locking when accessing Terraform state.
There are three parts to this change:
- Adding a lock version column to `terraform_states` to support Rails optimistic locking (see docs linked above). Whenever a record is updated, the lock version is incremented automatically by Rails. If an update is attempted but the lock version has changed since the record was loaded, the update should not proceed and a `StaleObjectError` is raised.
- Adding `touch: true` to the `belongs_to` association from `terraform_state_versions` to `terraform_states`. This means the `updated_at` of the parent `terraform_state` record is updated whenever a new version is created, which is beneficial for two reasons:
  - UIs that show the "last updated at" of a Terraform state will now show the correct timestamp, and, more importantly,
  - This update can be used to trigger the optimistic locking flow when updating a Terraform state. Previously this was not possible, as creating a child `terraform_state_version` did not modify the parent record in any way.
- Enabling GitLab's `OptimisticLocking` wrapper whenever a Terraform state record is accessed. This rescues the `StaleObjectError` raised when a record has conflicting updates, and retries the update after reloading the record.
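The version check and retry described above can be sketched in plain Ruby, without Rails or a database. This is only an illustration of the technique: `FakeStateTable`, `StaleRecordError` and `with_optimistic_retry` are hypothetical names invented for this example, not the ActiveRecord or GitLab `OptimisticLocking` implementations.

```ruby
# Hypothetical in-memory stand-ins; none of these names are GitLab or Rails APIs.
class StaleRecordError < StandardError; end

# A tiny "table" holding one row with a lock_version column.
class FakeStateTable
  def initialize
    @row = { name: "example-state", serial: 0, lock_version: 0 }
  end

  # Load a snapshot of the row, as a model instance would hold it in memory.
  def load
    @row.dup
  end

  # Write only if the stored lock_version still matches the loaded one,
  # mimicking an `UPDATE ... WHERE lock_version = ?` statement.
  def update(snapshot, changes)
    raise StaleRecordError if snapshot[:lock_version] != @row[:lock_version]

    @row = @row.merge(changes).merge(lock_version: @row[:lock_version] + 1)
  end
end

# Retry wrapper: reload the row and re-run the block when the write is stale.
def with_optimistic_retry(table, max_retries: 3)
  retries = 0
  begin
    yield table.load
  rescue StaleRecordError
    retries += 1
    raise if retries > max_retries
    retry
  end
  retries
end

table = FakeStateTable.new
interfered = false

retries = with_optimistic_retry(table) do |snapshot|
  # Simulate a competing writer that commits once, after we loaded the row.
  unless interfered
    interfered = true
    table.update(table.load, serial: 1)
  end
  table.update(snapshot, serial: snapshot[:serial] + 1)
end

final = table.load
puts "retries=#{retries} serial=#{final[:serial]} lock_version=#{final[:lock_version]}"
```

The first attempt fails because the competing write bumped `lock_version`; the second attempt works from a fresh snapshot and succeeds, so neither update is lost.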
There are two main benefits to locking this way:
- An exclusive lock is not required for read-only or no-op actions, such as fetching an existing state without modifying it, or attempting to modify a state without permission. This should greatly increase the throughput of these endpoints, as these actions (on a single state) can now be served concurrently.
- The record is not locked, and therefore a database transaction is not open, while the Terraform state is pushed to object storage. I'm not aware of any existing problems related to this, but short transactions that don't depend on external services are a good idea in general.
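As a rough model of the second point, the sketch below has several threads do their slow work while holding no lock, then commit with a short compare-and-swap on a version number. `VersionedRecord` is a hypothetical class for illustration, the `Mutex` stands in for the atomicity of a single database `UPDATE`, and the `sleep` stands in for an upload to object storage; it does not reproduce GitLab's actual request flow.

```ruby
# Hypothetical model: a versioned record guarded by compare-and-swap.
class VersionedRecord
  attr_reader :conflicts

  def initialize
    @mutex = Mutex.new # stands in for the atomicity of one UPDATE statement
    @value = 0
    @version = 0
    @conflicts = 0
  end

  # Reads take a snapshot and never wait on a writer's slow work.
  def read
    @mutex.synchronize { [@value, @version] }
  end

  # Succeeds only if nobody else committed since `expected_version` was read.
  def compare_and_swap(expected_version, new_value)
    @mutex.synchronize do
      if @version == expected_version
        @value = new_value
        @version += 1
        true
      else
        @conflicts += 1
        false
      end
    end
  end
end

record = VersionedRecord.new

writers = 4.times.map do
  Thread.new do
    25.times do
      loop do
        value, version = record.read
        sleep(0.0005) # slow work (e.g. an upload) happens with no lock held
        break if record.compare_and_swap(version, value + 1)
      end
    end
  end
end
writers.each(&:join)

value, version = record.read
puts "value=#{value} version=#{version} conflicts=#{record.conflicts}"
```

The number of conflicts varies from run to run, but every increment is eventually applied, so the final value and version both equal the total number of writes.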
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
How to set up and validate locally
Basic test
- Install Terraform with `brew install terraform`
- Create a basic Terraform project with the following `main.tf`:

  ```hcl
  terraform {
    backend "http" {
    }
  }

  resource "local_file" "test" {
    count    = 10
    content  = timestamp()
    filename = "${path.module}/${count.index}.txt"
  }
  ```
- Initialise a Terraform state:

  ```shell
  terraform init \
    -backend-config="address=http://127.0.0.1:3000/api/v4/projects/49/terraform/state/example-state" \
    -backend-config="lock_address=http://127.0.0.1:3000/api/v4/projects/49/terraform/state/example-state/lock" \
    -backend-config="unlock_address=http://127.0.0.1:3000/api/v4/projects/49/terraform/state/example-state/lock" \
    -backend-config="username=root" \
    -backend-config="password=$GITLAB_ACCESS_TOKEN" \
    -backend-config="lock_method=POST" \
    -backend-config="unlock_method=DELETE" \
    -backend-config="retry_wait_min=5"
  ```
- Apply the changes:

  ```shell
  terraform apply --auto-approve
  ```
- Observe 10 files are created, each containing a timestamp.
- In the Rails console, read the contents from the state to verify it was persisted correctly:

  ```ruby
  JSON.parse(Terraform::State.last.latest_version.file.read)
  ```
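To make the final check less manual, a small helper can count the resource instances in the parsed state. This is a sketch that assumes the version 4 Terraform state layout (a top-level `resources` array whose entries carry a `type` and an `instances` array); `count_instances` is a hypothetical helper and the sample JSON is an illustrative stand-in, not real output.

```ruby
require "json"

# Count resource instances of a given type in parsed Terraform state.
# Assumes the version 4 state layout: a top-level "resources" array whose
# entries each carry a "type" and an "instances" array.
def count_instances(state, type)
  state.fetch("resources", [])
       .select { |resource| resource["type"] == type }
       .sum { |resource| resource.fetch("instances", []).size }
end

# Illustrative stand-in for the JSON read from the state file.
sample = JSON.parse(<<~STATE)
  {
    "version": 4,
    "serial": 1,
    "resources": [
      {
        "type": "local_file",
        "name": "test",
        "instances": [{ "index_key": 0 }, { "index_key": 1 }]
      }
    ]
  }
STATE

puts count_instances(sample, "local_file")
```

If the layout assumption holds, passing the parsed state from the basic test to `count_instances(..., "local_file")` should report 10, matching `count = 10` in `main.tf`.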
Stress test
- Same as the above, but execute the apply in a loop from multiple terminals:

  ```shell
  while true; do terraform apply --auto-approve; done
  ```
- In your `gitlab/log/service_measurement.log`, you should start to see locking conflict messages such as:

  ```
  ... "message":"Optimistic Lock released with retries","name":"Terraform state: 443","retries":1 ...
  ```
- However, no errors should be surfaced to the API in `gitlab/log/development.log`, and Terraform shouldn't return any errors (except for the "already locked" error, which is expected).
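To summarise the retry messages from a stress run, the log can be scanned with a short script. The sketch below assumes each log line is a standalone JSON object carrying the `message`, `name` and `retries` keys seen in the excerpt above; `retries_by_name` is a hypothetical helper and the sample lines are made up for illustration.

```ruby
require "json"

RETRY_MESSAGE = "Optimistic Lock released with retries"

# Sum the "retries" field per "name" across JSON log lines.
# Lines that are not valid JSON, or carry a different message, are skipped.
def retries_by_name(lines)
  lines.each_with_object(Hash.new(0)) do |line, totals|
    entry = begin
      JSON.parse(line)
    rescue JSON::ParserError
      next
    end
    next unless entry.is_a?(Hash) && entry["message"] == RETRY_MESSAGE

    totals[entry["name"]] += entry["retries"].to_i
  end
end

# Made-up sample lines, shaped like the excerpt in the stress test steps.
sample_lines = [
  '{"message":"Optimistic Lock released with retries","name":"Terraform state: 443","retries":1}',
  '{"message":"Optimistic Lock released with retries","name":"Terraform state: 443","retries":2}',
  '{"message":"something else","name":"other"}',
  "not json"
]

retries_by_name(sample_lines).each do |name, total|
  puts "#{name}: #{total} retries"
end
```

Against a real log, the same helper can be fed `File.foreach("gitlab/log/service_measurement.log")`, provided the one-JSON-object-per-line assumption holds.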
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #398117
Edited by Tiger Watson