Geo - Error during verification: Attempted to touch a stale object: Terraform::StateVersion


During a recent GitLab Dedicated migration, this verification error was found on a number of Terraform::StateVersion records - https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/team/-/issues/8349#note_2447425597

The number of records failing with this error fluctuated (391->157->164 etc.). The current assumption is that this is caused by increased user/org activity - see https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/team/-/issues/8349#note_2447647819

It looks like this is an ActiveRecord error: the record is loaded in one process, and before that process updates it, the same record is updated elsewhere. The initially loaded object is now stale/out of date, so the update fails due to optimistic locking.
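To make the suspected failure mode concrete, here is a minimal sketch of optimistic locking in plain Ruby. It does not use ActiveRecord or any GitLab code: `FakeTable`, the `:state` attribute, and the local `StaleObjectError` class are hypothetical stand-ins. It assumes the table carries a version counter (`lock_version` here), which is how Rails optimistic locking is normally set up.

```ruby
# Hypothetical stand-in for ActiveRecord::StaleObjectError.
class StaleObjectError < StandardError; end

# Hypothetical in-memory "table": id => row hash with a lock_version counter.
class FakeTable
  def initialize
    @rows = {}
  end

  def insert(id, attrs)
    @rows[id] = attrs.merge(lock_version: 0)
  end

  # Returns a copy, like a record loaded into one process's memory.
  def find(id)
    @rows.fetch(id).dup
  end

  # The write only applies if the stored lock_version still matches
  # the version the caller originally loaded.
  def update(id, loaded, changes)
    current = @rows.fetch(id)
    if current[:lock_version] != loaded[:lock_version]
      raise StaleObjectError, "Attempted to update a stale object: id=#{id}"
    end
    @rows[id] = current.merge(changes).merge(lock_version: current[:lock_version] + 1)
  end
end

def demo_stale_update
  table = FakeTable.new
  table.insert(1, state: "pending")

  copy_a = table.find(1) # "process A" loads the record
  copy_b = table.find(1) # "process B" loads the same record

  # B writes first, which bumps lock_version from 0 to 1.
  table.update(1, copy_b, state: "started")

  begin
    # A still holds lock_version 0, so its write is rejected.
    table.update(1, copy_a, state: "succeeded")
    :updated
  rescue StaleObjectError
    :stale
  end
end

puts demo_stale_update
```

If this is what is happening to these records, the failing writer simply lost a race. A reload and retry would normally succeed once the competing writes stop.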

Given I can't find further Geo-related info on this, I'm inclined to think this may be another case of heavy activity, and these will hopefully sort themselves out when activity slows. If we can check what happens to the numbers over the weekend and what they look like on Monday morning, that will indicate whether the theory holds.

It appears that multiple processes are trying to update one of these records at the same time, so we need to track down where this might be happening.

Additional note

There is ongoing work to move the Geo state fields in this table out into their own state table (like the majority of Geo state tables) - so this may have a positive impact on this error if it's related to many things trying to update the record at once.

To do

  • Loop in the owners of the Terraform State domain