Gitlab Terraform HTTP state HTTP Error 422
Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.
When using Gitlab as a terraform state backend I am seeing the following behavior:
- Terraform locks the state file (success)
- Terraform downloads the state file (success)
- Terraform uploads the modified state file (500 error)
- Terraform uploads the modified state file a second time (422 error)
- Terraform unlocks the state file
Terraform then fails with the following error message:
╷
│ Error: Failed to save state
│
│ Error saving state: HTTP error: 422
╵
╷
│ Error: Failed to persist state to backend
│
│ The error shown above has prevented Terraform from writing the updated
│ state to the configured backend. To allow for recovery, the state has been
│ written to the file "errored.tfstate" in the current working directory.
│
│ Running "terraform apply" again at this point will create a forked state,
│ making it harder to recover.
│
│ To retry writing this state, use the following command:
│ terraform state push errored.tfstate
│
I attempted to figure out what was going on and was looking at logs, but nothing popped out. I could only see the single POST request that would return the 422 error in api_json.log for gitlab-rails logs:
{
"time": "2023-06-23T18:21:22.933Z",
"severity": "INFO",
"duration_s": 16.27093,
"db_duration_s": 0.09819,
"view_duration_s": 16.17274,
"status": 422,
"method": "POST",
"path": "/api/v4/projects/1794/terraform/state/gitlab",
"params": [
{
"key": "ID",
"value": "8f07c995-92a8-996e-21cb-8746c9573704"
},
{
"key": "version",
"value": 4
},
{
"key": "terraform_version",
"value": "1.4.6"
},
{
"key": "serial",
"value": 105
},
{
"key": "lineage",
"value": "aae94282-8ec6-af14-501f-4af2975fe523"
},
{
"key": "truncated",
"value": "..."
}
],
"host": "<REDACTED>",
"remote_ip": "10.144.20.111, 127.0.0.1",
"ua": "Go-http-client/2.0",
"route": "/api/:version/projects/:id/terraform/state/:name",
"user_id": 269,
"username": "<REDACTED>",
"redis_calls": 9,
"redis_duration_s": 0.048744,
"redis_read_bytes": 881,
"redis_write_bytes": 471,
"redis_cache_calls": 4,
"redis_cache_duration_s": 0.046584,
"redis_cache_read_bytes": 865,
"redis_cache_write_bytes": 272,
"redis_shared_state_calls": 5,
"redis_shared_state_duration_s": 0.00216,
"redis_shared_state_read_bytes": 16,
"redis_shared_state_write_bytes": 199,
"db_count": 20,
"db_write_count": 1,
"db_cached_count": 5,
"db_replica_count": 0,
"db_primary_count": 20,
"db_main_count": 20,
"db_main_replica_count": 0,
"db_replica_cached_count": 0,
"db_primary_cached_count": 5,
"db_main_cached_count": 5,
"db_main_replica_cached_count": 0,
"db_replica_wal_count": 0,
"db_primary_wal_count": 0,
"db_main_wal_count": 0,
"db_main_replica_wal_count": 0,
"db_replica_wal_cached_count": 0,
"db_primary_wal_cached_count": 0,
"db_main_wal_cached_count": 0,
"db_main_replica_wal_cached_count": 0,
"db_replica_duration_s": 0,
"db_primary_duration_s": 0.035,
"db_main_duration_s": 0.035,
"db_main_replica_duration_s": 0,
"cpu_s": 51.244262,
"mem_objects": 14062719,
"mem_bytes": 4240207360,
"mem_mallocs": 12437123,
"mem_total_bytes": 4802716120,
"pid": 4031821,
"worker_id": "puma_1",
"rate_limiting_gates": [],
"correlation_id": "01H3MQRTVQ31DS8HWQTDQB5DS4",
"meta.caller_id": "POST /api/:version/projects/:id/terraform/state/:name",
"meta.remote_ip": "10.144.20.111",
"meta.feature_category": "infrastructure_as_code",
"meta.user": "<REDACTED>",
"meta.user_id": 269,
"meta.project": "coreservices/gitlab/configuration",
"meta.root_namespace": "coreservices",
"meta.client_id": "user/269",
"meta.subscription_plan": "default",
"content_length": "284918969",
"request_urgency": "low",
"target_duration_s": 5
}
However the NGINX gitlab_access.log showed the following entries for the above sequence:
10.144.20.111 - gitlab-ci-token [23/Jun/2023:18:12:03 +0000] "POST /api/v4/projects/1794/terraform/state/gitlab/lock HTTP/2.0" 200 170 "" "Go-http-client/2.0" -
10.144.20.111 - gitlab-ci-token [23/Jun/2023:18:12:14 +0000] "GET /api/v4/projects/1794/terraform/state/gitlab HTTP/2.0" 200 284908693 "" "Go-http-client/2.0" -
10.144.20.111 - gitlab-ci-token [23/Jun/2023:18:20:14 +0000] "POST /api/v4/projects/1794/terraform/state/gitlab?ID=8f07c995-92a8-996e-21cb-8746c9573704 HTTP/2.0" 500 3054 "" "Go-http-client/2.0" -
10.144.20.111 - gitlab-ci-token [23/Jun/2023:18:21:24 +0000] "POST /api/v4/projects/1794/terraform/state/gitlab?ID=8f07c995-92a8-996e-21cb-8746c9573704 HTTP/2.0" 422 222 "" "Go-http-client/2.0" -
10.144.20.111 - gitlab-ci-token [23/Jun/2023:18:21:34 +0000] "DELETE /api/v4/projects/1794/terraform/state/gitlab/lock HTTP/2.0" 200 3 "" "Go-http-client/2.0" -
Notice how the 500 error does show up there, but I can't find that in any of the other logs. I can't find any exceptions that are produced, nor can I find anything in api_json.log that has a 500 error. Unfortunately this is not consistent and I can't consistently evoke the same issue, if I re-run terraform in my Gitlab CI/CD pipeline it may succeed the next time without issues.
On the second attempt to insert a record into PostgreSQL (and the reason for the 422 error) I do indeed get a unique key constraint error:
2023-06-23_18:21:08.21038 ERROR: duplicate key value violates unique constraint "index_terraform_state_versions_on_state_id_and_version"
2023-06-23_18:21:08.21040 DETAIL: Key (terraform_state_id, version)=(3014, 105) already exists.
2023-06-23_18:21:08.21041 STATEMENT: /*application:web,correlation_id:01H3MQRTVQ31DS8HWQTDQB5DS4,endpoint_id:POST /api/:version/projects/:id/terraform/state/:name,db_config_name:main*/ INSERT INTO "terraform_state_versions" ("terraform_state_id", "created_by_user_id", "created_at", "updated_at", "version", "file_store", "file", "ci_build_id") VALUES (3014, 269, '2023-06-23 18:21:08.204165', '2023-06-23 18:21:08.204165', 105, 1, '105.tfstate', 134662) RETURNING "id"
Which makes sense, since the original 500 error was not actually a 500 on the backend and the state file is correctly persisted to disk/postgresql... and those records do exist. The serial number is correct, and is incremented from the previous terraform run.
The state file is about ~300 MiB in size, which should not cause any issues.
I am happy to try and pull more logs and provide more information!
Gitlab version:
# gitlab-rake gitlab:env:info
System information
System:
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 3.0.6p216
Gem Version: 3.2.33
Bundler Version:2.3.15
Rake Version: 13.0.6
Redis Version: 6.2.11
Sidekiq Version:6.5.7
Go Version: unknown
GitLab information
Version: 15.11.8-ee
Revision: 5aa8f83c113
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 13.11
URL: https://<REDACTED>
HTTP Clone URL: https://<REDACTED>/some-group/some-project.git
SSH Clone URL: git@<REDACTED>:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: yes
Using Omniauth: yes
Omniauth Providers: saml
GitLab Shell
Version: 14.18.0
Repository storages:
- default: unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Related tickets I found, that are old/stale/may not be completely relevant: