Gitlab Terraform HTTP state HTTP Error 422

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

When using Gitlab as a terraform state backend I am seeing the following behavior:

  1. Terraform locks the state file (success)
  2. Terraform downloads the state file (success)
  3. Terraform uploads the modified state file (500 error)
  4. Terraform uploads the modified state file a second time (422 error)
  5. Terraform unlocks the state file

Terraform then fails with the following error message:


│ Error: Failed to save state

│ Error saving state: HTTP error: 422


│ Error: Failed to persist state to backend

│ The error shown above has prevented Terraform from writing the updated
│ state to the configured backend. To allow for recovery, the state has been
│ written to the file "errored.tfstate" in the current working directory.

│ Running "terraform apply" again at this point will create a forked state,
│ making it harder to recover.

│ To retry writing this state, use the following command:
│     terraform state push errored.tfstate

I attempted to figure out what was going on and was looking at logs, but nothing popped out. I could only see the single POST request that would return the 422 error in api_json.log for gitlab-rails logs:

{
  "time": "2023-06-23T18:21:22.933Z",
  "severity": "INFO",
  "duration_s": 16.27093,
  "db_duration_s": 0.09819,
  "view_duration_s": 16.17274,
  "status": 422,
  "method": "POST",
  "path": "/api/v4/projects/1794/terraform/state/gitlab",
  "params": [
    {
      "key": "ID",
      "value": "8f07c995-92a8-996e-21cb-8746c9573704"
    },
    {
      "key": "version",
      "value": 4
    },
    {
      "key": "terraform_version",
      "value": "1.4.6"
    },
    {
      "key": "serial",
      "value": 105
    },
    {
      "key": "lineage",
      "value": "aae94282-8ec6-af14-501f-4af2975fe523"
    },
    {
      "key": "truncated",
      "value": "..."
    }
  ],
  "host": "<REDACTED>",
  "remote_ip": "10.144.20.111, 127.0.0.1",
  "ua": "Go-http-client/2.0",
  "route": "/api/:version/projects/:id/terraform/state/:name",
  "user_id": 269,
  "username": "<REDACTED>",
  "redis_calls": 9,
  "redis_duration_s": 0.048744,
  "redis_read_bytes": 881,
  "redis_write_bytes": 471,
  "redis_cache_calls": 4,
  "redis_cache_duration_s": 0.046584,
  "redis_cache_read_bytes": 865,
  "redis_cache_write_bytes": 272,
  "redis_shared_state_calls": 5,
  "redis_shared_state_duration_s": 0.00216,
  "redis_shared_state_read_bytes": 16,
  "redis_shared_state_write_bytes": 199,
  "db_count": 20,
  "db_write_count": 1,
  "db_cached_count": 5,
  "db_replica_count": 0,
  "db_primary_count": 20,
  "db_main_count": 20,
  "db_main_replica_count": 0,
  "db_replica_cached_count": 0,
  "db_primary_cached_count": 5,
  "db_main_cached_count": 5,
  "db_main_replica_cached_count": 0,
  "db_replica_wal_count": 0,
  "db_primary_wal_count": 0,
  "db_main_wal_count": 0,
  "db_main_replica_wal_count": 0,
  "db_replica_wal_cached_count": 0,
  "db_primary_wal_cached_count": 0,
  "db_main_wal_cached_count": 0,
  "db_main_replica_wal_cached_count": 0,
  "db_replica_duration_s": 0,
  "db_primary_duration_s": 0.035,
  "db_main_duration_s": 0.035,
  "db_main_replica_duration_s": 0,
  "cpu_s": 51.244262,
  "mem_objects": 14062719,
  "mem_bytes": 4240207360,
  "mem_mallocs": 12437123,
  "mem_total_bytes": 4802716120,
  "pid": 4031821,
  "worker_id": "puma_1",
  "rate_limiting_gates": [],
  "correlation_id": "01H3MQRTVQ31DS8HWQTDQB5DS4",
  "meta.caller_id": "POST /api/:version/projects/:id/terraform/state/:name",
  "meta.remote_ip": "10.144.20.111",
  "meta.feature_category": "infrastructure_as_code",
  "meta.user": "<REDACTED>",
  "meta.user_id": 269,
  "meta.project": "coreservices/gitlab/configuration",
  "meta.root_namespace": "coreservices",
  "meta.client_id": "user/269",
  "meta.subscription_plan": "default",
  "content_length": "284918969",
  "request_urgency": "low",
  "target_duration_s": 5
}

However the NGINX gitlab_access.log showed the following entries for the above sequence:

10.144.20.111 - gitlab-ci-token [23/Jun/2023:18:12:03 +0000] "POST /api/v4/projects/1794/terraform/state/gitlab/lock HTTP/2.0" 200 170 "" "Go-http-client/2.0" -
10.144.20.111 - gitlab-ci-token [23/Jun/2023:18:12:14 +0000] "GET /api/v4/projects/1794/terraform/state/gitlab HTTP/2.0" 200 284908693 "" "Go-http-client/2.0" -
10.144.20.111 - gitlab-ci-token [23/Jun/2023:18:20:14 +0000] "POST /api/v4/projects/1794/terraform/state/gitlab?ID=8f07c995-92a8-996e-21cb-8746c9573704 HTTP/2.0" 500 3054 "" "Go-http-client/2.0" -
10.144.20.111 - gitlab-ci-token [23/Jun/2023:18:21:24 +0000] "POST /api/v4/projects/1794/terraform/state/gitlab?ID=8f07c995-92a8-996e-21cb-8746c9573704 HTTP/2.0" 422 222 "" "Go-http-client/2.0" -
10.144.20.111 - gitlab-ci-token [23/Jun/2023:18:21:34 +0000] "DELETE /api/v4/projects/1794/terraform/state/gitlab/lock HTTP/2.0" 200 3 "" "Go-http-client/2.0" -

Notice how the 500 error does show up there, but I can't find that in any of the other logs. I can't find any exceptions that are produced, nor can I find anything in api_json.log that has a 500 error. Unfortunately this is not consistent and I can't consistently evoke the same issue, if I re-run terraform in my Gitlab CI/CD pipeline it may succeed the next time without issues.

On the second attempt to insert a record into PostgreSQL (and the reason for the 422 error) I do indeed get a unique key constraint error:

2023-06-23_18:21:08.21038 ERROR:  duplicate key value violates unique constraint "index_terraform_state_versions_on_state_id_and_version"
2023-06-23_18:21:08.21040 DETAIL:  Key (terraform_state_id, version)=(3014, 105) already exists.
2023-06-23_18:21:08.21041 STATEMENT:  /*application:web,correlation_id:01H3MQRTVQ31DS8HWQTDQB5DS4,endpoint_id:POST /api/:version/projects/:id/terraform/state/:name,db_config_name:main*/ INSERT INTO "terraform_state_versions" ("terraform_state_id", "created_by_user_id", "created_at", "updated_at", "version", "file_store", "file", "ci_build_id") VALUES (3014, 269, '2023-06-23 18:21:08.204165', '2023-06-23 18:21:08.204165', 105, 1, '105.tfstate', 134662) RETURNING "id"

Which makes sense, since the original 500 error was not actually a 500 on the backend and the state file is correctly persisted to disk/postgresql... and those records do exist. The serial number is correct, and is incremented from the previous terraform run.

The state file is about ~300 MiB in size, which should not cause any issues.

I am happy to try and pull more logs and provide more information!

Gitlab version:

# gitlab-rake gitlab:env:info

System information
System:
Proxy:		no
Current User:	git
Using RVM:	no
Ruby Version:	3.0.6p216
Gem Version:	3.2.33
Bundler Version:2.3.15
Rake Version:	13.0.6
Redis Version:	6.2.11
Sidekiq Version:6.5.7
Go Version:	unknown

GitLab information
Version:	15.11.8-ee
Revision:	5aa8f83c113
Directory:	/opt/gitlab/embedded/service/gitlab-rails
DB Adapter:	PostgreSQL
DB Version:	13.11
URL:		https://<REDACTED>
HTTP Clone URL:	https://<REDACTED>/some-group/some-project.git
SSH Clone URL:	git@<REDACTED>:some-group/some-project.git
Elasticsearch:	no
Geo:		no
Using LDAP:	yes
Using Omniauth:	yes
Omniauth Providers: saml

GitLab Shell
Version:	14.18.0
Repository storages:
- default: 	unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path:		/opt/gitlab/embedded/service/gitlab-shell

Related tickets I found, that are old/stale/may not be completely relevant:

Confirmed Workaround by @archoversight

See #416185 (comment 2168687277)

Edited by 🤖 GitLab Bot 🤖