"drop job on secrets provider not found" causes regression for jobs requiring vault secrets when passing variables using dotenv files ('The secrets provider can not be found')
Summary
The drop job on secrets provider not found change (feature flag MR, FF issue, main MR) requires that
- secrets_provider? is true, in turn
- hashicorp_vault_provider? || azure_key_vault_provider? is true, meaning
- various variables are present; VAULT_SERVER_URL in the case of Hashicorp Vault.
This change moves the validation to pipeline creation, rather than when the job runs.
It's therefore no longer possible to supply this variable using a dotenv from a previous job in the pipeline since at pipeline creation, the dotenv doesn't exist yet.
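The timing problem can be sketched in a few lines of Ruby. This is a simplified, hypothetical model and not the actual Rails code: the class name JobSketch and the Azure variable name are assumptions made for illustration, and only the three predicate names come from this report. It shows the same check failing against the variables known at pipeline creation and passing once the dotenv report exists.

```ruby
# Hypothetical model of the provider check. Only the predicate names
# (secrets_provider?, hashicorp_vault_provider?, azure_key_vault_provider?)
# are taken from the issue; everything else is invented for illustration.
class JobSketch
  def initialize(variables)
    @variables = variables
  end

  def hashicorp_vault_provider?
    @variables.key?('VAULT_SERVER_URL')
  end

  def azure_key_vault_provider?
    # Assumed variable name, used here only so the predicate has a body.
    @variables.key?('AZURE_KEY_VAULT_SERVER_URL')
  end

  def secrets_provider?
    hashicorp_vault_provider? || azure_key_vault_provider?
  end
end

# Variables known when the pipeline is created: make_dotenv has not run,
# so its dotenv report does not exist yet.
at_creation = {}

# Variables known once make_dotenv has finished and uploaded its report.
dotenv_report = { 'VAULT_SERVER_URL' => 'http://192.168.1.12' }
at_runtime = at_creation.merge(dotenv_report)

creation_check = JobSketch.new(at_creation).secrets_provider?
runtime_check  = JobSketch.new(at_runtime).secrets_provider?

puts "creation: #{creation_check}, runtime: #{runtime_check}"
```

In this model the check itself is unchanged between the two calls; only the point in time at which it runs decides the outcome.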
Steps to reproduce
- Simple reproduction CI code
  - it is not a requirement for vault access to work to reproduce this error.
  - set the vault IP address to a valid local web server, for example, and the job fails with a runner system error.
  - the bug is in Rails, so to get that far, it's necessary to get through the Rails code.
minimal reproduction .gitlab-ci.yml

---
stages:
  - one
  - two

make_dotenv:
  stage: one
  script:
    - echo 'VAULT_SERVER_URL=http://192.168.1.12' > vaultenv
  artifacts:
    reports:
      dotenv: vaultenv

use_dotenv:
  stage: two
  secrets:
    SOME_SECRET:
      vault: foo/bar/password@secret
      file: false
  script:
    - echo 'hello world'

- In GitLab 16.5 and earlier, when the pipeline is created, make_dotenv starts, but use_dotenv remains created.
- Once the first stage completes, use_dotenv runs.
  - Note: to reproduce this, I did not provide a vault. The bug occurs in Rails when building the pipeline.
  - The output proves that on earlier versions, variables were passed using dotenv and the job would attempt to use them.
  - I specified the IP address of a valid local NGINX server so an HTTP call could be made by the runner.

screenshot

There has been a runner system failure, please try again
Running with gitlab-runner 16.8.1 (a6097117)
  on xxx 4Yvq_4VE, system ID: s_10add9b5c8b1
Resolving secrets 00:00
Resolving secret "SOME_SECRET"...
Using "vault" secret resolver...
ERROR: Job failed (system failure): resolving secrets: initializing Vault service: preparing authenticated client: checking Vault server health: api error: status code 404: <html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.24.0</center>
</body>
</html>

- Run it in GitLab 16.6 and later
  - The use_dotenv job immediately fails. At that point, make_dotenv isn't even on a runner yet.
  - The secrets provider can not be found
- Once the dotenv exists, the failed job can be re-run. Validation of the vault variable succeeds.
  - Customers don't want to have to retry every job that uses vault secrets.
  - In my reproduction, a runner system failure occurs as expected.
In 16.11 and later the error reads:
The secrets provider can not be found. Check your CI/CD variables and try again.
Example Project
See CI snippet above.
What is the current bug behavior?
dotenv can no longer be used to supply VAULT_SERVER_URL to a job that requires vault secrets, because validation occurs when the job is created, not when the job runs.
What is the expected correct behavior?
Validation should take into account all mechanisms in the product to supply variables.
Workaround
Variable precedence can be used to supply a dummy variable so validation passes, and then as usual at runtime, dotenv supplies the actual value.
use_dotenv:
  stage: two
  variables:
    VAULT_SERVER_URL: 'http://127.0.0.1'
  secrets:
    SOME_SECRET:
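Why the dummy value is harmless can be sketched as a hash merge in Ruby. This is a simplified, hypothetical model of variable precedence, assuming only what the workaround states (a dotenv value overrides a variable defined on the job in YAML); the variable names below are illustrative and are not GitLab internals.

```ruby
# Simplified precedence model: sources merged later override earlier ones.
# Dummy value defined on the job in .gitlab-ci.yml (the workaround).
job_yaml_variables = { 'VAULT_SERVER_URL' => 'http://127.0.0.1' }

# Real value written by make_dotenv into its dotenv report.
dotenv_variables = { 'VAULT_SERVER_URL' => 'http://192.168.1.12' }

# At pipeline creation only the YAML-defined value exists, so a presence
# check on VAULT_SERVER_URL passes.
creation_variables = job_yaml_variables
passes_validation = creation_variables.key?('VAULT_SERVER_URL')

# At runtime the dotenv value is merged on top and replaces the dummy.
runtime_variables = job_yaml_variables.merge(dotenv_variables)

puts "validation passes at creation: #{passes_validation}"
puts "URL used at runtime: #{runtime_variables['VAULT_SERVER_URL']}"
```

The dummy URL is therefore only ever seen by the creation-time check; the job itself resolves secrets against the address supplied by the dotenv report.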
Relevant logs and/or screenshots
Output of checks
Customer reported the issue after upgrading from 16.3 to 16.7
The feature flag was removed in 16.6.



