Implement Concrete CI Function
What does this MR do?
Introduces concrete, a single, unified step-runner Function that replaces the existing per-script approach to job execution. Rather than migrating individual runner scripts one at a time (as script_legacy does for user scripts), concrete encapsulates the entire traditional job execution flow (source checkout, cache, artifacts, user scripts, and cleanup) in one self-contained Function that is dispatched via the step-runner.
This is an experiment to validate the approach described in step-runner#386.
Key changes
- New functions/concrete package: Implements the concrete step-runner Function (concrete.Spec() / concrete.Run), registered alongside script_legacy in commands/steps/steps.go.
- New functions/concrete/builder package: A Build() function that takes a spec.Job and a variable provider and produces a JSON-serialised run.Config, the complete description of everything concrete needs to execute the job. This includes: git source checkout config (using helpers/url.GitAuthHelper from !6483 (merged)), cache extract/archive (using the cacheconfig package extracted in #39238 (closed)), artifact download/upload, pre/post build scripts, user steps, and cleanup.
- New functions/concrete/run/stages package: Individual stage types (GetSources, CacheExtract, CacheArchive, ArtifactDownload, ArtifactUpload, Cleanup, Step), each with a Run(ctx, env) method. Includes a full integration test suite for GetSources covering clone, fetch, shallow/unshallow, submodules, clean flags, and retry behaviour.
- New functions/concrete/run/stages/internal/scriptwriter package: Generates bash and PowerShell scripts from a list of command lines, with support for debug tracing, exit code checks, and CI section markers.
- New functions/concrete/run/env package: An Env type that carries job context (working dir, shell, token, env vars, stdout/stderr) and provides helpers for running runner sub-commands (artifacts-uploader, cache-archiver, etc.).
- FF_CONCRETE feature flag: Gates the new execution path. When enabled and the job has no native run: steps, executeScript in common/build.go calls stagesToConcreteStep() to build the concrete step config and dispatches it via the step-runner, bypassing the traditional shell-based execution path entirely.
- common/build.go refactor: executeScript is cleaned up. The nested if/else for step dispatch is replaced with a switch, and pickPriorityError is now called inline rather than via intermediate variables.
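To make the stage model above easier to picture, here is a minimal sketch of stages that share a Run(ctx, env) shape and are executed in order. Everything in it is an assumption for illustration: the trimmed-down Env, the Stage interface, namedStage, runStages, and the choice to always run cleanup after a failure are hypothetical and are not taken from the functions/concrete/run/stages package.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Env is a hypothetical, heavily reduced stand-in for the job context
// (the real Env also carries shell, token, env vars, stdout/stderr).
type Env struct {
	WorkingDir string
	Log        []string // records which stages ran, for demonstration only
}

// Stage captures the Run(ctx, env) shape described in the list above.
type Stage interface {
	Run(ctx context.Context, env *Env) error
}

// namedStage is a fake stage that records its name and optionally fails.
type namedStage struct {
	name string
	fail bool
}

func (s namedStage) Run(_ context.Context, env *Env) error {
	env.Log = append(env.Log, s.name)
	if s.fail {
		return errors.New(s.name + " failed")
	}
	return nil
}

// runStages runs stages in order and stops at the first failure.
// Assumption: cleanup always runs, even after a failed stage.
func runStages(ctx context.Context, env *Env, stages []Stage, cleanup Stage) error {
	var runErr error
	for _, s := range stages {
		if runErr = s.Run(ctx, env); runErr != nil {
			break
		}
	}
	return errors.Join(runErr, cleanup.Run(ctx, env))
}

// demo wires up a small pipeline in which the second stage fails.
func demo() ([]string, error) {
	env := &Env{WorkingDir: "/builds/example"}
	stages := []Stage{
		namedStage{name: "get_sources"},
		namedStage{name: "cache_extract", fail: true},
		namedStage{name: "step"},
	}
	err := runStages(context.Background(), env, stages, namedStage{name: "cleanup"})
	return env.Log, err
}

func main() {
	log, err := demo()
	fmt.Println(log, err)
}
```

Because each stage only depends on the shared interface and the Env it is handed, a single stage can be swapped for a better implementation without touching the others.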
Supporting refactors
This MR builds on a few preparatory refactors that were broken out to keep the diff focused:
- !6483 (merged): Extracted helpers/url.GitAuthHelper so concrete can configure git authentication without pulling in the shell executor.
- #39238 (closed): Extracted the cacheconfig package so concrete can build cache configuration independently.
- !6509 (merged): Extracted key sanitization logic and tests to their own package, cache/cachekey, so they can be used by both the abstract shell and concrete.
What this does not do (yet)
- Cancellation: Blocked on step-runner!401.
- Git installation: Currently there is a temporary install_git step prepended to the concrete step list that attempts to install git via common package managers if it is not already present. This is a known rough edge (marked with a todo comment) and blocked on base-images!104.
- Remove shell escaping: The scriptwriter still inherits the existing escaping approach.
- Optimise individual stage implementations: The stage implementations wrap existing behaviour; they can be replaced and improved independently. There are no one-way-door decisions here.
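As an illustration of the temporary git installation described in the list above, the following sketch picks an install command by probing for common package managers. It is not the install_git step from this MR: the installers table, its ordering, gitInstallCommand, and fakeLookPath are all hypothetical names chosen for the example, and a real step would likely need extra work such as refreshing package indexes first.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// installers is a hypothetical preference-ordered list of package managers
// and the arguments that install git with each of them.
var installers = []struct {
	manager string
	args    []string
}{
	{"apt-get", []string{"install", "-y", "git"}},
	{"apk", []string{"add", "--no-cache", "git"}},
	{"dnf", []string{"install", "-y", "git"}},
	{"yum", []string{"install", "-y", "git"}},
}

// lookPath has the same signature as exec.LookPath so it can be faked.
type lookPath func(file string) (string, error)

// gitInstallCommand returns nil when git is already present, the argv of an
// install command when a known package manager is found, or an error.
func gitInstallCommand(look lookPath) ([]string, error) {
	if _, err := look("git"); err == nil {
		return nil, nil // git already installed, nothing to do
	}
	for _, in := range installers {
		if _, err := look(in.manager); err == nil {
			return append([]string{in.manager}, in.args...), nil
		}
	}
	return nil, errors.New("git is missing and no known package manager was found")
}

// fakeLookPath pretends that only the named binaries exist on PATH.
func fakeLookPath(present ...string) lookPath {
	return func(file string) (string, error) {
		for _, p := range present {
			if p == file {
				return "/usr/bin/" + file, nil
			}
		}
		return "", errors.New(file + " not found")
	}
}

func main() {
	// Probe the real PATH of the current machine.
	cmd, err := gitInstallCommand(exec.LookPath)
	fmt.Println(cmd, err)
}
```

Injecting the lookup function keeps the selection logic testable without touching the host, which matters for a step that is expected to be replaced once base images ship git.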
Philosophy and trade-offs
This is not the ideal implementation. The original "concrete" plan envisioned a clean-slate reimplementation of job execution. What we have here is a large compromise in favour of speed and backwards compatibility: it wraps the existing behaviour rather than replacing it, and it inherits many of the rough edges that come with that.
Nobody is more aware of the rough edges than I am. This is not the MR I set out to write, and compromises were made on all sides to get here. But I believe it moves us toward the goal faster than the incremental "step-by-step" migration plan, and we need to make meaningful progress now. The goal is to get this merged and then iterate, not to treat it as the finished product.
What's the best way to test this MR?
The builder package has a kitchen-sink unit test (TestBuildConcreteKitchenSink) that validates the full JSON config produced from a representative job. The scriptwriter package has unit tests for both bash and PowerShell script generation. The GetSources stage has integration tests (build tag integration) covering clone, fetch, shallow/unshallow, submodules, clean flags, and retry/clear-worktree behaviour.
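For readers unfamiliar with what the script generation tests exercise, here is a rough sketch of turning a list of command lines into a bash script with optional debug tracing and exit code checks. It is an assumption-laden illustration, not the scriptwriter package: Options, writeBash, and the exact text of the exit code check are hypothetical, and CI section markers and PowerShell output are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Options are hypothetical knobs for the generated script.
type Options struct {
	Trace          bool // emit a line enabling bash xtrace
	CheckExitCodes bool // stop the script as soon as a command fails
}

// exitCheck is an illustrative guard appended after each command line.
const exitCheck = "_exit_code=$?; if [ \"$_exit_code\" -ne 0 ]; then exit \"$_exit_code\"; fi\n"

// writeBash renders the command lines into a single bash script.
func writeBash(lines []string, opts Options) string {
	var b strings.Builder
	b.WriteString("#!/usr/bin/env bash\n")
	if opts.Trace {
		b.WriteString("set -o xtrace\n")
	}
	for _, line := range lines {
		b.WriteString(line)
		b.WriteString("\n")
		if opts.CheckExitCodes {
			b.WriteString(exitCheck)
		}
	}
	return b.String()
}

func main() {
	script := writeBash(
		[]string{"echo building", "make test"},
		Options{Trace: true, CheckExitCodes: true},
	)
	fmt.Print(script)
}
```

A unit test for a generator like this can compare the rendered string against an expected script, which is cheap to run and needs no shell.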
There are integration and unit test gaps, particularly around caches and artifacts, which are left mainly to end-to-end coverage. This mirrors the existing situation in the abstract shell executor. Since this code is effectively a port of that execution flow, it inherits the same testing shortcomings for now. Improving coverage is planned as follow-up work.
End-to-end testing requires the FF_CONCRETE feature flag to be enabled on a runner with a step-runner connector executor.
How to review
Given that this will be merged pretty much as is, the review of this MR should focus on the FF_CONCRETE feature flag: with the flag disabled, the changes in build.go must leave job execution behaving exactly as before.
For anything else, we can open an issue once this is merged.
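The flag-gated decision that the review should concentrate on can be pictured with the sketch below. It only models the rule stated in this description (concrete is used when the flag is enabled and the job has no native run: steps); selectPath and the executionPath names are hypothetical and do not reproduce the actual switch in executeScript.

```go
package main

import "fmt"

// executionPath names the ways a job might be dispatched.
// The names are illustrative only.
type executionPath string

const (
	pathNativeSteps executionPath = "native-steps"
	pathConcrete    executionPath = "concrete"
	pathShell       executionPath = "shell"
)

// selectPath is a hypothetical stand-in for the dispatch decision:
// native steps win, then the feature flag, then the traditional path.
func selectPath(hasNativeSteps, ffConcrete bool) executionPath {
	switch {
	case hasNativeSteps:
		return pathNativeSteps
	case ffConcrete:
		return pathConcrete
	default:
		return pathShell
	}
}

func main() {
	cases := []struct{ steps, ff bool }{
		{false, false}, {false, true}, {true, false}, {true, true},
	}
	for _, c := range cases {
		fmt.Printf("steps=%t ff=%t -> %s\n", c.steps, c.ff, selectPath(c.steps, c.ff))
	}
}
```

The property worth checking in review is the flag-disabled row: with ffConcrete false, the result must never be the concrete path.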
Next steps
- Extensive testing to build confidence and close the coverage gaps mentioned above.
- Implement the missing pieces: cancellation support (blocked on step-runner!401) and proper git installation (blocked on base-images!104).
- Actively try to break this implementation and find where it falls over.
- We should make our team the owners of shell/abstract.go, so that we can keep this implementation in sync.
- If this experiment proves successful (covering the majority of existing functionality with a solid starting point), we can move on to harder problems that would otherwise be deferred much further down the line:
  - Extend support to additional executors.
  - Explore whether the step-runner running in the build container removes the need for the helper container entirely.
  - Move past the script migration phase and start doing work that benefits CI Functions as a whole.
What are the relevant issue numbers?
Closes step-runner#386