GitLab Functions (Steps) fail to run on GitLab Runner 18.7.0-pre
Summary
GitLab Functions (formerly Steps) fail to execute on GitLab Runner 18.7.0 (pre-release). The issue does not occur on version 18.6.6.
Steps to Reproduce
Create a step-runner job with the following configuration:
```yaml
changelog-format:
  stage: test
  image: registry.gitlab.com/gitlab-org/step-runner:v0
  run:
    - name: changelog
      step: ./.gitlab/steps/changelog
      inputs:
        changelog: "${{job.CI_PROJECT_DIR}}/CHANGELOG.md"
        echo_latest: true
```
| Resource | Link |
|---|---|
| Job log | View log |
Observed Behavior
Jobs time out after one hour. Until the job is terminated, the Runner sends an empty trace patch every minute.
Expected Behavior
Jobs should complete successfully, as they do on deployments running earlier Runner versions.
Analysis
Scope of Impact
- The issue affects only jobs that use GitLab Functions
- Standard image pulls earlier in the job succeed
- The problem appears to occur after the image has been pulled from the registry
Recent Changes
Two changes were recently made to the way Runner executes steps:
- `gitlab-runner-helper` now starts the functions gRPC server for "native" function jobs (bypassing the shim)
- A new `Connect()` method provides a generic way for executors to obtain a gRPC server connection
Log Evidence
Log analysis shows the job is received and started, followed by an hour of empty trace patches until timeout.
Metrics
- Private runners show no increase in error rate since the version change, which is consistent with the impact being limited to Functions jobs
- A spike in timeouts was observed on Friday at about 12:00 UTC
- Timeouts also occur on shared-gitlab-org runners, which were not updated, suggesting that server-side factors may be involved (see Dashboard)
Environment
| Component | Version | Status |
|---|---|---|
| Private runners (updated Thu evening UTC) | 18.7.0~pre.390.g6d7a049f (6d7a049f) | Affected |
| Previous working version | 18.4.0~pre.246.g71914659 (71914659) | Not affected |
| Manually deployed EC2 Runner | 18.6.6 (df85dadf) | Not affected |
Recommended Actions
- Immediate: Investigate the root cause, focusing on the gRPC connection lifecycle
- Short-term: Add enhanced logging around the functions gRPC server connection and step execution
- Long-term: Implement monitoring for Functions-specific execution paths
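As one illustration of the short- and long-term actions above, the symptom in this report (an empty trace patch every minute until the one-hour timeout) is easy to detect much earlier. The following is a minimal, hypothetical Go sketch, not an existing Runner feature: `stallDetector`, `firstStall`, the patch sizes and the five-patch threshold are all assumptions chosen for illustration.

```go
package main

import "fmt"

// stallDetector is a hypothetical monitor that tracks the trace patches
// sent for one job and flags a stall once too many consecutive patches
// carry no new output.
type stallDetector struct {
	threshold int // consecutive empty patches tolerated before flagging
	empty     int // current run of consecutive empty patches
}

// Observe records one trace patch of the given size in bytes and reports
// whether the job should now be considered stalled.
func (s *stallDetector) Observe(patchBytes int) bool {
	if patchBytes > 0 {
		s.empty = 0 // new output resets the run
		return false
	}
	s.empty++
	return s.empty >= s.threshold
}

// firstStall returns the 1-based index of the patch at which a stall is
// detected, or -1 if the sequence never stalls.
func firstStall(patches []int, threshold int) int {
	d := &stallDetector{threshold: threshold}
	for i, p := range patches {
		if d.Observe(p) {
			return i + 1
		}
	}
	return -1
}

func main() {
	// Assumed shape of the failing job: two patches with output (job
	// received and started), then one empty patch per minute for an hour.
	patches := []int{512, 2048}
	for i := 0; i < 60; i++ {
		patches = append(patches, 0)
	}
	// With a threshold of 5 (about five minutes at one patch per minute),
	// the stall is flagged at patch 7 rather than after the full hour.
	fmt.Println(firstStall(patches, 5)) // prints 7
}
```

In practice such a check would more likely feed a log line or metric scoped to Functions jobs than terminate the job directly.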
⚠️ Urgency
A fix must be deployed by 2025-12-18 due to the Hard Production Change Lock.
Edited by Cameron Swords
