Draft: Add CI service probes
What does this MR do?
Runner currently supports only a TCP port check to determine the status of the services it creates for CI jobs. The port checked is the first port exposed in the image's metadata. This approach has worked reasonably well, but it falls short in a number of cases:
- The container may expose multiple ports, and the first one might not lead to a successful health check.
- An open port does not mean the service is actually ready.
- If the wrong port is chosen, the check waits out a long timeout, which can add up to a minute before a job starts executing.
This MR introduces CI service probes. They are designed to feel familiar to users of Kubernetes startup probes and allow TCP, HTTP Get, and Exec probes to be configured for each CI service. The probe configuration is passed to Runner, which executes the probes to correctly identify when a service is ready.
We've previously explored using Docker's built-in HEALTHCHECK, but it has its own issues:
- No docker-library official image uses a HEALTHCHECK, so users would often have to extend the image.
- It only supports executing a command inside the container. Many images rely on Kubernetes' probe functionality instead, so they ship no internal health-check command.
- Kubernetes doesn't support HEALTHCHECK either, and a command-only approach is likely too limiting.
A commonly requested feature for Kubernetes is support for multiple probes. Unfortunately, Kubernetes is at a point where it cannot easily add this to its API. The workaround there is a sidecar, something we cannot easily offer for CI services. This MR avoids the issue by allowing multiple probes to be configured per service.
Ideally this settles the problem once and for all: TCP, HTTP Get, and Exec probes likely cover most, if not all, of the ways to determine whether a service is available and ready.
An example:
```yaml
job:
  services:
    - name: service1
      probes:
        - tcp:
            port: 8080
          exec:
            command: ["/bin/check"]
          retries: 10 # optional
          initial_delay: 5 # optional
          period: 10 # optional
          timeout: 10 # optional
        - http_get:
            path: /health
            port: 8080
            headers:
              - 'X-Custom-Header: custom'
        - exec:
            command: ["/bin/check/another/thing"]
    - name: service2
      probes:
        - http_get:
            path: /health
            port: 8080
            headers:
              - 'X-Custom-Header: custom'
          initial_delay: 5 # optional
          period: 10 # optional
          timeout: 10 # optional
    - name: service3
      probes:
        - exec:
            command: ["/bin/check"]
          retries: 10 # optional
          initial_delay: 5 # optional
          period: 10 # optional
          timeout: 10 # optional
```
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
- Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team