Invalid agent configuration YAML file breaks the agentk

MR: Pending

UPDATE 2024-05-30

Here is our long term thinking on this issue:

We want to eventually eliminate the need for any agent configuration at all in order to use Workspaces.
We will accomplish this in phases. The first phase is to remove all settings other than remote_development: enabled from the agent config file, and instead allow them to be controlled via the Remote Development settings module (which is flexible and can get values from ENV vars, the UI, or any other configurable source).
The groundwork for this is being laid by adding support for sending settings directly to the agent as part of the agent reconciliation loop: Sync Full & Partial Reconcilation From Rails & ... (gitlab-org/cluster-integration/gitlab-agent!1495 - merged)
This will leave only remote_development: enabled in the agent config. We can remove this final dependency on the agent config file by implementing a "poke" architecture, where an agent automatically detects whether it should handle remote development reconciliation, even without the enabled flag in the config. See the discussion of this architecture here: https://gitlab.com/gitlab-org/remote-development/gitlab-remote-development-docs/-/blob/main/doc/tech-designs/2023-02-01-support-rails-poke.md

Description

As a user, when creating workspaces, I want to be able to identify whether a selected agent has any configuration issues that may prevent it from creating workspaces.

Acceptance Criteria (to be discussed)

There are several possible approaches, depending on how we expect the product to behave:

When creating a workspace, hide the agent with the invalid configuration
Alternatively, display the affected agent with some UI indicator to highlight the existence of some issue
Display the agent normally in the dropdown, but perform a last-mile check when creating a workspace
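
To make the trade-off between the first two options concrete, here is a minimal sketch of how the agent dropdown could be built under each approach. It is written in Python for illustration only; the `Agent` type, the `config_valid` flag, the mode names, and the label text are all hypothetical and not part of any existing GitLab code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    name: str
    config_valid: bool  # hypothetical flag: is the latest agent config valid?


def dropdown_entries(agents: List[Agent], mode: str) -> List[str]:
    """Build dropdown labels for one of the approaches listed above.

    mode "hide": omit agents whose configuration is invalid.
    mode "flag": list every agent, marking the ones with an invalid configuration.
    mode "show": list every agent unchanged and rely on a later check.
    """
    if mode == "hide":
        return [a.name for a in agents if a.config_valid]
    if mode == "flag":
        return [
            a.name if a.config_valid else f"{a.name} (configuration issue)"
            for a in agents
        ]
    if mode == "show":
        return [a.name for a in agents]
    raise ValueError(f"unknown mode: {mode}")
```

With "hide", a user cannot tell why an agent is missing; with "flag", the problem is visible but the agent can still be selected, so a check at creation time would still be needed.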

Technical Requirements (to be discussed)

(Rails) add a column to RemoteDevelopmentAgentConfig that indicates whether the latest configuration is valid or not
(if required) fetch config validity once a selection is made in the dropdown
(Rails) When creating a workspace, support a specific error indicating invalidity of the agent config based on the value in the DB
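
The requirements above could look roughly like the following. This is a sketch in Python rather than the actual Rails code; `AgentConfigRecord`, `config_valid`, `InvalidAgentConfigError`, and `create_workspace` are hypothetical names chosen for illustration. The flag defaults to valid so that existing records keep working, in line with the Impact Assessment.

```python
from dataclasses import dataclass


class InvalidAgentConfigError(Exception):
    """Specific error raised when the stored agent config is marked invalid."""


@dataclass
class AgentConfigRecord:
    agent_id: int
    # Hypothetical column. It defaults to True so that rows created before
    # the column existed are treated as valid.
    config_valid: bool = True


def create_workspace(config: AgentConfigRecord, name: str) -> dict:
    """Refuse to create a workspace when the agent config is flagged invalid."""
    if not config.config_valid:
        raise InvalidAgentConfigError(
            f"agent {config.agent_id} has an invalid configuration; "
            f"workspace {name!r} was not created"
        )
    # Stand-in for the real creation logic.
    return {"name": name, "agent_id": config.agent_id}
```

Raising a dedicated error type, rather than a generic failure, lets the caller show the user a message that points at the agent configuration instead of the workspace itself.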

Design Requirements

Impact Assessment

The approach should be backwards compatible with existing agent configs. The new DB column can default to a value indicating that the config is valid, because an invalid config is never saved to the DB.

Original description

When the config is invalid, the agent stops working but still reports as "connected" in the Rails UI. The only way to find out that the agent is not working is to look through agent logs for configuration errors.
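
Until the UI surfaces this, the manual log search can be scripted. The sketch below is a hypothetical helper, not an existing tool. It assumes the logs are JSON lines with `error` and `agent_id` keys, as in the kas and agentk samples in the reproduction steps below, and that the error text contains "failed to parse agent configuration".

```python
import json
from typing import Iterable, List, Tuple

# Substring taken from the sample log lines in this issue.
MARKER = "failed to parse agent configuration"


def config_errors(log_lines: Iterable[str]) -> List[Tuple[object, str]]:
    """Return (agent_id, error) pairs for log entries reporting a config parse failure."""
    found = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not log objects
        error = entry.get("error", "")
        if isinstance(error, str) and MARKER in error:
            found.append((entry.get("agent_id"), error))
    return found
```

For example, `config_errors(open("agentk.log"))` would list the failing agent IDs with their error text, where `agentk.log` is a placeholder for wherever the logs are stored.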

Reproduce with

Start with a valid configuration in your agent config.yml:

remote_development:
  enabled: true
  dns_zone: 'workspaces.localdev.me'
container_scanning:
  cadence: '0 0/5 * * *'

Then replace it with this invalid file as your agent config.yml (blah is not a known field):

remote_development:
  blah: true
container_scanning:
  cadence: '0 0/5 * * *'

This error message then appears periodically in the kas log:

{"level":"info","time":"2023-04-25T13:47:14.809+0200","msg":"Config: failed to fetch","grpc_service":"gitlab.agent.agent_configuration.rpc.AgentConfiguration","grpc_method":"GetConfiguration","agent_id":2,"project_id":"gitlab-org/security/gitlab-test","error":"failed to parse agent configuration: protojson.Unmarshal: proto: (line 1:71): unknown field \"blah\""}

And this error appears in the agentk log:

{"level":"warn","time":"2023-04-25T13:49:09.655+0200","msg":"GetConfiguration.Recv failed","error":"rpc error: code = FailedPrecondition desc = Config: failed to parse agent configuration: protojson.Unmarshal: proto: (line 1:71): unknown field \"blah\"","agent_id":2}

agentk will not start the worker that polls for workspaces.
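
The failure comes from strict parsing: according to the error message, protojson.Unmarshal rejects the unknown field, and the whole configuration fetch fails. The Python sketch below is only a stand-in for that behaviour, not the agent's actual code. `KNOWN_FIELDS` is a hypothetical, deliberately partial schema that lists only the fields appearing in this issue.

```python
from typing import Dict, List, Set

# Hypothetical, partial schema: only the fields that appear in this issue.
KNOWN_FIELDS: Dict[str, Set[str]] = {
    "remote_development": {"enabled", "dns_zone"},
    "container_scanning": {"cadence"},
}


def unknown_fields(config: dict) -> List[str]:
    """Return dotted paths of sections or fields that are not in KNOWN_FIELDS."""
    problems = []
    for section, values in config.items():
        allowed = KNOWN_FIELDS.get(section)
        if allowed is None:
            problems.append(section)  # the whole section is unknown
            continue
        if not isinstance(values, dict):
            continue  # this sketch only inspects mapping-valued sections
        for key in values:
            if key not in allowed:
                problems.append(f"{section}.{key}")
    return problems
```

Run against the invalid file above (loaded into a dict), this reports `remote_development.blah`. A pre-save check of this kind could flag the configuration before it ever reaches the agent.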

Edited Aug 25, 2025 by 🤖 GitLab Bot 🤖