GitLab Workspaces - API backend locks up with server error 500

MR: Multi container devfile bug fix (!143022 - merged)

%16.8 backport MR: Fix bug for devfile with multiple container com... (!143316 - merged)

Summary

Changing the .devfile.yaml of a project that already has a running workspace, using experimental content based on the currently available documentation, locks up the GitLab agent backend.

We are running a self-hosted GitLab 16.6.2 Premium instance with an attached k8s cluster for runners/ops.

Steps to reproduce

  1. Create .devfile.yaml with the following content:
schemaVersion: 2.2.0
variables:
  registry-root: registry.gitlab.com
components:
  - name: tooling-container
    attributes:
      gl/inject-editor: true
    container:
      image: patnaikshekhar/go-ssh:1
      env:
        - name: KEY
          value: VALUE
      endpoints:
      - name: http-3000
        targetPort: 3000
  - name: database
    attributes:
      gl/inject-editor: false
    container:
      image: patnaikshekhar/go-ssh:1
      env:
        - name: POSTGRES_PASSWORD
          value: some_funky_pass
      endpoints:
      - name: psql
        targetPort: 5432
        exposure: none
        protocol: tcp

The Kubernetes agent receives an endless stream of 500 server errors from the GitLab API (they persist after restarting both the agent and the GitLab instance):

{"level":"info","time":"2024-01-10T13:45:01.152Z","msg":"starting partial update","mod_name":"remote_development","agent_id":3}
{"level":"error","time":"2024-01-10T13:45:03.653Z","msg":"Remote Dev - partial sync cycle ended with error","mod_name":"remote_development","error":"unexpected status code: 500","agent_id":3}
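
To pull only the failing sync cycles out of a longer agent log, a filter along these lines may help. This is a minimal sketch, assuming the agent emits one JSON object per line as in the two entries above; the method name remote_dev_errors is hypothetical and is not part of any GitLab tooling.

```ruby
require "json"

# Two agent log lines copied from the report above.
log_lines = [
  '{"level":"info","time":"2024-01-10T13:45:01.152Z","msg":"starting partial update","mod_name":"remote_development","agent_id":3}',
  '{"level":"error","time":"2024-01-10T13:45:03.653Z","msg":"Remote Dev - partial sync cycle ended with error","mod_name":"remote_development","error":"unexpected status code: 500","agent_id":3}'
]

# Hypothetical helper: keep only error-level entries from the
# remote_development module and return their time, msg and error fields.
def remote_dev_errors(lines)
  lines.filter_map do |line|
    entry = JSON.parse(line)
    next unless entry["level"] == "error" && entry["mod_name"] == "remote_development"

    entry.slice("time", "msg", "error")
  rescue JSON::ParserError
    nil # skip lines that are not valid JSON
  end
end

errors = remote_dev_errors(log_lines)
errors.each { |e| puts "#{e['time']} #{e['msg']}: #{e['error']}" }
```

The timestamps of the matching entries can then be compared with the Rails exception log on the GitLab side.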

This error persists even after reverting the .devfile.yaml content to the "working config".

A log snippet that might help (the GitLab instance is maintained by another team; I can request more logs, but I am not sure where to look, probably KAS?):

{
  "severity": "ERROR",
  "time": "2024-01-09T15:07:59.735Z",
  "correlation_id": "REDACTED",
  "meta.caller_id": "POST /api/:version/internal/kubernetes/modules/remote_development/reconcile",
  "meta.remote_ip": "REDACTED",
  "meta.feature_category": "remote_development",
  "meta.client_id": "REDACTED",
  "exception.class": "KeyError",
  "exception.message": "key not found: \"volumeMounts\"",
  "exception.backtrace": [
    "ee/lib/remote_development/workspaces/reconcile/output/devfile_parser.rb:145:in `fetch'",
    "ee/lib/remote_development/workspaces/reconcile/output/devfile_parser.rb:145:in `block (2 levels) in inject_secrets'",
    "ee/lib/remote_development/workspaces/reconcile/output/devfile_parser.rb:144:in `each'",
    "ee/lib/remote_development/workspaces/reconcile/output/devfile_parser.rb:144:in `block in inject_secrets'",
    "ee/lib/remote_development/workspaces/reconcile/output/devfile_parser.rb:115:in `each'",
    "ee/lib/remote_development/workspaces/reconcile/output/devfile_parser.rb:115:in `inject_secrets'",
    "ee/lib/remote_development/workspaces/reconcile/output/devfile_parser.rb:59:in `get_all'",
    "ee/lib/remote_development/workspaces/reconcile/output/desired_config_generator.rb:41:in `generate_desired_config'",
    "ee/lib/remote_development/workspaces/reconcile/output/workspaces_to_rails_infos_converter.rb:51:in `config_to_apply'",
    "ee/lib/remote_development/workspaces/reconcile/output/workspaces_to_rails_infos_converter.rb:31:in `block in convert'",
    "ee/lib/remote_development/workspaces/reconcile/output/workspaces_to_rails_infos_converter.rb:23:in `map'",
    "ee/lib/remote_development/workspaces/reconcile/output/workspaces_to_rails_infos_converter.rb:23:in `convert'",
    "lib/result.rb:136:in `call'",
    "lib/result.rb:136:in `map'",
    "ee/lib/remote_development/workspaces/reconcile/main.rb:27:in `main'",
    "ee/app/services/remote_development/workspaces/reconcile_service.rb:26:in `execute'",
    "ee/lib/ee/api/internal/kubernetes.rb:45:in `block (5 levels) in <module:Kubernetes>'",
    "ee/lib/gitlab/middleware/ip_restrictor.rb:11:in `call'",
    "lib/api/api_guard.rb:219:in `call'",
    "lib/gitlab/metrics/elasticsearch_rack_middleware.rb:16:in `call'",
    "lib/gitlab/middleware/memory_report.rb:13:in `call'",
    "lib/gitlab/middleware/speedscope.rb:13:in `call'",
    "lib/gitlab/database/load_balancing/rack_middleware.rb:23:in `call'",
    "lib/gitlab/middleware/rails_queue_duration.rb:33:in `call'",
    "lib/gitlab/etag_caching/middleware.rb:21:in `call'",
    "lib/gitlab/metrics/rack_middleware.rb:16:in `block in call'",
    "lib/gitlab/metrics/web_transaction.rb:46:in `run'",
    "lib/gitlab/metrics/rack_middleware.rb:16:in `call'",
    "lib/gitlab/middleware/go.rb:20:in `call'",
    "lib/gitlab/middleware/query_analyzer.rb:11:in `block in call'",
    "lib/gitlab/database/query_analyzer.rb:37:in `within'",
    "lib/gitlab/middleware/query_analyzer.rb:11:in `call'",
    "lib/gitlab/middleware/multipart.rb:173:in `call'",
    "lib/gitlab/middleware/read_only/controller.rb:50:in `call'",
    "lib/gitlab/middleware/read_only.rb:18:in `call'",
    "lib/gitlab/middleware/same_site_cookies.rb:27:in `call'",
    "lib/gitlab/middleware/path_traversal_check.rb:48:in `call'",
    "lib/gitlab/middleware/handle_malformed_strings.rb:21:in `call'",
    "lib/gitlab/middleware/basic_health_check.rb:25:in `call'",
    "lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'",
    "lib/gitlab/middleware/request_context.rb:15:in `call'",
    "lib/gitlab/middleware/webhook_recursion_detection.rb:15:in `call'",
    "config/initializers/fix_local_cache_middleware.rb:11:in `call'",
    "lib/gitlab/middleware/compressed_json.rb:44:in `call'",
    "lib/gitlab/middleware/rack_multipart_tempfile_factory.rb:19:in `call'",
    "lib/gitlab/middleware/sidekiq_web_static.rb:20:in `call'",
    "lib/gitlab/metrics/requests_rack_middleware.rb:79:in `call'",
    "lib/gitlab/middleware/release_env.rb:13:in `call'"
  ],
  "user.username": null,
  "tags.program": "web",
  "tags.locale": "en",
  "tags.feature_category": "remote_development",
  "tags.correlation_id": "REDACTED"
}

The above leads me to believe there is an issue with the rendered k8s manifest:

  "exception.class": "KeyError",
  "exception.message": "key not found: \"volumeMounts\"",
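
For readers unfamiliar with Ruby, the exception can be reproduced in isolation. The following is only a sketch of the failure mode, not the code in devfile_parser.rb: the container hashes and the method names mounts_strict and mounts_tolerant are hypothetical, and the assumption that one container in the rendered manifest lacks a "volumeMounts" key is a guess based on the exception message.

```ruby
# Hypothetical container specs, loosely modelled on a rendered pod template.
# These hashes are illustrative and are not taken from GitLab's code.
containers = [
  { "name" => "tooling-container", "volumeMounts" => [{ "name" => "workspace-data" }] },
  { "name" => "database" } # no "volumeMounts" key at all
]

# Strict lookup: Hash#fetch without a default raises KeyError
# when the key is absent.
def mounts_strict(container)
  container.fetch("volumeMounts")
end

# Tolerant lookup: fall back to an empty list when the key is absent.
def mounts_tolerant(container)
  container.fetch("volumeMounts", [])
end

strict_error =
  begin
    containers.each { |c| mounts_strict(c) }
    nil
  rescue KeyError => e
    e.message
  end

tolerant_counts = containers.map { |c| mounts_tolerant(c).length }

puts strict_error
puts tolerant_counts.inspect
```

The strict variant fails on the second container with the same message as in the log, while the tolerant variant treats a missing key as "no mounts". Whether the merged fix takes this approach is not something this report confirms.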

But the fact that the error persists after reverting the devfile also suggests that something is blocking the resync/refresh.

Edited by Vishal Tak