Proposal: Standardised Configuration Management Module for LabKit (#591894) · Issues · GitLab.org / GitLab


# Proposal: Standardised Configuration Management Module for LabKit

---

## Summary

This proposal introduces a standardised, protobuf-first configuration management module for [LabKit](https://gitlab.com/gitlab-org/labkit), GitLab's internal platform tooling library (`v2/config`). The module provides a unified, opinionated approach to configuration handling across GitLab's internal tooling ecosystem: applying guard-rails, enforcing a consistent format, and using Protocol Buffers as the single source of truth for configuration schemas. Downstream tooling (Cloud Native GitLab Helm Charts, Omnibus, Caproni, and others) can rely on the published `.proto` files for validated, automated configuration testing.

---

## Motivation

Configuration management is a cross-cutting concern across GitLab's internal tooling. At present, each tool defines its own configuration format, validation logic, and loading semantics. This leads to:

- **Inconsistent formats:** Some tools use YAML, others TOML, with differing conventions for nesting, naming, and defaults. Gitaly and Workhorse use TOML, a format that the Operate team, who maintain Cloud Native GitLab Charts, have difficulty serializing due to differences between the type systems used for TOML and YAML.
- **Duplicated validation:** Every tool that needs to assert a configuration is valid re-implements its own checks, often incompletely.
- **No shared contract:** Downstream tooling (Helm chart packaging, Omnibus build pipelines) has no reliable, machine-readable specification to validate against in unit tests.
- **Poor developer experience:** Engineers working across multiple tools must context-switch between different configuration idioms with no consistent mental model.
- **Documentation drift:** Without a machine-readable specification, documentation tooling must rely on hand-authored content that can drift out of sync with the actual configuration. A shared protobuf schema provides a single source of truth from which accurate, up-to-date documentation can be generated automatically.
- **No lifecycle stage enforcement:** GitLab currently has no mechanism to indicate whether a configuration setting is experimental, internal, alpha, beta, or stable. This is expressed only through human-targeted documentation, making it difficult for users to know whether a setting is intended for general consumption. Proto field options, as explored in [gitlab-org/charts/gitlab#6219](https://gitlab.com/gitlab-org/charts/gitlab/-/work_items/6219), provide a consistent, machine-enforceable means of attaching and validating lifecycle stage metadata against configuration settings.

A centralised LabKit module addresses all of these concerns in one place, following the same convention-over-configuration philosophy that LabKit already applies to tracing, logging, and request context propagation.

---

## Goals

- Provide a single, consistent configuration format (YAML or JSON) for all LabKit-integrated tooling.
- Use Protocol Buffers as the single source of truth for configuration schemas, stored at a canonical path within each tool's repository.
- Enable downstream tooling to reference the published schema directly for automated unit-test validation, without reimplementing validation logic.
- Support all common configuration structures: scalars, maps, lists, nested objects, and optional/required fields. Defaults can be expressed in the proto schema, but applications are responsible for implementing defaults at the application layer.
- Apply guard-rails at load time: fail fast on schema violations, type mismatches, and constraint violations.
- Be adoptable with minimal disruption: existing Go structs used by services should be replaceable with generated protobuf types with minimal change. This is essential for incremental migration of existing tools.
- Lay the groundwork for future capabilities: migration tooling, validation CLI commands, environment variable overrides, and documentation assistance tooling.

## Non-Goals

- This proposal does not define a runtime secrets management system. Secrets should continue to be injected via the existing secrets management pathways and referenced by path or environment variable.
- This proposal does not replace infrastructure-level configuration (e.g., Kubernetes manifests, Terraform variables).
- This proposal does not mandate a migration of all existing tool configurations immediately; adoption will be incremental.

---

## Proposed Design

### 1. Configuration Format

Configurations will be expressed in **YAML**, with **JSON** supported as an alternative. YAML is preferred for human-authored configuration files for several reasons:

- It is the dominant format across GitLab's infrastructure tooling and is already familiar to the system owners and SREs who manage these services day-to-day.
- It has been widely adopted for systems such as Kubernetes, meaning the format carries no learning curve for engineers already working in that ecosystem.
- Its type system aligns with Helm and our existing configuration systems, which is particularly important for the Operate team when generating configurations for Helm chart deployments.
- Tooling such as `yq` can parse, read, and manipulate YAML in automation scripts, making it straightforward to transform or inspect configurations in CI pipelines and operational runbooks.
- Many editors support JSON Schema validation and autocompletion when editing YAML files, providing an immediate developer experience benefit.

JSON is provided as an alternative for machine-generated configurations. TOML support is available as an opt-in for tools that require it during migration, but YAML is the target format.

A canonical configuration file for a tool might look like:

```yaml
# widget-service.config.yaml
server:
  host: "0.0.0.0"
  port: 8080
  timeout_seconds: 30
logging:
  level: "info"
  format: "json"
feature_flags:
  enable_experimental_cache: false
```

Configuration files may optionally declare a top-level integer `version` field, expressing only the major version number. If omitted, the loader assumes version `1`. This is the hook for the migration system (see [Migration System](#3-migration-system)).

### 2. Protobuf-First Schema

Rather than a standalone JSON Schema file, this module uses **Protocol Buffers** as the single source of truth for configuration structure and validation rules. Each tool defines its configuration in a versioned `.proto` file. By convention, schemas are stored at:

```
<tool-repo-root>/proto/config/v<N>/config.proto
```

All proto files use modern **Protocol Buffers Edition 2023** syntax, which is the future-proof approach aligned with the protobuf roadmap. The wire format remains identical to proto3 and is fully backward compatible. A typical configuration schema looks like:

```proto
edition = "2023";

package myapp.config.v1;

import "buf/validate/validate.proto";

option go_package = "myapp/gen/config/v1;configv1";
option features.field_presence = IMPLICIT;

message Config {
  int32 version = 1;
  ServerConfig server = 2 [(buf.validate.field).required = true];
}

message ServerConfig {
  string host = 1 [(buf.validate.field).string.min_len = 1];
  uint32 port = 2 [(buf.validate.field).uint32 = {
    gte: 1
    lte: 65535
  }];
}
```

This well-known location means that downstream tooling (Helm chart CI pipelines, Omnibus build scripts, Caproni test suites) can resolve and load schemas predictably without requiring any runtime negotiation or tool-specific knowledge.
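
As a rough illustration of the two conventions above (the optional `version` field and the well-known schema path), the following Go sketch probes a JSON configuration for its version and derives the schema location. It is not the LabKit implementation: `versionProbe`, `detectVersion`, and `schemaPath` are hypothetical names, and JSON is used only because the standard library can parse it without extra dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
)

// versionProbe (hypothetical) reads only the optional top-level version field.
// A pointer distinguishes "absent" from an explicit value.
type versionProbe struct {
	Version *int `json:"version"`
}

// detectVersion returns the declared major version, or 1 when the field is absent.
func detectVersion(raw []byte) (int, error) {
	var p versionProbe
	if err := json.Unmarshal(raw, &p); err != nil {
		return 0, fmt.Errorf("probe version: %w", err)
	}
	if p.Version == nil {
		return 1, nil // convention from the proposal: missing version means 1
	}
	if *p.Version < 1 {
		return 0, fmt.Errorf("invalid version %d: must be >= 1", *p.Version)
	}
	return *p.Version, nil
}

// schemaPath builds <repoRoot>/proto/config/v<N>/config.proto.
func schemaPath(repoRoot string, version int) string {
	return filepath.Join(repoRoot, "proto", "config", fmt.Sprintf("v%d", version), "config.proto")
}

func main() {
	raw := []byte(`{"server": {"host": "0.0.0.0", "port": 8080}}`)
	v, err := detectVersion(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(v, schemaPath("widget-service", v))
}
```

A downstream pipeline could use the same path derivation to find the schema it should validate against, without any tool-specific lookup logic.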

#### Validation with protovalidate

Validation rules are defined inline in the proto file using [protovalidate](https://protovalidate.com) constraints, which use CEL (Common Expression Language) expressions. This means the schema and its validation rules are always co-located and cannot diverge.

Standard constraints cover the common cases:

```proto
message ServerConfig {
  string host = 1 [(buf.validate.field).string = {
    min_len: 1
    max_len: 255
  }];
  uint32 port = 2 [(buf.validate.field).uint32 = {
    gte: 1
    lte: 65535
  }];
}
```

Cross-field validation is also supported via CEL expressions directly on the message:

```proto
message TLSConfig {
  string cert_path = 1;
  string key_path = 2;

  option (buf.validate.message).cel = {
    id: "tls_pair"
    message: "cert_path and key_path must both be set or both be empty"
    expression: "(this.cert_path == '') == (this.key_path == '')"
  };
}
```

### 3. Migration System

The optional `version` field is the foundation of the migration system. A file with no `version` field is treated as version `1`.

Migrations use a **typed, generic function signature** (`func(source S) (T, error)`), providing full compiler type checking and IDE autocomplete support:

```go
func migrateV1ToV2(source *configv1.Config) (*configv2.Config, error) {
	return &configv2.Config{
		Version: 2,
		Server: &configv2.ServerConfig{
			Address: source.Server.Host, // renamed field: type-safe, IDE-navigable
			Port:    source.Server.Port,
			Timeout: durationpb.New(30 * time.Second),
		},
		Logging: source.Logging,
	}, nil
}

// Error handling elided for brevity.
loader, _ := config.New(config.WithMigration(migrateV1ToV2))
```

The migration flow applies double validation for safety:

1. Parse the config file into the target proto to detect the version field.
2. Detect a version mismatch (e.g., file is v1, binary expects v2).
3. Re-parse the config file into the source type (v1).
4. **Pre-migration validation**: Validate against the v1 schema.
5. Run the typed migration function to transform v1 → v2.
6. **Post-migration validation**: Validate against the v2 schema.

Only a single major version upgrade (`N-1 → N`) is supported per service at a time. This is an explicit guard-rail: service teams are required to clean up old configuration support rather than accumulating indefinite version debt. Given that customers cannot skip over more than one major version upgrade, this is a reasonable constraint.

### 4. LabKit Module API

The module exposes a clean, minimal API:

```go
package main

import (
	"log"

	"gitlab.com/gitlab-org/labkit/v2/config"

	// Generated config types; the path follows the go_package option
	// in the example schema.
	configv1 "myapp/gen/config/v1"
)

func main() {
	loader, err := config.New()
	if err != nil {
		log.Fatal(err)
	}

	var cfg configv1.Config
	if err := loader.Load("widget-service.config.yaml", &cfg); err != nil {
		log.Fatal(err)
	}
}
```

The loader will, in order:

1. Locate and read the configuration file (YAML or JSON, detected by extension).
2. Unmarshal the file into the target proto message.
3. Detect any version mismatch and run the registered migration if present.
4. Validate the resulting message using protovalidate constraints.
5. Return a detailed, actionable error if any step fails.

Validation errors reference the specific field path, line, and column, making them immediately actionable for operators:

```
config.yaml:5:3: invalid ServerConfig.port: value must be <= 65535 but got 99999
```

### 5. Guard-Rails

| Guard-rail | Default | Rationale |
|---|---|---|
| Type mismatches fail fast | Enabled | Prevents runtime surprises from implicit coercion |
| Required fields enforced | Enabled | Via protovalidate `required` constraints |
| Version field defaults to `1` if absent | Enabled | Lowers adoption friction while preserving migration capability |
| Schema validation at load | Enabled | Single source of truth for what is valid |
| Unknown keys rejected (strict mode) | **Disabled** | Opt-in only; leaving it off is required for rollback safety in production |

#### Strict Mode and Rollback Safety

Unknown fields are silently ignored by default.
This is a deliberate choice: if a binary is rolled back from v2 to v1 after a new config field has been deployed, the v1 binary must not crash on encountering the unknown field.

Strict mode, which rejects unknown fields, should be used in CI pipelines and pre-deployment validation, never in production deployments or canary rollouts:

```go
// For CI validation only
loader, _ := config.New(config.WithStrictMode())
```

### 6. Format Support

| Format | Availability | Notes |
|---|---|---|
| YAML (`.yaml`, `.yml`) | Always available | Recommended for human-authored configs |
| JSON (`.json`) | Always available | Recommended for machine-generated configs |
| TOML (`.toml`) | Opt-in | For tools migrating from TOML during adoption |

TOML is available as an opt-in parser to ease migration from existing TOML-based tools (notably Gitaly and Workhorse). It is not the recommended target format.

```go
import (
	"gitlab.com/gitlab-org/labkit/v2/config"
	"gitlab.com/gitlab-org/labkit/v2/config/toml"
)

loader, _ := config.New(
	config.WithParser(toml.NewTOMLParser()),
)
```

### 7. Downstream Schema Validation

Downstream tools (Helm Charts CI, Omnibus, Caproni) can use the well-known proto schema path to validate example or production configurations as part of their own test suites. The protobuf schema, combined with protovalidate constraints, provides a richer contract than a plain JSON Schema, covering not just structure and types but also value constraints and cross-field rules.
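
The difference between the lenient default and the strict mode recommended for CI and downstream validation can be illustrated with the Go standard library alone. The sketch below is an analogy, not the LabKit loader: `ServerConfig` and `decode` are hypothetical, and `encoding/json`'s `DisallowUnknownFields` stands in for `config.WithStrictMode()`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// ServerConfig is a hypothetical stand-in for a generated v1 config type.
type ServerConfig struct {
	Host string `json:"host"`
	Port uint32 `json:"port"`
}

// decode parses raw JSON into a ServerConfig. When strict is true, keys the
// struct does not declare cause an error; otherwise they are ignored.
func decode(raw []byte, strict bool) (ServerConfig, error) {
	var cfg ServerConfig
	dec := json.NewDecoder(bytes.NewReader(raw))
	if strict {
		dec.DisallowUnknownFields()
	}
	if err := dec.Decode(&cfg); err != nil {
		return ServerConfig{}, err
	}
	return cfg, nil
}

func main() {
	// A config written for a newer binary: "address" is unknown to v1.
	raw := []byte(`{"host": "0.0.0.0", "port": 8080, "address": "0.0.0.0:8080"}`)

	// Lenient (production-like): the rolled-back binary still starts.
	cfg, err := decode(raw, false)
	fmt.Println("lenient:", cfg, err)

	// Strict (CI-like): the unknown key is reported before deployment.
	_, err = decode(raw, true)
	fmt.Println("strict:", err)
}
```

The same input is accepted in one mode and rejected in the other, which is why the proposal keeps strict checking in CI and out of production rollouts.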

---

## Supported Configuration Structures

The module supports all common configuration patterns encountered across GitLab's tooling:

- **Scalars:** strings, integers, floats, booleans
- **Nested objects:** arbitrarily deep proto message nesting
- **Lists:** `repeated` fields for homogeneous arrays of scalars or messages
- **Maps:** `map<K, V>` for dynamic key sets (e.g., per-service configuration blocks)
- **Duration values:** `google.protobuf.Duration`, serialized as human-readable strings (`"30s"`, `"5m"`) in YAML/JSON
- **File paths:** validated as strings; resolution left to the consuming application
- **Oneof / discriminated unions:** protobuf `oneof` for mutually exclusive configuration branches

---

## Future Work

The following capabilities are explicitly out of scope for the initial release, but the current architecture is designed to accommodate them:

### 1. Standardised Validation Commands

The module will provide a ready-made CLI subcommand that any LabKit-integrated tool can expose:

```bash
my-tool config validate --file widget-service.config.yaml
```

This gives operators a single, consistent way to check whether a configuration is valid before deploying it, regardless of which tool they are working with. Strict mode would be the default for this command.

### 2. Environment Variable Overrides

A structured mechanism for environment variable overrides will be introduced in a future release. The design of override precedence, naming conventions, and interaction with schema validation will be addressed in a dedicated proposal.

### 3. Defaults Management

A standardised mechanism for declaring and applying default values will be introduced once the module has been retrofitted to existing tools and their existing defaults handling is better understood. Proto field defaults and a tag-based approach at the application layer (such as `creasty/defaults`) are candidates, but the design will be informed by real adoption experience.

### 4. Documentation Assistance Tooling

The machine-readable protobuf schema (including field names, types, constraints, and any lifecycle annotations) provides a foundation for auto-generating configuration reference documentation. This will be explored once the schema convention is stable across multiple tools.

### 5. Lifecycle Stage Enforcement

Proto field options can encode lifecycle stage metadata (experimental, alpha, beta, stable, deprecated) in a machine-readable way, enabling tooling to warn or error when operators use settings not intended for general consumption. The groundwork for this is laid by the proto-first schema approach, and is being explored further in [gitlab-org/charts/gitlab#6219](https://gitlab.com/gitlab-org/charts/gitlab/-/work_items/6219).

---

## Adoption Path

1. **Phase 1 (Module implementation):** ✅ Implemented in [labkit!345](https://gitlab.com/gitlab-org/labkit/-/merge_requests/345). Core loader, protovalidate integration, typed migration system, and format support are available in `v2/config`.
2. **Phase 2 (Pilot integration):** Integrate the module into Donkey and Caproni to validate the API and surface edge cases. Caproni already uses a JSON Schema for its configuration; one path worth considering is migrating Caproni's existing configuration to the protobuf-first approach as part of the pilot.
3. **Phase 3 (Downstream schema validation):** Add proto-based validation steps to Helm Charts CI and Omnibus pipelines, using the well-known schema location. No changes to those tools' own runtime code are required.
4. **Phase 4 (Broader adoption):** Migrate additional tools incrementally. TOML opt-in support provides a low-friction on-ramp for Gitaly and Workhorse.

---

## Alternatives Considered

**Use an existing Go configuration library (e.g., Viper).** Viper is widely used but brings significant complexity, does not enforce a well-known schema location, and does not integrate natively with protobuf or protovalidate.
It also has known limitations around strict unknown-key rejection. A focused LabKit module keeps the interface minimal and aligned with GitLab's specific conventions.

**Use JSON Schema rather than protobuf.** JSON Schema is a reasonable choice for format validation, but it requires a separate validation pass and does not provide the same level of type safety, code generation, or cross-language interoperability as Protocol Buffers. Protobuf schemas also support richer constraints via protovalidate and can encode lifecycle metadata via custom options, capabilities that JSON Schema cannot match without bespoke tooling.

**Use TOML as the primary format.** TOML is already in use by Gitaly and Workhorse, so there is some existing precedent. However, TOML has poor serialization support in the tooling used by the Operate team to generate Helm chart configurations, and its type system does not align with YAML or JSON. It is supported as an opt-in format for migration purposes but is not the target.

**Require JSON only (no YAML).** JSON is better suited to machine generation but is harder for operators to author and read, particularly for multi-line values and commented configurations. Supporting YAML as the primary format and JSON as an alternative gives both ergonomics and machine-friendliness without significant added complexity.

**Embed schema validation in each tool independently.** This reproduces the current state of fragmentation. The value of centralisation is that improvements to validation, migration, and CLI tooling flow to all tools simultaneously.

**Map-based migrations over typed migrations.** An earlier version of this proposal considered migrations operating on raw `map[string]any`. The implemented approach uses typed generic functions (`func(source S) (T, error)`) instead, providing compiler type checking, IDE support, and significantly easier testing.
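
To make the typed signature concrete, here is a minimal, self-contained sketch of a generic migration runner that wraps a `func(S) (T, error)` with validation before and after. It is not the LabKit implementation: `V1`, `V2`, `Validator`, and `migrate` are hypothetical, and the hand-written `Validate` methods stand in for protovalidate checks.

```go
package main

import (
	"errors"
	"fmt"
)

// V1 and V2 are hypothetical stand-ins for generated config types.
type V1 struct {
	Host string
	Port uint32
}

type V2 struct {
	Address string
	Port    uint32
}

// Validator is a hypothetical interface; the real module validates with protovalidate.
type Validator interface {
	Validate() error
}

func (c V1) Validate() error {
	if c.Host == "" {
		return errors.New("v1: host must not be empty")
	}
	return nil
}

func (c V2) Validate() error {
	if c.Port < 1 || c.Port > 65535 {
		return fmt.Errorf("v2: port must be in [1, 65535], got %d", c.Port)
	}
	return nil
}

// migrate validates the source, runs the typed migration, then validates the result.
func migrate[S, T Validator](src S, fn func(S) (T, error)) (T, error) {
	var zero T
	if err := src.Validate(); err != nil {
		return zero, fmt.Errorf("pre-migration validation: %w", err)
	}
	dst, err := fn(src)
	if err != nil {
		return zero, fmt.Errorf("migration: %w", err)
	}
	if err := dst.Validate(); err != nil {
		return zero, fmt.Errorf("post-migration validation: %w", err)
	}
	return dst, nil
}

// v1ToV2 renames Host to Address; a typo in either field name fails to compile.
func v1ToV2(src V1) (V2, error) {
	return V2{Address: src.Host, Port: src.Port}, nil
}

func main() {
	out, err := migrate(V1{Host: "0.0.0.0", Port: 8080}, v1ToV2)
	fmt.Println(out, err)
}
```

With a `map[string]any` migration, the same field rename would be a pair of string keys that the compiler cannot check, which is the main argument for the typed form.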

---

## Decisions

**Schema format: Protocol Buffers, not JSON Schema.** Protocol Buffers Edition 2023 is used as the single source of truth for configuration schemas. This provides code generation, type safety, cross-language interoperability, and rich inline validation via protovalidate, none of which are available from JSON Schema without additional bespoke tooling.

**Schema location: per-tool paths, not a central registry.** Each tool maintains its own schema at the well-known path `proto/config/v<N>/config.proto` within its own repository. A central LabKit-managed registry adds a coordination surface without sufficient benefit at this stage. Per-tool paths keep schemas close to the code.

**Strict mode is opt-in, not the default.** Unknown fields are silently ignored by default to preserve rollback safety in production deployments. Strict mode is recommended for CI pipelines and pre-deployment validation only.

**Environment variable overrides: not supported in this iteration.** The interaction between override precedence, schema validation, and the security implications of unintentional configuration injection warrants a dedicated design.

**The `version` field is an optional integer, defaulting to `1` if absent.** Requiring a version field would create unnecessary friction for adopters. Omitting it is valid and implies version `1`. Only single-step major version migrations (`N-1 → N`) are supported, enforcing cleanup of old configuration support rather than indefinite version accumulation.

**Typed migrations over map-based migrations.** Migration functions use the signature `func(source S) (T, error)` rather than operating on raw `map[string]any`. This provides compiler type checking, IDE support, and makes migrations significantly easier to write and test correctly.

---

## Summary of Benefits

- **Consistency:** One configuration idiom across all LabKit-integrated tools, backed by a strongly-typed protobuf schema.
- **Safety:** protovalidate constraints and guard-rails catch misconfiguration before deployment, with rich error messages including file and line information.
- **Rollback safety:** Unknown fields are ignored by default, so rollbacks do not cause crashes due to version skew.
- **Shared contract:** Downstream packaging and testing tooling gains a reliable, machine-readable specification at zero additional cost.
- **Developer experience:** Engineers learn the pattern once and apply it everywhere; typed migrations provide IDE support and compiler checks throughout.
- **Forward compatibility:** The `version` field and typed migration system make configuration evolution tractable and safe.

cc @chsanders @reprazent @WarheadsSE @e_forbes @igorwwwwwwwwwwwwwwwwwwww @jdrpereira @stanhu @ayufan @sxuereb
