ensure no red data logged
GKG logs go to stdout → Logstash → Elasticsearch via [`tracing`](https://docs.rs/tracing) + `labkit`. No in-repo scrubber; `labkit-rs` has no masking today ([plan §15](https://gitlab.com/gitlab-org/rust/labkit-rs/-/blob/4082a42a/docs/labkit-rs-plan.md#L779-815), scheduled for Phase 5). Any red data reaching a `tracing::*!` call or span field travels unmodified.

Two gaps to close:

1. **Red-data exposure.** Concrete violations found in a first pass (see below). Root cause is the absence of both a masking layer and any enforcement against `?expr` / `{:?}` on sensitive types.
2. **No verbose-for-internal mode.** Debugging customer-reported issues against GitLab-internal repos forces a choice between over-scrubbing (no signal) and under-scrubbing (leak). ai-assist already solved this cleanly; we should copy the pattern and lift it into `labkit-rs` so every Rust GitLab service benefits.

### Scope

**GKG (this repo)**

- [ ] Merge the logging review agent ([`!943`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/943))
- [ ] Verify whether `labkit` exports span fields (not just events) to the log stream. Drives severity of the gRPC `#[instrument]` findings.
- [ ] Fix the 3 Critical findings below
- [ ] Custom `Debug` impls for `Claims`, `ClickHouseConfiguration`, `NatsConfiguration`
- [ ] Sanitize handler-error `Display` so CDC / datalake errors cannot echo row values
- [ ] Workspace clippy lint forbidding `println!`, `eprintln!`, `dbg!` outside tests
- [ ] Middleware to parse `x-gitlab-enabled-feature-flags` and (self-hosted) `x-gitlab-enabled-instance-verbose-ai-logs`, gate a `can_log_verbose()` helper + a `tracing_subscriber::Layer` that strips / drops events when off. Define the unmask-under-verbose allowlist (file paths, parser error snippets, datalake `Display`, commit SHAs). Red data stays masked unconditionally.
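
For the custom `Debug` item, a minimal sketch of the shape such an impl could take. `DbConfig`, its fields, and the `REDACTED` placeholder are hypothetical stand-ins, not the real `ClickHouseConfiguration` / `NatsConfiguration` definitions or the planned labkit-rs `REDACTION_STRING` value:

```rust
use std::fmt;

/// Placeholder printed instead of a secret (hypothetical value; the planned
/// labkit-rs `REDACTION_STRING` may differ).
const REDACTED: &str = "[REDACTED]";

/// Hypothetical stand-in for a config struct that holds a password.
/// Note: no `#[derive(Debug)]`, so the secret cannot leak via `{:?}` / `?expr`.
pub struct DbConfig {
    pub url: String,
    pub username: String,
    pub password: String,
}

impl fmt::Debug for DbConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("DbConfig")
            .field("url", &self.url)
            .field("username", &self.username)
            // Print the placeholder instead of the real value.
            .field("password", &format_args!("{}", REDACTED))
            .finish()
    }
}
```

With this in place, `debug!(?cfg)` or `format!("{:?}", cfg)` prints the placeholder for `password`. The sketch assumes `url` carries no embedded credentials; if it can, it would need the same treatment.
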
**labkit-rs (upstream, via MRs from this team)**

- [x] Land the `mask` module spec'd at [plan §15](https://gitlab.com/gitlab-org/rust/labkit-rs/-/blob/4082a42a/docs/labkit-rs-plan.md#L779-815) (`mask::url`, `is_sensitive_param`, `is_sensitive_header`, `REDACTION_STRING`). Phase 5 today; pull forward if our GA needs it.
- [x] Add `gl_namespace_id` and `gl_root_namespace_id` to [`labkit-fields`](https://gitlab.com/gitlab-org/rust/labkit-rs/-/blob/4082a42a/labkit-fields/src/lib.rs#L81-82) (today only `gl_project_id` exists)
- [x] New `verbose_logging_gate` primitive modeled on ai-assist's [`can_log_request_data`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/4eb009bc/ai_gateway/structured_logging.py#L166-211): `Builder` accepts a feature-flag header name + a self-hosted header name; provides a `tracing` `Layer` that drops events when neither is set. Hooks into the `mask` module so allowlisted contexts can unmask specific fields without disabling redaction globally.

<details>
<summary>Red data definition</summary>

Per [`.gitlab/duo/mr-review-instructions.yml#L32-83`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/.gitlab/duo/mr-review-instructions.yml#L32-L83): credentials (token, password, secret, key, signature, Authorization, Bearer, certificate), JWT tokens and claims (`user_id`, `organization_id`, `group_traversal_ids`, `ai_session_id`), OTP/MFA codes, CI/CD variables, webhook/integration URLs, user content fields (`note`, `body`, `description`, `title`, `message`, `text`, `content`, `first_name`, `last_name`), full email addresses, commit messages, raw source code.

</details>

<details>
<summary>Prior art: ai-assist</summary>

ai-assist does this without a repo-side namespace allowlist; the monolith's feature-flag targeting is the source of truth.

- SaaS: [`expanded_ai_logging`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/4eb009bc/lib/feature_flags/context.py#L8-10) flag forwarded via `x-gitlab-enabled-feature-flags`; [`FeatureFlagMiddleware`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/4eb009bc/ai_gateway/api/middleware/feature_flag.py#L29-39) parses the header into a ContextVar
- Self-hosted: [`x-gitlab-enabled-instance-verbose-ai-logs`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/4eb009bc/lib/verbose_ai_logs/context.py#L10) header
- Central gate: [`can_log_request_data()`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/4eb009bc/ai_gateway/structured_logging.py#L166-174); structlog processor [`prevent_logging_if_disabled`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/4eb009bc/ai_gateway/structured_logging.py#L177-181) raises `DropEvent` when off
- Request context already carries `gitlab_realm`, `gitlab_instance_id`, `gitlab_root_namespace_id`, `is_gitlab_team_member` via [`base.py#L149-173`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/4eb009bc/ai_gateway/api/middleware/base.py#L149-L173)

Neither ai-assist nor the current labkit-rs keeps a namespace list inside the service. GKG should follow suit.

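
The header-driven gate could be sketched in Rust roughly as follows, using only the standard library. The header names and the `expanded_ai_logging` flag come from the links above; the comma-separated flag list, the literal `"true"` value for the self-hosted header, and the `can_log_verbose` signature are assumptions to verify against ai-assist, and GKG may end up gating on a different flag:

```rust
// Header names as referenced in this issue.
pub const FEATURE_FLAGS_HEADER: &str = "x-gitlab-enabled-feature-flags";
pub const INSTANCE_VERBOSE_HEADER: &str = "x-gitlab-enabled-instance-verbose-ai-logs";
// ai-assist's SaaS flag name; reusing it for GKG is an assumption.
pub const VERBOSE_FLAG: &str = "expanded_ai_logging";

/// Look up a header value by case-insensitive name.
fn header<'a>(headers: &[(&str, &'a str)], name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .map(|(_, v)| *v)
}

/// Hypothetical gate: verbose logging is allowed when either the SaaS feature
/// flag or the self-hosted instance header is present and enabled.
pub fn can_log_verbose(headers: &[(&str, &str)]) -> bool {
    // Assumption: the flags header is a comma-separated list of flag names.
    let flag_on = header(headers, FEATURE_FLAGS_HEADER)
        .map(|v| v.split(',').any(|f| f.trim() == VERBOSE_FLAG))
        .unwrap_or(false);
    // Assumption: the self-hosted header carries the literal string "true".
    let instance_on = header(headers, INSTANCE_VERBOSE_HEADER)
        .map(|v| v.trim().eq_ignore_ascii_case("true"))
        .unwrap_or(false);
    flag_on || instance_on
}
```

A middleware would compute this once per request and stash the result in request-scoped state; the proposed `Layer` would then consult it to drop or strip events. The gate only widens the unmask allowlist; red data stays masked either way.
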
</details>

<details>
<summary>Audit findings (5 parallel sub-agent sweep)</summary>

**Critical**

- [`typescript/swc/definitions.rs#L137`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/code-graph/parser/src/typescript/swc/definitions.rs#L137): `debug!(?node, ...)` dumps full SWC AST of customer code
- [`kotlin/expression_resolver.rs#L130`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/code-graph/linker/src/analysis/languages/kotlin/expression_resolver.rs#L130): `{:#?}` on `KotlinExpressionInfo`
- [`typescript/parser.rs#L110`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/code-graph/parser/src/typescript/parser.rs#L110): `error!` with `{:?}` on SWC `Error` (source-range excerpts)

**Warning**

- [`grpc/service.rs`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/gkg-server/src/grpc/service.rs) records `user_id`, `ai_session_id`, `source_type` as span fields. Critical if labkit exports span fields.
- [`auth/claims.rs#L3`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/gkg-server/src/auth/claims.rs#L3): `#[derive(Debug)]` on `Claims`
- [`gkg-server-config/src/clickhouse.rs#L8`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/gkg-server-config/src/clickhouse.rs#L8) and [`nats.rs#L12`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/gkg-server-config/src/nats.rs#L12): `#[derive(Debug)]` on structs holding `password`
- [`redaction/stream.rs#L36`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/gkg-server/src/redaction/stream.rs#L36): `warn!` logs client-supplied `message`
- [`indexer/modules/sdlc/pipeline.rs#L211-216`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/indexer/src/modules/sdlc/pipeline.rs#L211-L216) and [`L320-324`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/indexer/src/modules/sdlc/pipeline.rs#L320-L324): `%err` may echo row literals
- [`gitlab-client/src/client.rs#L127-175`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/gitlab-client/src/client.rs#L127-L175): customer branch names + SHAs at `debug!`
- [`code-graph v2 ruby.rs#L84`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/code-graph/src/v2/custom/ruby.rs#L84) and [`cli/src/main.rs#L591`](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/62732cc3/crates/cli/src/main.rs#L591): parser error strings may contain source snippets

</details>

<!-- AI-Sessions
dir: ~/.claude/projects/-Users-angelo-rivera-gitlab-orbit-knowledge-graph/
bb3dc5f1-c220-4f9b-8ee6-0a862abf2bbd.jsonl (2026-04-18)
-->
issue