indexer: split benign code-pipeline file skips out of errors_total into a new files_skipped_total counter
`gkg_indexer_code_errors_total` currently conflates real failures with policy-driven file skips at a roughly 1000:1 ratio in `orbit-prd`, which makes the indexer-dashboard "errors by stage" panel misleading and the metric unsuitable for paging. Split the benign skips out into a new counter so `errors_total` reflects only real errors.

## Live evidence (orbit-prd, last 24h)

```
gkg_indexer_code_errors_total{stage="parse"}                    1111
gkg_indexer_code_errors_total{stage="repository_fetch"}         7
gkg_indexer_code_errors_total{stage="internal"}                 5
gkg_indexer_code_repository_empty_total{reason="not_found"}     4418
```

The 1111 `parse` events are dominated by JS pipeline soft-skips. Two example log lines from `gkg-indexer-b7c8c68bc-4dhzn`:

- `"js: skipped file" path=frontend/src/assets/CustomIcons/KubernetesIcon.tsx error="Skipping ...: line too long (5001 bytes, max 5000)"`
- `"js: skipped file" path=data/metric_definitions_20260426.json error="refusing oversize file: ... (4808847 bytes, max 2097152)"`

Both are policy outcomes (config-driven `max_file_size_bytes` and line-length cap), not failures.

The 5 `internal` events are all per-file `sentinel_timeout` watchdogs, which are also expected behaviour, not outages.

The 7 `repository_fetch` events are real (unclassified Gitaly/network failures). Recognized empty-repo cases are already routed to `repository_empty_total`, so there is no double-count.

## Proposal

Add a new counter `gkg.indexer.code.files.skipped` with a `reason` label and reroute the benign cases.

| Reason | Source | Today |
|---|---|---|
| `oversize` | `max_file_size_bytes` skip | rolled into `errors_total{stage="parse"}` |
| `line_too_long` | per-line cap skip | rolled into `errors_total{stage="parse"}` |
| `timeout_sentinel` | per-file watchdog | rolled into `errors_total{stage="internal"}` |
| `parse_grammar` | optional, only if we want to keep visibility | could stay in `errors_total{stage="parse"}` |

After the split:

- `errors_total` rate becomes alert-worthy.
  `repository_fetch`, `checkpoint`, `file_read`, `thread_pool`, `sentinel` (thread spawn), `graph_node`, `arrow_conversion`, `sink_write` are real-error stages.
- `files.skipped` answers "how often are we walking past content?" without polluting the error panel.
- Pattern matches the existing `repository_empty_total` precedent for separating expected short-circuits from errors.

<details>
<summary>Emission sites</summary>

- `crates/code-graph/src/v2/langs/custom/js/pipeline.rs:62-69`: JS pipeline `tracing::warn!("js: skipped file")` then records `ParseFailed` via `ctx.record_error`. Classify the error string and route to `files.skipped` instead.
- `crates/code-graph/src/v2/langs/custom/rust/mod.rs:146-152`: symmetric Rust pipeline path.
- `crates/code-graph/src/v2/pipeline.rs:934`: `Internal { context: "sentinel_timeout", ... }` watchdog. Reroute to `files.skipped{reason="timeout_sentinel"}`.
- `crates/indexer/src/modules/code/indexing_pipeline.rs:288-291`: fans `ctx.record_error` into the counter via `CodeGraphError::stage()`. Update the dispatch to route benign variants to the new counter.
- Stage strings are produced by `crates/code-graph/src/v2/error.rs:84-95` (`CodeGraphError::stage()`).

</details>

<details>
<summary>Out of scope</summary>

- Adding a `top_level_namespace_id` label to the metric (tracked separately, see related issues).
- Reworking the `error.rs` enum shape; this issue keeps the existing variants and only changes the recording side.
- Helm chart or alert routing changes; once `errors_total` is meaningful, alerts can land in a follow-up.

</details>

## Test plan

- Unit test: assert that an oversize-file skip increments `files.skipped{reason="oversize"}` and not `errors_total`.
- Spot-check in Playground: panel "Code: errors by pipeline stage" should drop from ~1100 to single digits per hour.
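
A rough sketch of what the unit test in the plan could look like, using an in-memory stand-in rather than the real metrics plumbing. Everything named here (`SkipReason`, `classify_skip`, `Counters`, `record`) is hypothetical and not taken from the codebase. The substrings matched are the ones visible in the log lines quoted in this issue. Matching `sentinel_timeout` on the message is an assumption, since the real code carries it as the `context` of an `Internal` variant and may be better matched structurally.

```rust
use std::collections::HashMap;

/// Hypothetical enum mirroring the proposed `reason` label values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SkipReason {
    Oversize,
    LineTooLong,
    TimeoutSentinel,
}

impl SkipReason {
    /// Label value as listed in the proposal table.
    fn as_label(self) -> &'static str {
        match self {
            SkipReason::Oversize => "oversize",
            SkipReason::LineTooLong => "line_too_long",
            SkipReason::TimeoutSentinel => "timeout_sentinel",
        }
    }
}

/// Assumption: benign skips can be recognised by substrings of the error
/// message. Returns `None` for anything that should stay a real error.
fn classify_skip(message: &str) -> Option<SkipReason> {
    if message.contains("refusing oversize file") {
        Some(SkipReason::Oversize)
    } else if message.contains("line too long") {
        Some(SkipReason::LineTooLong)
    } else if message.contains("sentinel_timeout") {
        Some(SkipReason::TimeoutSentinel)
    } else {
        None
    }
}

/// In-memory stand-in for the two counters (not the real metrics API).
#[derive(Debug, Default)]
struct Counters {
    /// Keyed by `stage`, like `errors_total{stage=...}`.
    errors_total: HashMap<String, u64>,
    /// Keyed by `reason`, like `files.skipped{reason=...}`.
    files_skipped: HashMap<&'static str, u64>,
}

impl Counters {
    /// Route a recorded error: benign skips go to `files_skipped`,
    /// everything else increments `errors_total` for its stage.
    fn record(&mut self, stage: &str, message: &str) {
        match classify_skip(message) {
            Some(reason) => {
                *self.files_skipped.entry(reason.as_label()).or_insert(0) += 1;
            }
            None => {
                *self.errors_total.entry(stage.to_string()).or_insert(0) += 1;
            }
        }
    }
}

#[test]
fn oversize_skip_does_not_touch_errors_total() {
    let mut counters = Counters::default();
    counters.record(
        "parse",
        "refusing oversize file: ... (4808847 bytes, max 2097152)",
    );
    assert_eq!(counters.files_skipped.get("oversize"), Some(&1));
    assert!(counters.errors_total.is_empty());
}
```

In the real change the routing decision would presumably live next to the `CodeGraphError::stage()` dispatch in `indexing_pipeline.rs`, with the maps replaced by the actual counter handles. The sketch only pins down the behaviour the test should assert.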

## Related

- Audit lives in the merged !1050 dashboard work
- #524 Schema migration metrics not populating in orbit-prd
- #516 indexer: avoid landing full Gitaly archive on /tmp emptyDir

cc @bohdanpk @jgdoyon1