feat(indexer): metrics cleanup in dispatcher, sdlc and code indexing
What does this MR do and why?
Cleans up indexer metrics across the dispatcher, SDLC, and code indexing modules.
Dispatcher metrics: Adds a query label to indexer.dispatch.query.duration so we can distinguish between different dispatch queries (e.g. pending_projects vs enabled_namespaces).
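A minimal sketch of what the query label buys us, using an in-memory stand-in rather than the real metrics backend. `LabeledDurations` and its methods are hypothetical names for illustration only; the actual dispatcher presumably records into a histogram instrument with a `query` attribute.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical in-memory stand-in for a duration histogram keyed by a
/// single `query` label. Not the indexer's real metrics type.
#[derive(Default)]
pub struct LabeledDurations {
    samples: HashMap<&'static str, Vec<Duration>>,
}

impl LabeledDurations {
    /// Record one dispatch query duration under its `query` label.
    pub fn record(&mut self, query: &'static str, elapsed: Duration) {
        self.samples.entry(query).or_default().push(elapsed);
    }

    /// Number of samples recorded for a given label.
    pub fn count(&self, query: &str) -> usize {
        self.samples.get(query).map_or(0, Vec::len)
    }

    /// Sum of all durations recorded for a given label.
    pub fn total(&self, query: &str) -> Duration {
        self.samples
            .get(query)
            .map_or(Duration::ZERO, |v| v.iter().sum())
    }
}
```

With the label in place, a slow `enabled_namespaces` query no longer hides behind many fast `pending_projects` samples in one aggregate series.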
SDLC metrics:
- Removes indexer.sdlc.pipeline.batches.processed: batch count depends on ClickHouse page sizes, not workload, so it's not useful for monitoring.
- Adds indexer.sdlc.datalake.query.bytes to track data volume returned by datalake queries, which tells us more about ClickHouse read pressure than row counts alone.
- Renames record_datalake_query_duration to record_datalake_query since it now records both duration and bytes.
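To illustrate the rename, here is a rough sketch of a recorder that takes both values in one call. The signature and the `DatalakeQueryStats` struct are assumptions made for this example; the real `record_datalake_query` may take different arguments and writes to the metrics backend rather than to plain counters.

```rust
use std::time::Duration;

/// Hypothetical accumulator standing in for the two datalake metrics.
#[derive(Default)]
pub struct DatalakeQueryStats {
    pub queries: u64,
    pub total_duration: Duration,
    pub total_bytes: u64,
}

impl DatalakeQueryStats {
    /// Record one datalake query: how long it took and how many bytes
    /// it returned. (Assumed signature, for illustration only.)
    pub fn record_datalake_query(&mut self, elapsed: Duration, bytes: u64) {
        self.queries += 1;
        self.total_duration += elapsed;
        self.total_bytes += bytes;
    }
}
```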
Code indexing metrics:
- Splits the combined fetch+extract timing into two separate metrics: repository.fetch.duration (Gitaly download) and repository.extract.duration (tar unpacking), so we can tell whether slowness is network or disk I/O.
- Removes indexer.code.write.duration: it timed the full write_graph_data call but didn't add value beyond the existing handler duration metric.
- Moves archive extraction out of gitaly-client into a new archive module in the indexer crate, with path traversal protection for tar entries and symlinks.
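For reviewers unfamiliar with the attack, the sketch below shows one common way to guard against path traversal when unpacking. It is not the code in the new archive module; `safe_join` and `symlink_target_is_safe` are hypothetical helpers, and the example assumes Unix-style paths.

```rust
use std::path::{Component, Path, PathBuf};

/// Join an archive entry path onto `dest`, returning `None` if the entry
/// is absolute or uses `..` to climb above `dest`.
pub fn safe_join(dest: &Path, entry: &Path) -> Option<PathBuf> {
    let mut out = dest.to_path_buf();
    let mut depth = 0usize; // how many components deep we are below `dest`
    for comp in entry.components() {
        match comp {
            Component::Normal(part) => {
                out.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    return None; // would escape the destination directory
                }
                out.pop();
                depth -= 1;
            }
            // Absolute paths (and Windows prefixes) are rejected outright.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}

/// Check that a symlink's target, resolved relative to the directory that
/// contains the link, stays inside the extraction root. `link` is the
/// entry path of the symlink relative to that root.
pub fn symlink_target_is_safe(link: &Path, target: &Path) -> bool {
    let parent = link.parent().unwrap_or(Path::new(""));
    safe_join(Path::new(""), &parent.join(target)).is_some()
}
```

An entry such as `../etc/passwd` or `/etc/passwd` is rejected, while `a/../b` resolves to `b` under the destination. The symlink check only validates the target; the link's own entry path would still need to pass `safe_join`.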
GitalyClient::pull_and_extract_repository is kept for backward compatibility, but the code indexing pipeline now calls fetch_archive + archive::unpack_archive separately.
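The separate calls are what make the two timings possible. A rough sketch of the pattern, with closures standing in for `fetch_archive` and `archive::unpack_archive` (whose real signatures are not shown here); `timed`, `PhaseTimings`, and `fetch_then_extract` are hypothetical names:

```rust
use std::time::{Duration, Instant};

/// Run `f` and return its result together with the elapsed wall-clock time.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Durations for the two phases, which would feed the fetch and extract
/// duration metrics respectively.
pub struct PhaseTimings {
    pub fetch: Duration,
    pub extract: Duration,
}

/// Time the fetch and the extraction independently instead of wrapping
/// both in a single timer.
pub fn fetch_then_extract<A, R>(
    fetch: impl FnOnce() -> A,
    extract: impl FnOnce(A) -> R,
) -> (R, PhaseTimings) {
    let (archive, fetch_time) = timed(fetch);
    let (result, extract_time) = timed(|| extract(archive));
    (
        result,
        PhaseTimings {
            fetch: fetch_time,
            extract: extract_time,
        },
    )
}
```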
Observability docs updated to match.
Testing
Unit and integration tests
Performance Analysis
- This merge request does not introduce any performance regression. If a performance regression is expected, explain why.