feat(indexer): metrics cleanup in dispatcher, sdlc and code indexing

What does this MR do and why?

Cleans up indexer metrics across the dispatcher, SDLC, and code indexing modules.

Dispatcher metrics: Adds a query label to indexer.dispatch.query.duration so we can distinguish between different dispatch queries (e.g. pending_projects vs enabled_namespaces).

SDLC metrics:

  • Removes indexer.sdlc.pipeline.batches.processed — batch count depends on ClickHouse page sizes, not workload, so it's not useful for monitoring.
  • Adds indexer.sdlc.datalake.query.bytes to track data volume returned by datalake queries, which tells us more about ClickHouse read pressure than row counts alone.
  • Renames record_datalake_query_duration to record_datalake_query since it now records both duration and bytes.
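For reviewers, a minimal sketch of what "records both duration and bytes" amounts to at the call site. The real function presumably writes to the metrics backend; here `DatalakeQueryMetrics` and `total_bytes` are hypothetical in-memory stand-ins, and only the name `record_datalake_query` comes from this MR (its actual signature is not shown here).

```rust
use std::time::Duration;

/// Hypothetical in-memory stand-in for the SDLC datalake instruments.
#[derive(Debug, Default)]
pub struct DatalakeQueryMetrics {
    /// Samples for the datalake query duration metric.
    pub durations: Vec<Duration>,
    /// Samples for indexer.sdlc.datalake.query.bytes.
    pub bytes: Vec<u64>,
}

impl DatalakeQueryMetrics {
    /// One call per datalake query: records how long it took and how many
    /// bytes it returned (assumed signature, for illustration only).
    pub fn record_datalake_query(&mut self, duration: Duration, bytes: u64) {
        self.durations.push(duration);
        self.bytes.push(bytes);
    }

    /// Total bytes read across all recorded queries.
    pub fn total_bytes(&self) -> u64 {
        self.bytes.iter().sum()
    }
}
```

Recording both values in one call keeps the two series aligned: every duration sample has a matching bytes sample for the same query.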

Code indexing metrics:

  • Splits the combined fetch+extract timing into two separate metrics: repository.fetch.duration (Gitaly download) and repository.extract.duration (tar unpacking), so we can tell whether slowness is network or disk I/O.
  • Removes indexer.code.write.duration — it timed the full write_graph_data call but didn't add value beyond the existing handler duration metric.
  • Moves archive extraction out of gitaly-client into a new archive module in the indexer crate, with path traversal protection for tar entries and symlinks.
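To illustrate the kind of check that path traversal protection implies, here is a rough sketch using only the standard library. It is not the code in the new archive module: `safe_join` and `symlink_stays_inside` are hypothetical names, and the sketch assumes entry paths and symlink targets are validated lexically before anything is written to disk.

```rust
use std::path::{Component, Path, PathBuf};

/// Join an archive entry path onto `dest`, refusing anything that could
/// escape it. Returns None for absolute paths, `..` components, and
/// paths with no normal components. (Hypothetical helper.)
pub fn safe_join(dest: &Path, entry: &Path) -> Option<PathBuf> {
    let mut out = dest.to_path_buf();
    let mut depth = 0usize;
    for component in entry.components() {
        match component {
            Component::Normal(part) => {
                out.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            // ParentDir, RootDir, and Windows prefixes are all rejected.
            _ => return None,
        }
    }
    if depth == 0 { None } else { Some(out) }
}

/// Check that a symlink at `link` (relative to the extraction root) with
/// the given `target` cannot resolve outside the root. (Hypothetical helper.)
pub fn symlink_stays_inside(link: &Path, target: &Path) -> bool {
    if target.is_absolute() {
        return false;
    }
    // Start at the depth of the directory that contains the link.
    let mut depth = link
        .parent()
        .map_or(0, |p| {
            p.components()
                .filter(|c| matches!(c, Component::Normal(_)))
                .count() as i64
        });
    for component in target.components() {
        match component {
            Component::Normal(_) => depth += 1,
            Component::CurDir => {}
            Component::ParentDir => {
                depth -= 1;
                if depth < 0 {
                    return false; // climbed above the extraction root
                }
            }
            _ => return false,
        }
    }
    true
}
```

A lexical check like this rejects `../etc/passwd` and `/etc/passwd` outright, and allows a symlink such as `a/b/link -> ../../x` only because it still lands inside the root.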

GitalyClient::pull_and_extract_repository is kept for backward compatibility, but the code indexing pipeline now calls fetch_archive + archive::unpack_archive separately.
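The two-step flow with separate timings can be sketched roughly as follows. The closures stand in for fetch_archive and archive::unpack_archive, whose real signatures are not shown in this description; `timed`, `RepoTimings`, and `fetch_then_extract` are hypothetical names used only for illustration.

```rust
use std::time::{Duration, Instant};

/// Run `f` and return its result together with the elapsed wall-clock time.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

/// Hypothetical holder for the two measurements, which would feed
/// repository.fetch.duration and repository.extract.duration.
#[derive(Debug)]
pub struct RepoTimings {
    pub fetch: Duration,
    pub extract: Duration,
}

/// Fetch the archive, then unpack it, timing each phase on its own.
/// `fetch` and `extract` are placeholders for the real calls.
pub fn fetch_then_extract<A, E>(
    fetch: impl FnOnce() -> Result<A, E>,
    extract: impl FnOnce(A) -> Result<(), E>,
) -> Result<RepoTimings, E> {
    // Phase 1: download from Gitaly (network-bound).
    let (archive, fetch_time) = timed(fetch);
    let archive = archive?;

    // Phase 2: unpack the tar archive (disk-bound).
    let (result, extract_time) = timed(|| extract(archive));
    result?;

    Ok(RepoTimings {
        fetch: fetch_time,
        extract: extract_time,
    })
}
```

In this sketch a failed fetch returns early, so no extract time is ever measured for it. Whether the real pipeline records a duration on failure is not stated in this description.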

Observability docs updated to match.

Testing

Unit and integration tests

Performance Analysis

  • This merge request does not introduce any performance regression. If a performance regression is expected, explain why.
