feat(ci): add v2 code-graph benchmarks and extract integration-tests-codegraph crate

Summary

  • Add v2 code-graph benchmarks to CI with Python (Django, Flask), C# (NLog), and Kotlin (OkHttp) repos
  • Extract integration-tests-codegraph crate from integration-testkit to isolate heavy deps (code-graph, lance-graph, datafusion, tree-sitter grammars) from the containers and cli test binaries
  • Clean up the extracted crate: remove dead exports and unused fields, simplify Arrow batch construction, and use tabled for failure output

Changes

V2 benchmarks

  • New index-v2 scenario in code-indexing-benchmark.yaml with --v2 flag
  • Python repos: Django 5.2, Flask 3.1.1
  • C# repo: NLog v5.4.0
  • OkHttp added to kotlin group
  • New codegraph-test CI job runs v2 benchmarks for [python, java, kotlin, csharp]
  • Benchmark template parameterized with SCENARIO variable

Crate extraction

  • integration-tests-codegraph owns graph_validator source, fixtures, and YAML test suites
  • integration-testkit loses 7 heavy deps (code-graph, lance-graph, arrow_56, etc.)
  • run-unit-tests.sh excludes the new crate; separate codegraph-test CI job runs it
  • mise test:integration:codegraph replaces mise test:graph-validator

Code cleanup

  • Dead code removed: run_yaml_suite_file, unused pub re-exports, description/params fields
  • CountEqualsArgs/CountGteArgs unified into FieldValueArgs
  • check_int_field helper deduplicates assertion logic
  • make_batch helper reduces Arrow boilerplate in datasets.rs
  • tabled replaces manual box-drawing for failure output
  • All internal types narrowed to pub(crate)
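
For reviewers, the unification of CountEqualsArgs/CountGteArgs and the check_int_field helper can be pictured roughly as below. This is a minimal sketch, not the code in this MR: the field names on FieldValueArgs, the Cmp enum, the Row alias, and the helper's signature are all assumptions made for illustration, and the real crate works on Arrow batches rather than a HashMap.

```rust
use std::collections::HashMap;

/// Hypothetical comparison mode, standing in for the former
/// CountEqualsArgs / CountGteArgs split.
#[derive(Debug, Clone, Copy, PartialEq)]
pub(crate) enum Cmp {
    Equals,
    Gte,
}

/// Assumed shape of the unified args: one field name plus the expected value.
#[derive(Debug, Clone)]
pub(crate) struct FieldValueArgs {
    pub(crate) field: String,
    pub(crate) value: i64,
}

/// Hypothetical stand-in for one query result row (illustration only).
pub(crate) type Row = HashMap<String, i64>;

/// One helper covers both assertion kinds, so the lookup and the
/// error formatting live in a single place.
pub(crate) fn check_int_field(row: &Row, args: &FieldValueArgs, cmp: Cmp) -> Result<(), String> {
    // A missing field is reported as a failure instead of panicking.
    let actual = *row
        .get(&args.field)
        .ok_or_else(|| format!("missing field `{}`", args.field))?;

    let ok = match cmp {
        Cmp::Equals => actual == args.value,
        Cmp::Gte => actual >= args.value,
    };

    if ok {
        Ok(())
    } else {
        Err(format!(
            "field `{}`: expected {:?} {}, got {}",
            args.field, cmp, args.value, actual
        ))
    }
}
```

In this sketch the comparison mode is passed separately from the args, so a single args type serves both assertion kinds; the actual crate may encode the mode differently.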
