feat: add benchmark support for code and agentic capabilities

  • Add HumanEval and BFCL benchmarks to ModelSpec interface
  • Populate benchmark scores with verified data from online sources
  • Add collapsible benchmark explanations section to reference docs
  • Make all reference documentation sections collapsible
  • Add sortable benchmark columns to model specifications table
  • Add benchmark selector dropdown to performance chart
  • Filter legacy models from charts (only show current agentic models)
  • Display models without scores in separate section below chart
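
A minimal sketch of how the chart filtering described above could work. Every name except ModelSpec is hypothetical (the `legacy` flag, the `benchmarks` field layout, `splitByBenchmark`), and the scores are placeholders, not real results; the actual code in lib/data.ts may differ.

```typescript
// Hypothetical benchmark keys; the commit only states that HumanEval and BFCL were added.
type BenchmarkKey = "humanEval" | "bfcl";

// Assumed shape of ModelSpec. Only the two benchmark fields are implied by this change.
interface ModelSpec {
  name: string;
  legacy?: boolean; // assumed flag marking models excluded from charts
  benchmarks?: Partial<Record<BenchmarkKey, number>>;
}

// Returns the model's score for the selected benchmark, or undefined if none is recorded.
function scoreOf(model: ModelSpec, key: BenchmarkKey): number | undefined {
  return model.benchmarks?.[key];
}

// Drops legacy models, then separates models with a score (sorted high to low)
// from models without one, which are listed below the chart.
function splitByBenchmark(
  models: ModelSpec[],
  key: BenchmarkKey
): { scored: ModelSpec[]; unscored: ModelSpec[] } {
  const current = models.filter((m) => !m.legacy);
  const scored = current
    .filter((m) => scoreOf(m, key) !== undefined)
    .sort((a, b) => (scoreOf(b, key) ?? 0) - (scoreOf(a, key) ?? 0));
  const unscored = current.filter((m) => scoreOf(m, key) === undefined);
  return { scored, unscored };
}

// Placeholder data for illustration only.
const exampleModels: ModelSpec[] = [
  { name: "model-a", benchmarks: { humanEval: 70 } },
  { name: "model-b", benchmarks: { humanEval: 85, bfcl: 60 } },
  { name: "model-c" },
  { name: "model-old", legacy: true, benchmarks: { humanEval: 50 } },
];

const humanEvalView = splitByBenchmark(exampleModels, "humanEval");
```

Here the benchmark selector dropdown would pass the chosen key, so switching to "bfcl" moves model-a into the unscored list without touching the source data.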

Benchmark data sources are documented in lib/data.ts
