sysml-bench v0.1.0: Reproducible benchmark for SysML v2 model comprehension

Initial public release:
- 132 tasks across 13 categories (O1-O14)
- 5 models evaluated (Claude 3.5, Claude 3.7, GPT-4o, Gemini 2.0 Flash, DeepSeek-V3)
- Tool-augmented evaluation with sysml-cli (tree-sitter parser)
- Install: pip install sysml-bench
- Container: registry.gitlab.com/nomograph/sysml-bench:v0.1.0
- Hugging Face: nomograph/sysml-v2-reasoning-benchmark