Document LangGraph as chosen orchestration framework and rationale for not replacing it
## Context

As checkpoint scalability work progresses, the question of replacing LangGraph as the orchestration framework may arise. This issue documents the evaluation of alternatives and the rationale for keeping LangGraph.

## Evaluation summary

Of the frameworks evaluated, LangGraph is currently the only one that satisfies every hard requirement in the table below simultaneously:

| Requirement | LangGraph | Temporal | Prefect | Claude Agent SDK | Haystack |
|---|---|---|---|---|---|
| Graph-based conditional routing | ✓ | △ (code-only) | ✓ | △ (sessions) | △ (pipeline) |
| Custom checkpoint backends | ✓ | N/A | N/A | △ | △ |
| Native LLM token streaming | ✓ | ✗ | ✗ | ✓ | ✓ |
| Human-in-the-loop interrupts | ✓ | ✓ | ✓ | △ | △ |
| SM data residency compatible | ✓ (custom saver) | ✗ (infra required) | ✗ | △ | △ |

### Why each alternative was ruled out

**Temporal**: Best durability model (event sourcing / replay), but has **no native streaming primitive**. The DWS gRPC stream to the IDE delivers tokens in real time via `astream()` with the `values`, `messages`, and `updates` modes active simultaneously. Replicating this with Temporal requires a side channel (Kafka, Redis Streams), which is significant extra infrastructure for what LangGraph provides out of the box.

**Prefect / Airflow**: General-purpose task orchestration, not AI-native. No LLM streaming support, and verbose for agent patterns. Adopting either adds operational overhead without addressing the streaming and routing requirements.

**Claude Agent SDK**: Native streaming and simple tool use, but lacks the graph topology that the `Router`/`Component` system in `agent_platform/v1` depends on. Migrating would require rebuilding conditional routing from scratch.

**Haystack**: AI-native with streaming support, but its multi-agent routing is less expressive. Its community is smaller than LangGraph's, and its checkpoint/persistence story is less mature.

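To make the streaming requirement from the Temporal comparison concrete, here is a minimal sketch of a consumer for a multiplexed stream. It relies on one known behaviour: when `stream_mode` is given as a list, LangGraph's `astream()` yields `(mode, chunk)` tuples. Everything else is an assumption for illustration. `fake_astream` is a hypothetical stand-in for the compiled graph, and its chunks are simplified (real `messages` chunks carry a message object plus metadata rather than a plain string). It is not the DWS implementation.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, Tuple


async def fake_astream() -> AsyncIterator[Tuple[str, Any]]:
    """Hypothetical stand-in for graph.astream(..., stream_mode=[...])."""
    yield ("messages", "Hel")                          # token-level chunk
    yield ("messages", "lo")
    yield ("updates", {"agent": {"status": "done"}})   # per-node state delta
    yield ("values", {"status": "done", "reply": "Hello"})  # full state snapshot


async def consume(stream: AsyncIterator[Tuple[str, Any]]) -> Dict[str, Any]:
    """Fan a single multiplexed stream out into per-mode buckets."""
    out: Dict[str, Any] = {"tokens": [], "updates": [], "state": None}
    async for mode, chunk in stream:
        if mode == "messages":
            out["tokens"].append(chunk)    # would be forwarded to the client at once
        elif mode == "updates":
            out["updates"].append(chunk)   # what changed, and in which node
        elif mode == "values":
            out["state"] = chunk           # keep only the latest snapshot
    return out


result = asyncio.run(consume(fake_astream()))
```

The point of the sketch is that one iterator carries tokens, deltas, and snapshots in order, so no second transport is needed to get tokens to the IDE while state updates are in flight.
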
### Migration cost

The `GitLabWorkflow` checkpointer is ~930 lines of code with deep integration into Rails APIs, billing events, internal tracking, and status state machines. LangGraph's `BaseCheckpointSaver` interface is what makes this possible. Any replacement framework would require reimplementing equivalent persistence integration from scratch.

## Recommended action

Keep LangGraph. Invest in improving how the checkpoint interface is used (incremental writes, async writes, tracing decoupling) rather than migrating the framework.

This issue should result in a documented Architecture Decision Record (ADR) added to the `ai-assist` repository.
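As a rough illustration of the incremental and async writes mentioned above, the following sketch queues only the changed keys of each state and persists them from a background task. It uses the standard library only and does not implement `BaseCheckpointSaver`. The names `WriteBehindSaver`, `diff_state`, and `persisted` are hypothetical, and the in-memory list stands in for whatever remote store the real checkpointer talks to.

```python
import asyncio
from typing import Any, Dict, List

_MISSING = object()  # sentinel so a stored None still counts as a value


def diff_state(prev: Dict[str, Any], curr: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the keys whose values changed (key deletions are ignored)."""
    return {k: v for k, v in curr.items() if prev.get(k, _MISSING) != v}


class WriteBehindSaver:
    """Hypothetical saver: queue deltas now, persist them off the hot path."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._last: Dict[str, Any] = {}
        self.persisted: List[Dict[str, Any]] = []  # stand-in for the remote store

    def put(self, state: Dict[str, Any]) -> None:
        delta = diff_state(self._last, state)
        self._last = dict(state)  # shallow copy; nested mutation is not tracked
        if delta:
            self._queue.put_nowait(delta)  # returns immediately, no I/O here

    async def drain(self) -> None:
        while True:
            delta = await self._queue.get()
            await asyncio.sleep(0)  # placeholder for the real network call
            self.persisted.append(delta)
            self._queue.task_done()

    async def flush(self) -> None:
        await self._queue.join()  # wait until every queued delta is persisted


async def main() -> List[Dict[str, Any]]:
    saver = WriteBehindSaver()
    worker = asyncio.create_task(saver.drain())
    saver.put({"status": "running", "step": 1})
    saver.put({"status": "running", "step": 2})
    saver.put({"status": "running", "step": 2})  # unchanged, nothing queued
    await saver.flush()
    worker.cancel()
    return saver.persisted


persisted = asyncio.run(main())
```

A real version would need to decide what happens to queued deltas on crash or cancellation, since write-behind trades durability for latency. That trade-off is the main thing the ADR would have to settle.
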