Implement conversation compaction using summarization

Problem to solve

As a Duo Workflow user, I want important context preserved when conversation history is trimmed, so that the LLM can make better decisions based on previous interactions.

Currently, when conversation history exceeds the context budget, old messages are dropped entirely. This loses important context like previous decisions, encountered errors, and task progress.

Proposal

Generate an LLM summary of old messages before dropping them.

  • Before trimming, identify the messages that would be removed
  • Generate a concise summary of those messages that captures key information (decisions, errors, progress)
  • Replace the old messages with a single summary message
  • Keep recent messages intact
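
The steps above could be sketched roughly as follows. This is a minimal illustration, not the actual code in trimmer.py: `Message`, `estimate_tokens`, `split_for_compaction`, `compact`, and the `summarize` callback are all hypothetical names, and the 4-characters-per-token estimate is a placeholder for real token counting. The LLM call is passed in as a plain function so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Message:
    # Hypothetical message type; the real service has its own message classes.
    role: str
    content: str


def estimate_tokens(message: Message) -> int:
    # Placeholder heuristic (about 4 characters per token), not a real tokenizer.
    return max(1, len(message.content) // 4)


def split_for_compaction(
    history: List[Message], budget: int
) -> Tuple[List[Message], List[Message]]:
    """Split history into (old, recent), where recent is the longest
    suffix of the history whose estimated token count fits in budget."""
    used = 0
    keep_from = len(history)
    # Walk backwards from the newest message, keeping messages while they fit.
    for index in range(len(history) - 1, -1, -1):
        cost = estimate_tokens(history[index])
        if used + cost > budget:
            break
        used += cost
        keep_from = index
    return history[:keep_from], history[keep_from:]


def compact(
    history: List[Message],
    budget: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Replace messages that would be trimmed with one summary message."""
    old, recent = split_for_compaction(history, budget)
    if not old:
        # Everything fits, so nothing needs to be summarized.
        return history
    summary = Message(
        role="system",
        content="Summary of earlier conversation: " + summarize(old),
    )
    return [summary] + recent
```

In a real implementation `summarize` would call the LLM with a prompt asking for decisions, errors, and progress. Note that this sketch does not reserve part of the budget for the summary message itself, which a production version would likely need to do.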

Further details

Current behavior: Old messages are dropped and their context is lost permanently.

Expected behavior: Old messages are summarized before removal, so key context is preserved.

Dependencies: Benefits from #1861 (lazy trimming, closed) and #1862 (accurate token counting)

Relevant files:

  • duo_workflow_service/conversation/trimmer.py
  • duo_workflow_service/entities/state.py (_conversation_history_reducer)
Edited by Junming Huang