# Context Engineering 🧠

> Keeping long-running agents "forever young" by managing their memory.

## The Problem

LLMs have finite context windows. As conversations grow, you eventually hit the token limit and the agent breaks. Simply truncating old messages loses valuable context.

## The Solution: Compaction via Summarization

Instead of truncating, we **summarize** old conversation history into a compact narrative, preserving the essential context while freeing up tokens.
```
┌────────────────────────────────────────────────────┐
│ Before Compaction (500+ tokens)                    │
├────────────────────────────────────────────────────┤
│ [System] You are an HR assistant...                │
│ [Human] Show me all candidates                     │
│ [AI] Here are 5 candidates: Alice, Bob...          │
│ [Human] Tell me about Alice                        │
│ [AI] Alice is a senior engineer with 5 years...    │
│ [Human] Schedule an interview with her             │
│ [Tool] Calendar event created...                   │
│ [AI] Done! Interview scheduled for Monday.         │
│ [Human] Now check Bob's CV                   ← new │
└────────────────────────────────────────────────────┘
                    ▼ COMPACTION ▼
┌────────────────────────────────────────────────────┐
│ After Compaction (~200 tokens)                     │
├────────────────────────────────────────────────────┤
│ [System] You are an HR assistant...                │
│ [AI Summary] User reviewed candidates, focused on  │
│      Alice (senior engineer), scheduled interview  │
│      for Monday.                                   │
│ [Human] Now check Bob's CV                  ← kept │
└────────────────────────────────────────────────────┘
```
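Conceptually, the summarization step is a single LLM call over the older messages. Below is a minimal sketch of that idea; the model choice, prompt, and function name are illustrative assumptions, and the actual logic lives in `HistoryManager.compact_messages()`:

```python
from langchain_core.messages import AIMessage, BaseMessage
from langchain_openai import ChatOpenAI  # any chat model would do; this one is illustrative

summarizer = ChatOpenAI(model="gpt-4o-mini")  # assumed model choice

def summarize_history(old_messages: list[BaseMessage]) -> AIMessage:
    """Condense a slice of conversation history into one compact AI message."""
    transcript = "\n".join(f"{m.type}: {m.content}" for m in old_messages)
    prompt = (
        "Summarize the conversation below into a short narrative that preserves "
        "decisions, entities, and open tasks:\n\n" + transcript
    )
    summary = summarizer.invoke(prompt)
    return AIMessage(content=f"[Conversation summary] {summary.content}")
```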
## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                  CompactingSupervisor                   │
│  ┌─────────────────────────────────────────────────┐    │
│  │ 1. Intercept agent execution                    │    │
│  │ 2. Run agent normally                           │    │
│  │ 3. Count tokens after response                  │    │
│  │ 4. If over limit → trigger compaction           │    │
│  └─────────────────────────────────────────────────┘    │
│                            │                             │
│                            ▼                             │
│  ┌─────────────────────────────────────────────────┐    │
│  │ HistoryManager                                  │    │
│  │ • compact_messages()       → LLM summarization  │    │
│  │ • replace_thread_history() → checkpoint update  │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
```
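A simplified sketch of that loop is shown below. The real class lives in `compacting_supervisor.py`; the `count_tokens` import and the `compact_messages()` signature are assumptions for illustration:

```python
from src.context_eng.token_counter import count_tokens  # assumed helper name

class CompactingSupervisor:
    """Wraps a LangGraph agent; sketch only, not the actual implementation."""

    def __init__(self, agent, history_manager, token_limit=500, compaction_ratio=0.5):
        self.agent = agent
        self.history_manager = history_manager
        self.token_limit = token_limit
        self.compaction_ratio = compaction_ratio

    def invoke(self, inputs, config):
        # 1-2. Intercept the call and run the wrapped agent normally.
        response = self.agent.invoke(inputs, config)

        # 3. Count tokens in the thread after the response.
        messages = response["messages"]
        if count_tokens(messages) > self.token_limit:
            # 4. Over the limit → summarize the oldest slice and rewrite the checkpoint.
            compacted = self.history_manager.compact_messages(messages, self.compaction_ratio)
            self.history_manager.replace_thread_history(config, compacted)

        return response
```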
## Subagents and Memory Safety

Compaction affects **only the supervisor's `messages` channel** inside LangGraph's checkpoint.

This includes:

- User messages
- Supervisor AI messages
- **Tool call and Tool result messages** (because these are part of the supervisor's visible conversation history)

This does **not** include:

- Sub-agent internal reasoning
- Sub-agent private memory
- Hidden chain-of-thought
- Any messages stored in sub-agent-specific channels

Only the messages that the supervisor itself receives are ever compacted.
No internal sub-agent state leaks into the compacted summary.
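As a rough illustration (assuming LangGraph's standard checkpoint layout), the compaction path only ever reads the `messages` channel of the supervisor's own thread:

```python
def supervisor_visible_history(memory, config):
    """Sketch: the only state the compaction path ever reads or rewrites."""
    checkpoint_tuple = memory.get_tuple(config)  # supervisor thread's latest checkpoint
    # Sub-agent scratch state lives in other channels/threads and is never read here,
    # so nothing from their internal reasoning can end up in the summary.
    return checkpoint_tuple.checkpoint["channel_values"]["messages"]
```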
## Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `token_limit` | 500 | Token count that triggers compaction when exceeded |
| `compaction_ratio` | 0.5 | Fraction of messages to summarize |
### Compaction Ratio Explained

The `compaction_ratio` controls how aggressively we summarize:

```
compaction_ratio = 0.5 (Default)
├── Summarizes: oldest 50% of messages
└── Keeps verbatim: newest 50% of messages

compaction_ratio = 0.8 (Aggressive)
├── Summarizes: oldest 80% of messages
└── Keeps verbatim: only newest 20%
    → Use when context is very tight

compaction_ratio = 0.2 (Gentle)
├── Summarizes: only oldest 20%
└── Keeps verbatim: newest 80%
    → Use when you want more history preserved
```
**Example with 10 messages:**

- `ratio=0.5` → Summarize messages 1-5, keep 6-10 verbatim
- `ratio=0.8` → Summarize messages 1-8, keep 9-10 verbatim
- `ratio=0.2` → Summarize messages 1-2, keep 3-10 verbatim
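The split itself is just an index into the message list. A minimal sketch (the real logic in `history_manager.py` may also pin the system message and keep tool calls paired with their results):

```python
def split_for_compaction(messages, compaction_ratio=0.5):
    """Return (to_summarize, to_keep); illustrative sketch only."""
    cutoff = int(len(messages) * compaction_ratio)
    return messages[:cutoff], messages[cutoff:]

# With 10 messages and ratio=0.8: cutoff = 8, so messages 1-8 are summarized
# and messages 9-10 are kept verbatim.
```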
## Usage

```python
from langchain_core.messages import HumanMessage

from src.context_eng import compacting_supervisor

# Just use it like a normal agent - compaction is automatic!
response = compacting_supervisor.invoke(
    {"messages": [HumanMessage(content="Hello")]},
    config={"configurable": {"thread_id": "my-thread"}}
)

# Streaming works too
for chunk in compacting_supervisor.stream(...):
    if chunk["type"] == "token":
        print(chunk["content"], end="")
```
## LangGraph Integration

### How It Wraps the Agent

The `CompactingSupervisor` uses the **Interceptor Pattern** - it wraps the existing LangGraph agent without modifying it:

```python
# In compacting_supervisor.py
from src.agents.supervisor.supervisor_v2 import supervisor_agent, memory

compacting_supervisor = CompactingSupervisor(
    agent=supervisor_agent,                               # ← Original LangGraph agent
    history_manager=HistoryManager(memory_saver=memory),  # ← LangGraph's MemorySaver
    ...
)
```

The agent itself is **unchanged**. We just intercept `invoke()` and `stream()` calls.
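The `stream()` side can be intercepted the same way. A rough sketch follows, assuming the wrapper forwards the agent's chunks and reuses the same post-response check; the helper name is hypothetical, and the real wrapper may reshape chunks into the `{"type": "token", ...}` events shown in the Usage section:

```python
# Continuing the CompactingSupervisor sketch from the Architecture section:
class CompactingSupervisor:
    ...

    def stream(self, inputs, config):
        """Forward the wrapped agent's chunks, then run the same post-response check."""
        for chunk in self.agent.stream(inputs, config):
            yield chunk                       # pass through (the real wrapper may reshape these)
        self._check_and_compact(config)       # hypothetical shared helper: token check + compaction
```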
### How It Manipulates LangGraph Memory

LangGraph uses **checkpoints** to persist conversation state. Normally, messages are append-only. Our `HistoryManager.replace_thread_history()` bypasses this to force a rewrite:

```
Normal LangGraph flow:
┌───────────────────────────────────┐
│ Checkpoint Storage (MemorySaver)  │
│ ┌───────────────────────────────┐ │
│ │ messages: [m1, m2, m3, m4...] │ │  ← Append-only
│ └───────────────────────────────┘ │
└───────────────────────────────────┘

After compaction (we override):
┌───────────────────────────────────┐
│ Checkpoint Storage (MemorySaver)  │
│ ┌───────────────────────────────┐ │
│ │ messages: [sys, summary, m4]  │ │  ← Force-replaced!
│ └───────────────────────────────┘ │
└───────────────────────────────────┘
```
**Key mechanism in `replace_thread_history()`:**

1. Get current checkpoint via `memory.get_tuple(config)`
2. Build new checkpoint with compacted messages
3. Increment version + update timestamps
4. Write directly via `memory.put(...)`, bypassing normal reducers

This is a **low-level override** of LangGraph's internal checkpoint format. It works because we maintain the expected checkpoint structure (`channel_versions`, `channel_values`, etc.).
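A rough sketch of those four steps against the in-memory saver is shown below. It assumes LangGraph's standard checkpoint layout, uses `MemorySaver.get_next_version()` for the version bump, and overwrites the latest checkpoint id in place; the real `history_manager.py` may handle these details differently:

```python
from datetime import datetime, timezone

def replace_thread_history(memory, config, compacted_messages):
    """Illustrative sketch of a checkpoint rewrite; not the actual implementation."""
    # 1. Get the current checkpoint for this thread.
    current = memory.get_tuple(config)
    checkpoint = dict(current.checkpoint)  # shallow copy we can edit

    # 2. Build the new checkpoint with the compacted message list.
    checkpoint["channel_values"] = {
        **checkpoint["channel_values"],
        "messages": compacted_messages,
    }

    # 3. Bump the 'messages' channel version and refresh the timestamp so the
    #    rewrite is treated as the newest state for that channel.
    next_version = memory.get_next_version(
        checkpoint["channel_versions"].get("messages"), None
    )
    checkpoint["channel_versions"] = {
        **checkpoint["channel_versions"],
        "messages": next_version,
    }
    checkpoint["ts"] = datetime.now(timezone.utc).isoformat()

    # 4. Write directly, bypassing the add_messages reducer. Reusing the same
    #    checkpoint id overwrites the latest checkpoint in place; the real code
    #    may instead mint a new checkpoint that points at the old one as parent.
    memory.put(current.config, checkpoint, current.metadata, {"messages": next_version})
```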
## Files

| File | Purpose |
|------|---------|
| `token_counter.py` | Count tokens in message lists |
| `history_manager.py` | Summarization + checkpoint manipulation |
| `compacting_supervisor.py` | Agent wrapper (Interceptor Pattern) |