Context-budget replay

Context Engineering Middleware

Fighting Context Rot: six context modules × five strategies (Write/Select/Compress/Isolate/Cache), all realized as stackable LangChain middleware, applied Cache-first (90% off) then Isolate.

Six modules fill the context window; stack the five strategies in the course's "Cache-first, Isolate-later" priority and watch both window tokens and relative cost drop.

Context EngineeringLangChainMiddlewarePrompt Cache

Case Study

Why this local version exists

The decision priority, the five strategies (Write/Select/Compress/Isolate/Cache), the Compress sub-techniques, and the 90%-off prompt cache come from the Part 9 courseware. The token/cost numbers are illustrative.

Interactive Preview

Five strategies trimming the context window

Six modules fill the window; stack the five strategies in the "Cache-first, Isolate-later" priority order — window and cost both drop.

Six context modules

System prompt

Conversation history

Memory injection

Tool context

Task state

External knowledge

Conversation history + tool context together are >50% of the window (the two biggest taps).

Window

180K

Relative cost

100%

① Cache

Static prefix cache hit, cache-read at 10% (90% cost off), day one

② Compress · tool-result clearing

Drop verbose raw tool output, keep "decision + why" — zero cost

③ Compress · observation masking + trim

Mask old observations + trim_messages hard-truncate history

④ Isolate · sub-agent

SubAgentMiddleware moves external-knowledge work into its own context

⑤ Write + Select

Offload task state to a scratchpad, JIT-retrieve memory

What to try

Stack the strategies and watch the window shrink module by module.

Note Cache goes first (90% off, day one) and the system-prompt prefix turns "cached".

See Isolate spin external-knowledge work into a sub-agent context only when needed.

What this demo proves

You understand Context Rot — bigger window ≠ better — and manage it deliberately.

You can name each module and pick the right strategy, realized as stackable LangChain middleware.

You order savings: zero-cost (Cache / tool-result clearing) first, complex (Isolate / Write) on demand.

Framework

Six modules × five strategies (Write/Select/Compress/Isolate/Cache)

As middleware

trim_messages · SummarizationMiddleware · SubAgentMiddleware (deepagents)

Cost lever

Prompt cache: read at 10% (90% off), prefix byte-stable

Back to case study