Back to case study
Memory scheduling replay

Agent Long/Short-Term Memory System

Short-term SessionManager (truncation MAX_HISTORY=20 + rolling summary) + long-term MEMORY.md (flips to RAG past 2000 tokens), unified by a MemoryManager hub; production swaps in mem0 (LLM judge ADD/UPDATE/DELETE/NONE) + Milvus.

A replay of the MemoryManager hub: short-term messages compress at MAX_HISTORY=20, a candidate fact passes a three-trigger write gate, MEMORY.md flips Direct→RAG past 2000 tokens, and the mem0 LLM judge resolves a conflict.

mem0MilvusLangChainLlamaIndexMemory
Agent Long/Short-Term Memory System

Why this local version exists

All parameters (MAX_HISTORY=20, 2000-token threshold, the three write-triggers, the four mem0 ops) come from the Part 8 courseware (mini-OpenClaw + mem0). No live LLM/Milvus runs in the browser.

Interactive Preview

Short-term + long-term memory scheduling

A replay of the MemoryManager hub: short-term compression (MAX_HISTORY=20) → three-trigger write gate → MEMORY.md flips to RAG past 2000 tokens → mem0 LLM judge.

Short-term SessionManager

0/20

deepseek-chat · 128K · MAX_HISTORY=20

Write to long-term? Three-trigger gate

Candidate: User prefers TypeScript, avoids Python long-term

factualitystabilitycross-session reuse

Long-term MEMORY.md

Direct
900 tokensthreshold 2000

Under threshold → whole MEMORY.md injected into the system prompt (MD5-cached).

mem0 LLM judge

In production, swap to mem0: an LLM judge picks one of ADD/UPDATE/DELETE/NONE.

What to try

Run the flow and watch short-term hit MAX_HISTORY=20, then fold the front 50% into a summary.

See the candidate fact pass the factuality / stability / cross-session gate before it writes long-term.

Watch MEMORY.md cross 2000 tokens and flip Direct→RAG, then the mem0 judge pick UPDATE.

What this demo proves

You treat memory as a system: short-term (truncate/compress) + long-term (Direct/RAG) + a scheduling hub.

You gate writes with explicit criteria, instead of dumping everything into a vector store.

You can go from a hand-rolled version to production mem0 and reason about its namespace / judge / backend trade-offs.

Short-term

SessionManager · MAX_HISTORY=20 · rolling summary of the front 50%

Long-term

MEMORY.md, Direct→RAG at 2000 tokens (LlamaIndex VectorStoreIndex)

Production

mem0 LLM judge (ADD/UPDATE/DELETE/NONE) · Milvus · LangChain @tool