The Model Forgot Again

Part One of No Persistent State: A series on AI memory, trust, and the design argument nobody is making

How context windows work, and where they fail.
Why continuity is an illusion
The Constraint
The model is stateless. Every request is processed independently: there is no memory between sessions and no memory between turns. What feels like continuity is an illusion maintained by the application layer, which re-sends the entire conversation as input on each request. When the session ends, everything resets to zero. Nothing persists anywhere in this architecture; the model sees only what is explicitly re-injected each time.
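The application layer's role can be made concrete with a small sketch. Here `call_model` is a hypothetical stand-in for any stateless chat-completion API; the names and message format are illustrative assumptions, not a specific vendor's interface.

```python
# Minimal sketch of the application layer maintaining the "illusion"
# of continuity. call_model is a hypothetical stand-in for a stateless
# chat API: it sees only what we pass it on each call, nothing more.

def call_model(messages):
    # Stateless placeholder: reports how many messages it received.
    return f"(model saw {len(messages)} messages)"

history = [{"role": "system", "content": "You are a helpful assistant."}]

def send(user_text):
    history.append({"role": "user", "content": user_text})
    # The ENTIRE history is re-sent on every request -- the model
    # itself retains nothing between calls.
    reply = call_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

send("Hello")         # model sees 2 messages
send("Remember me?")  # model sees 4, only because we re-sent them
```

The "memory" lives entirely in the `history` list held by the application; delete that list and the model has never heard of you.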
How a single request moves through the model
User -> Context Window -> Model -> Output

- User: message input, files, images, instructions. Everything is assembled into the context window.
- Context Window: all input tokens combined into one container, processed by the model.
- Model: stateless inference; no memory between requests.
- Output: response tokens, generated by the model and added back to the context on the next turn.
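The assembly step can be sketched as follows. The part names and their ordering are illustrative assumptions that match the flow above, not any particular provider's wire format.

```python
# Sketch of how one request's context might be assembled: every input
# type is flattened into a single sequence the model reads in one pass.
# Names and ordering here are assumptions for illustration.

def assemble_context(system_prompt, history, files, tool_outputs, user_msg):
    parts = [{"type": "system", "content": system_prompt}]
    for turn in history:          # prior turns, re-sent verbatim
        parts.append({"type": "history", "content": turn})
    for f in files:               # documents already converted to tokens
        parts.append({"type": "file", "content": f})
    for out in tool_outputs:      # search results, code execution, etc.
        parts.append({"type": "tool", "content": out})
    parts.append({"type": "user", "content": user_msg})  # added last
    return parts

ctx = assemble_context(
    "Be concise.",
    ["hi", "hello!"],
    ["report.pdf text"],
    [],
    "Summarize the report.",
)
```

Instructions, history, files, and the new message all end up in one flat sequence; the model has no separate channel for any of them.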
Everything shares one fixed-size container
Context Window: 200,000 tokens (standard); all content shares the same space.

- System prompt / instructions: ~2K
- Conversation history (older, compacted): ~40K
- Conversation history (recent): ~50K
- Uploaded files / documents: ~60K
- Tool outputs / search results: ~20K
- Current user message: ~1K
- Reserved for output: ~4K

Total: ~177K of 200K, roughly 88% full.
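The arithmetic behind that fill level is worth making explicit. The token counts below are the article's illustrative figures, not measurements; real tokenizers and real sessions will differ.

```python
# Back-of-the-envelope accounting for the context budget shown above.
# Figures are the article's illustrative numbers, not measurements.

WINDOW = 200_000  # standard window size in tokens

budget = {
    "system prompt":              2_000,
    "older history (compacted)": 40_000,
    "recent history":            50_000,
    "uploaded files":            60_000,
    "tool outputs":              20_000,
    "current message":            1_000,
    "output reserve":             4_000,
}

used = sum(budget.values())          # 177,000 tokens
pct = used * 100 // WINDOW           # 88 -- integer percent full

# The key constraint: input plus reserved output must fit the window.
assert used <= WINDOW
```

Note that a single large uploaded file (~60K here) can dwarf the entire conversation history, which is why long documents crowd out memory of the chat itself.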
Content Types
Everything the model sees lives in one shared space
- System Prompt (~1–4K, persistent): Instructions set by the operator or product. Defines how the model behaves. Loaded on every request.
- Conversation History (grows, ephemeral): All prior messages, user and assistant, appended each turn. Grows continuously until compaction or session end.
- Uploaded Files (varies, ephemeral): PDFs, documents, and images converted to tokens. Large files can consume tens of thousands of tokens each.
- Tool Outputs (varies, ephemeral): Results from web search, code execution, and API calls. These accumulate in agentic workflows and can consume significant space.
- Current Message (~0.5–2K, ephemeral): The user's input this turn. Added last, processed with the full context.
- Output Reserve (~4–8K, reserved): Space held back for the model's response. Input plus output must fit within the window total.
Old context is summarized and discarded
Compaction
Triggered at ~83–95% capacity; the model continues from the summary.
Before, ~90% full: system prompt, uploaded files, older history, recent history, tool outputs, current message.

Compaction event: the model summarizes the older context. Older history and uploaded files are discarded and replaced by a single summary block, alongside the system prompt, recent history, tool outputs, and current message. The conversation continues from the summary.

After, space recovered (~25% consumed): system prompt, summary (lossy), current message.
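A compaction policy of this shape can be sketched in a few lines. This is a sketch under stated assumptions: the trigger threshold and the number of recent turns kept are illustrative, and `summarize` is a hypothetical stand-in for a model-generated summary (here it just truncates, which makes the lossiness literal).

```python
# Sketch of a compaction policy: trigger near capacity, keep the system
# prompt and recent turns verbatim, collapse everything else into one
# lossy summary block. Thresholds are illustrative assumptions.

WINDOW = 200_000
TRIGGER = 0.90        # compact when this full (article cites ~83-95%)
KEEP_RECENT = 2       # recent entries that survive verbatim

def summarize(entries):
    # Hypothetical stand-in for a model-generated summary. Keeping only
    # the first 20 characters of each entry makes the loss literal.
    return "SUMMARY: " + " | ".join(e[:20] for e in entries)

def maybe_compact(context, token_count):
    if token_count / WINDOW < TRIGGER:
        return context                       # under threshold: no-op
    system = context[0]                      # system prompt always kept
    middle = context[1:-KEEP_RECENT]         # older history and files
    tail = context[-KEEP_RECENT:]            # recent turns, kept verbatim
    # The originals in `middle` are gone for good; the conversation
    # continues from the summary, not from the discarded content.
    return [system, summarize(middle)] + tail

ctx = ["system prompt", "old chapter one...", "old chapter two...",
       "recent question", "recent answer"]
compacted = maybe_compact(ctx, 185_000)      # ~92% full -> compacts
```

The irreversibility is the point: once `maybe_compact` fires, no later request can recover what the summary dropped, because nothing outside the context window ever existed.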