How context windows work, and where they fail.
01 · The Fundamental Constraint
Why continuity is an illusion
The Constraint
The model is stateless. Every request is processed independently: no memory between sessions, no memory between turns. What feels like continuity is an illusion maintained by the application layer, which re-sends the conversation as input on each request. When the session ends, everything resets to zero. There is no persistent state anywhere in this architecture; the model sees only what is explicitly re-injected each time.
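The re-injection loop can be sketched in a few lines. This assumes a hypothetical `complete()` stub standing in for the model endpoint; the point is that it is a pure function of its input and keeps no state between calls.

```python
# Minimal sketch of the application layer maintaining the "illusion" of
# continuity. `complete` is a hypothetical stand-in for a stateless model
# endpoint: its output depends only on the messages passed in this call.

def complete(messages):
    """Stub model endpoint: stateless, a pure function of its input."""
    return f"(reply to {len(messages)} messages)"

history = []  # lives in the application, never in the model

def send(user_text):
    history.append({"role": "user", "content": user_text})
    reply = complete(history)  # the ENTIRE history is re-sent each turn
    history.append({"role": "assistant", "content": reply})
    return reply

send("hello")       # this request carries 1 message
send("and again?")  # this request carries 3 messages: everything so far, re-injected
```

Each call to `send` grows the payload of the next one; nothing persists on the model side.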
02 · Request / Response Architecture
How a single request moves through the model
User
Message input
Files, images
Instructions
assembled into
Context Window
All input tokens
combined into
one container
processed by
Model
Stateless inference
No memory between
requests
generates
Output
Response tokens
Added back to
context next turn
03 · What Lives Inside the Context Window
Everything shares one fixed-size container
Context Window
200,000 tokens (standard) · all content shares the same space
200K token limit
System prompt / instructions
~2K
Conversation history (older) · compacted
~40K
Conversation history (recent)
~50K
Uploaded files / documents
~60K
Tool outputs / search results
~20K
Current user message
~1K
Reserved for output
~4K
0
~88% full
200K
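The fill level shown above is simple arithmetic over the segment sizes. A sketch using the illustrative figures from this diagram (these are example numbers, not API values):

```python
# Token budget from the diagram above (illustrative figures).
WINDOW = 200_000

budget = {
    "system prompt":     2_000,
    "history (older)":  40_000,
    "history (recent)": 50_000,
    "uploaded files":   60_000,
    "tool outputs":     20_000,
    "current message":   1_000,
    "output reserve":    4_000,
}

used = sum(budget.values())  # 177,000 of 200,000 tokens, the ~88% shown above
print(f"{used:,} / {WINDOW:,} tokens ({used / WINDOW:.0%} full)")
```

Because every category shares the same container, growing any one segment (a large upload, verbose tool output) directly shrinks the room left for the others.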
Content Types
Everything the model sees lives in one shared space
System Prompt
Instructions set by the operator or product. Defines how the model behaves. Loads on every request.
~1–4K
persistent
Conversation History
All prior messages, user and assistant, appended each turn. Grows continuously until compaction or session end.
grows
ephemeral
Uploaded Files
PDFs, documents, images converted to tokens. Large files can consume tens of thousands of tokens each.
varies
ephemeral
Tool Outputs
Results from web search, code execution, API calls. Accumulate in agentic workflows, consuming significant space.
varies
ephemeral
Current Message
The user's input this turn. Added last, processed with full context.
~0.5–2K
ephemeral
Output Reserve
Space held back for the model's response. Input + output must fit within the window total.
~4โ8K
reserved
System prompt
Conversation history
Uploaded files
Tool outputs
Current message
Output reserve
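The output-reserve constraint noted above (input plus output must fit within the window) can be expressed as a one-line guard. `fits` is a hypothetical helper, not a real API call:

```python
# Hypothetical guard: input tokens and the reserved output budget
# must share the same fixed-size window.

def fits(input_tokens, max_output_tokens, window=200_000):
    """True if this request leaves room for the response."""
    return input_tokens + max_output_tokens <= window

fits(173_000, 4_000)   # 177K fits in 200K
fits(173_000, 30_000)  # 203K overflows: the request would be rejected or truncated
```

Reserving more output space means accepting less input, and vice versa.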
04 · What Happens When the Window Fills
Old context is summarized and discarded
Compaction
Triggered at ~83–95% capacity · model continues from summary
Before · ~90% full
90% consumed
model summarizes
older context
Compaction event
older history + files discarded
conversation
continues
After · space recovered
~25% consumed
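Compaction as described above can be sketched as a threshold check plus a summarize-and-replace step. This is a simplified model: token counts are approximated by word counts, `summarize` is a hypothetical stub (real systems ask the model itself to write the summary), and the most recent turns are always kept verbatim.

```python
# Sketch of compaction. Assumptions: word count stands in for token count,
# `summarize` is a stub, and the last KEEP_RECENT messages survive verbatim.

WINDOW = 200_000
TRIGGER = 0.90   # compact at ~90% full (the diagram shows ~83-95%)
KEEP_RECENT = 4  # most recent messages are never summarized away

def count_tokens(messages):
    return sum(len(m["content"].split()) for m in messages)

def summarize(messages):
    # Stub: a real implementation generates a model-written summary here.
    return {"role": "system", "content": f"[summary of {len(messages)} earlier messages]"}

def maybe_compact(messages):
    if count_tokens(messages) < TRIGGER * WINDOW:
        return messages  # plenty of room, nothing to do
    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    return [summarize(older)] + recent  # older history collapses to one entry
```

After a compaction event, the conversation continues from the summary: the space is recovered, but the discarded detail is gone for good unless it is re-injected.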