How context windows work, and where they fail.
01 · The Fundamental Constraint
Why continuity is an illusion
The Constraint
The model is stateless. Every request is processed independently: no memory between sessions, no memory between turns. What feels like continuity is an illusion maintained by the application layer, which re-sends the conversation as input on each request. When the session ends, everything resets to zero. There is no persistent state anywhere in this architecture; the model sees only what is explicitly re-injected each time.
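The re-injection loop can be sketched in a few lines. This assumes a hypothetical `complete()` stub standing in for the model endpoint; the point is that it is a pure function of its input and keeps no state between calls.

```python
# Minimal sketch of the application layer maintaining the "illusion" of
# continuity. `complete` is a hypothetical stand-in for a stateless model
# endpoint: its output depends only on the messages passed in this call.

def complete(messages):
    """Stub model endpoint: stateless, a pure function of its input."""
    return f"(reply to {len(messages)} messages)"

history = []  # lives in the application, never in the model

def send(user_text):
    history.append({"role": "user", "content": user_text})
    reply = complete(history)  # the ENTIRE history is re-sent each turn
    history.append({"role": "assistant", "content": reply})
    return reply

send("hello")       # this request carries 1 message
send("and again?")  # this request carries 3 messages: everything so far, re-injected
```

Each call to `send` grows the payload of the next one; nothing persists on the model side.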
02 · Request / Response Architecture
How a single request moves through the model
User
Message input
Files, images
Instructions
assembled into
Context Window
All input tokens
combined into
one container
processed by
Model
Stateless inference
No memory between
requests
generates
Output
Response tokens
Added back to
context next turn
03 · What Lives Inside the Context Window
Everything shares one fixed-size container
Context Window
200,000 tokens (standard) · all content shares the same space
200K token limit
System prompt / instructions
~2K
Conversation history (older) · compacted
~40K
Conversation history (recent)
~50K
Uploaded files / documents
~60K
Tool outputs / search results
~20K
Current user message
~1K
Reserved for output
~4K
0
~88% full
200K
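The fill level shown above is simple arithmetic over the segment sizes. A sketch using the illustrative figures from this diagram (these are example numbers, not API values):

```python
# Token budget from the diagram above (illustrative figures).
WINDOW = 200_000

budget = {
    "system prompt":     2_000,
    "history (older)":  40_000,
    "history (recent)": 50_000,
    "uploaded files":   60_000,
    "tool outputs":     20_000,
    "current message":   1_000,
    "output reserve":    4_000,
}

used = sum(budget.values())  # 177,000 of 200,000 tokens, the ~88% shown above
print(f"{used:,} / {WINDOW:,} tokens ({used / WINDOW:.0%} full)")
```

Because every category shares the same container, growing any one segment (a large upload, verbose tool output) directly shrinks the room left for the others.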
Content Types
Everything the model sees lives in one shared space
System Prompt
Instructions set by the operator or product. Defines how the model behaves. Loads on every request.
~1–4K
persistent
Conversation History
All prior messages, user and assistant, appended each turn. Grows continuously until compaction or session end.
grows
ephemeral
Uploaded Files
PDFs, documents, images converted to tokens. Large files can consume tens of thousands of tokens each.
varies
ephemeral
Tool Outputs
Results from web search, code execution, API calls. Accumulate in agentic workflows, consuming significant space.
varies
ephemeral
Current Message
The user's input this turn. Added last, processed with full context.
~0.5–2K
ephemeral
Output Reserve
Space held back for the model's response. Input + output must fit within the window total.
~4โ8K
reserved
System prompt
Conversation history
Uploaded files
Tool outputs
Current message
Output reserve
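The output-reserve constraint noted above (input plus output must fit within the window) can be expressed as a one-line guard. `fits` is a hypothetical helper, not a real API call:

```python
# Hypothetical guard: input tokens and the reserved output budget
# must share the same fixed-size window.

def fits(input_tokens, max_output_tokens, window=200_000):
    """True if this request leaves room for the response."""
    return input_tokens + max_output_tokens <= window

fits(173_000, 4_000)   # 177K fits in 200K
fits(173_000, 30_000)  # 203K overflows: the request would be rejected or truncated
```

Reserving more output space means accepting less input, and vice versa.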
04 · What Happens When the Window Fills
Old context is summarized and discarded
Compaction
Triggered at ~83–95% capacity · model continues from summary
Before · ~90% full
90% consumed
model summarizes
older context
Compaction event
older history + files discarded
conversation
continues
After · space recovered
~25% consumed
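Compaction as described above can be sketched as a threshold check plus a summarize-and-replace step. This is a simplified model: token counts are approximated by word counts, `summarize` is a hypothetical stub (real systems ask the model itself to write the summary), and the most recent turns are always kept verbatim.

```python
# Sketch of compaction. Assumptions: word count stands in for token count,
# `summarize` is a stub, and the last KEEP_RECENT messages survive verbatim.

WINDOW = 200_000
TRIGGER = 0.90   # compact at ~90% full (the diagram shows ~83-95%)
KEEP_RECENT = 4  # most recent messages are never summarized away

def count_tokens(messages):
    return sum(len(m["content"].split()) for m in messages)

def summarize(messages):
    # Stub: a real implementation generates a model-written summary here.
    return {"role": "system", "content": f"[summary of {len(messages)} earlier messages]"}

def maybe_compact(messages):
    if count_tokens(messages) < TRIGGER * WINDOW:
        return messages  # plenty of room, nothing to do
    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    return [summarize(older)] + recent  # older history collapses to one entry
```

After a compaction event, the conversation continues from the summary: the space is recovered, but the discarded detail is gone for good unless it is re-injected.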