I've integrated AI into my day-to-day workflow. The models I work with are my thinking partners, technical counterparts, and research assistants. I open a chat window, enter a carefully structured prompt or a simple question, and work my way toward whatever I'm trying to accomplish. This works well as long as the conversation stays short. The context window can only hold a limited number of tokens, and once you move beyond that limit, you are stuck in compaction messaging hell.

If your workflow requires back-and-forth with an agent for long periods, the picture is familiar. In the middle of your conversation, ideas are flowing, and your agent decides it is the perfect moment to 'compact' the conversation. This compression step is part of how the technology is designed to work. The context window is finite, tokens are expensive, and compressing the conversation keeps computational costs down by freeing the system's attention from the accumulated context and making room for what comes next.

This series is meant as a design argument, not a technical proposal. The technical infrastructure to solve these issues either exists or is being built. But there is a gap at the product level, and that gap can be addressed through design.

This is a cost-saving decision on the technology side that comes at a very high cost for the user experience. Every compaction cycle is a lossy operation; there's no guarantee the system is preserving the decisions and insights that matter most, or the rationale behind them. As the system squeezes the context window to make room for what might come next, it also makes unilateral decisions about what to keep during that compression. And in my experience, it sometimes forgets items we've already tackled, leading to repeated work and frustration.
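The mechanics behind this are easy to sketch. The loop below is a deliberately simplified, hypothetical illustration (a word-count "tokenizer", a truncated stub instead of a model-written summary), but it shows the essential property: once a turn falls out of the budget, its contents vanish without a trace.

```python
# Minimal sketch of why context compaction is lossy. Everything here is
# hypothetical and simplified: real agents ask the model to summarize the
# dropped turns, but the shape of the loss is the same.

def count_tokens(message: str) -> int:
    # Crude stand-in for a real tokenizer: one token per word.
    return len(message.split())

def compact(history: list[str], budget: int) -> list[str]:
    """Drop the oldest turns until the history fits the budget, leaving
    only a stub in their place. Whatever the stub omits is gone, and
    nothing flags the loss to the user."""
    dropped: list[str] = []
    while sum(count_tokens(m) for m in history) > budget and len(history) > 1:
        dropped.append(history.pop(0))
    if dropped:
        # A real system would generate a summary of `dropped` here; a
        # truncated stub just makes the information loss easy to see.
        history.insert(0, f"[compacted {len(dropped)} turns: {dropped[0][:24]}...]")
    return history

history = [
    "decision: cap budget at 50k and use method B " * 4,  # an early decision
    "long methodology discussion " * 8,
    "latest user question here",
]
compacted = compact(history, budget=30)
# The early decision ("method B") no longer appears anywhere in the context.
```

Swap the stub for a model-generated summary and the word count for a real tokenizer and the property holds: the summarizer, not the user, decides what survives.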

From a UX perspective, this engineering paradigm undermines the most important factor in this technology's long-term success: trust. Imagine a finance team midway through a complex analysis session, or a product team three weeks into a roadmap build. The model compacts. A constraint established an hour ago is gone. A number. A methodology. A decision already made. No alert. No flag. No indication that anything changed, because the transcript is treated as the source of truth. That's not an edge case. That's the architecture working exactly as designed.

I've been working on and around this for months. Numbered sessions. Handoff pastes. Markdown files cross-referenced mid-session. Hell, I built a harness to stress-test everything. They work, mostly. But workarounds are not solutions — and the fact that users arrived at the same ones independently says something. The problem is that consistent.

The problem has a name. So does the fix. Neither of them is memory.