Context Window
Also known as: context length, context size, token limit
An AI model's short-term working memory: the maximum number of tokens it can process in a single request, typically including the tokens it generates in reply. Anything beyond that limit has to be truncated, summarized, or left out.
Source: Common AI/ML terminology
The context window defines how much text an AI language model can “see” at any one moment. It encompasses the full conversation history, system instructions, retrieved documents, and any other input, all measured in tokens (sub-word units produced by the model's tokenizer).
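Because every component draws on the same token budget, a quick per-part estimate can show what is crowding the window. The sketch below is a minimal illustration assuming a rough rule of thumb of about four characters per token for English text; real counts come from the model's own tokenizer and will differ. The function names are hypothetical.

```python
# Assumption: ~4 characters per token is a rough rule of thumb for English.
# Actual counts depend on the model's tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of a string (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def context_usage(parts: dict[str, str]) -> dict[str, int]:
    """Estimate tokens per input component; all components share one window."""
    usage = {name: estimate_tokens(text) for name, text in parts.items()}
    usage["total"] = sum(usage.values())
    return usage


if __name__ == "__main__":
    parts = {
        "system": "You are a research assistant.",
        "history": "User: Summarize the gene records.\nAssistant: Sure.",
        "retrieved": "BRCA1 record ... " * 50,
    }
    print(context_usage(parts))
```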
Why It Matters
When the context window fills up, nothing more can be added: depending on the application, the request is rejected or the earliest content is truncated, so the model no longer sees it. For a multi-agent research system querying large biological databases, context limits directly constrain how much data can be passed to the model for reasoning at once.
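One common way applications handle a full window is to drop the oldest conversation turns while keeping the system instructions. A minimal sketch, assuming a hypothetical message format of dicts with `role` and `content` keys and the same rough four-characters-per-token estimate:

```python
# Hypothetical message format: dicts with "role" and "content" keys.
# Token counts use a rough ~4 characters/token heuristic (an assumption).


def estimate_tokens(text: str) -> int:
    """Roughly estimate tokens with ceiling division by 4 characters."""
    return -(-len(text) // 4)


def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep system messages plus the most recent turns that fit in `budget`.

    Older non-system turns are dropped first, once the next-oldest turn
    would push the total over the budget.
    """
    system = [m for m in messages if m["role"] == "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    # Walk backwards from the newest turn so recent context survives.
    for msg in reversed([m for m in messages if m["role"] != "system"]):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    # Restore chronological order after the backwards walk.
    return system + list(reversed(kept))
```

Summarizing older turns instead of discarding them is a common alternative; this sketch shows only plain truncation.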
Context Window Growth Over Time
| Model | Context Window |
|---|---|
| GPT-3.5-turbo-0125 | 16,385 tokens (~12,000 words) |
| GPT-4o | 128,000 tokens (~96,000 words) |
| GPT-5 (API) | 400,000 tokens (~300,000 words) |
| Claude Sonnet 4 | 200,000 tokens (~150,000 words); up to 1,000,000 in beta |
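The word figures in the table follow a common rough ratio of about 0.75 English words per token. The sketch below reproduces that arithmetic and checks whether a prompt plus the tokens reserved for the reply fits one of the GPT windows listed, assuming (as is typical) that generated output counts against the same window. The lookup table and helper names are illustrative, and provider limits change over time.

```python
# Window sizes for the GPT rows in the table; check current provider
# documentation before relying on them.
CONTEXT_WINDOWS = {
    "gpt-3.5-turbo-0125": 16_385,
    "gpt-4o": 128_000,
    "gpt-5": 400_000,
}

# Assumption: roughly 0.75 English words per token.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int) -> int:
    """Convert a token count to an approximate English word count."""
    return round(tokens * WORDS_PER_TOKEN)


def fits(model: str, prompt_tokens: int, reserved_output_tokens: int) -> bool:
    """Return True if the prompt plus the reserved reply tokens fit the window."""
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOWS[model]


if __name__ == "__main__":
    for name, window in CONTEXT_WINDOWS.items():
        print(f"{name}: {window:,} tokens ~ {tokens_to_words(window):,} words")
```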
Practical Implications
- Larger context windows can reduce hallucinations caused by missing information, although models do not reliably use every detail in very long inputs
- They allow more database results to be passed to the model at once
- Even with large windows, input cost scales linearly with the number of tokens sent per request, and chat applications typically resend the conversation history on every turn, so context management remains important for cost optimization
- Retrieval-Augmented Generation (RAG) is used partly to work around small context windows; larger windows reduce (but don’t eliminate) the need for selective retrieval
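The last two points can be made concrete with a sketch of selective retrieval: ranking hypothetical database results by a relevance score and greedily keeping those that fit a token budget, plus a helper showing that input cost grows linearly with tokens sent. The `Record` type, the scores, and the per-million-token price are placeholders, not values from any provider.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical database search result with a relevance score."""
    text: str
    score: float
    tokens: int


def select_within_budget(records: list[Record], budget: int) -> list[Record]:
    """Greedily keep the highest-scoring records whose tokens fit the budget.

    Records too large for the remaining budget are skipped, so smaller,
    lower-scoring records may still be included after them.
    """
    chosen: list[Record] = []
    used = 0
    for rec in sorted(records, key=lambda r: r.score, reverse=True):
        if used + rec.tokens <= budget:
            chosen.append(rec)
            used += rec.tokens
    return chosen


def input_cost(tokens: int, price_per_million: float) -> float:
    """Input cost is linear in tokens sent (price is a placeholder)."""
    return tokens / 1_000_000 * price_per_million
```

Greedy selection by score is a simple heuristic, not an optimal packing; it keeps the most relevant records first, which is usually what matters when the window is the constraint.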