| Model | Provider | Context window |
|---|---|---|
| Gemini 2.5 Pro | Google | ~1M tokens |
| Gemini 2.5 Flash | Google | ~1M tokens |
| Gemini 2.0 Flash | Google | ~1M tokens |
| GPT-4.1 | OpenAI | ~1M tokens |
| GPT-4.1 mini | OpenAI | ~1M tokens |
| GPT-4.1 nano | OpenAI | ~1M tokens |
| o3 | OpenAI | 200,000 tokens |
| o4-mini | OpenAI | 200,000 tokens |
| Claude Opus 4 | Anthropic | 200,000 tokens |
| Claude Sonnet 4 | Anthropic | 200,000 tokens |
| Claude Haiku 4.5 | Anthropic | 200,000 tokens |
| Llama 3.3 70B | Meta | 131,072 tokens |
| Llama 3.1 405B | Meta | 131,072 tokens |
| Llama 3.1 8B | Meta | 131,072 tokens |
| GPT-4o | OpenAI | 128,000 tokens |
| GPT-4o mini | OpenAI | 128,000 tokens |
| GPT-3.5 Turbo | OpenAI | 16,385 tokens |
About this tool
A model's context window is the maximum number of tokens it can consider at once: prompt plus completion combined. Exceed it and the request is either rejected or silently truncated, which typically drops the start of your prompt. This checker counts your text against the selected model's limit and shows the result as a progress bar with tokens used, the percentage of the window consumed, and tokens remaining for the response.
Context windows range from a few thousand tokens on older models to over a million on the latest. Remember the window is shared: every token spent on the prompt is one fewer available for the answer, so leave headroom for the output you expect. Counts for OpenAI models are exact, computed with tiktoken; counts for other providers are character-based estimates.
The window is shared between input and output — budget room for the response, not just the prompt.
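As a rough illustration of the figures described above, here is a minimal sketch of a character-based check. It is not this tool's actual implementation: the function names are hypothetical, and the ratio of about 4 characters per token is an assumed rule of thumb for English text that varies by model and language.

```python
import math

# Assumption: roughly 4 characters per token. This is only a heuristic;
# an exact count needs the model's own tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Character-based token estimate, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def window_report(tokens_used: int, window: int) -> dict:
    """Return tokens used, percent of the window consumed, and tokens remaining."""
    return {
        "used": tokens_used,
        "percent": round(100 * tokens_used / window, 1),
        "remaining": max(window - tokens_used, 0),
        "over_limit": tokens_used > window,
    }


def progress_bar(percent: float, width: int = 20) -> str:
    """Render a text progress bar, capped at the full width."""
    filled = min(width, round(width * percent / 100))
    return "#" * filled + "-" * (width - filled)


report = window_report(tokens_used=32_000, window=128_000)
print(progress_bar(report["percent"]), report)
```

For a 32,000-token prompt against a 128,000-token window, the report shows 25.0 percent used and 96,000 tokens remaining.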
Frequently asked questions
What is a context window?
It is the total number of tokens a model can hold in working memory for a single request, covering both your prompt and the generated response. Think of it as the model's short-term attention span, measured in tokens.
What happens if I exceed it?
Depending on the API, the request either errors out or is truncated — usually the oldest tokens are dropped — which can quietly remove instructions at the start of your prompt. It is best to stay comfortably under the limit.
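To see why truncation can quietly remove instructions, consider this small sketch of a "drop the oldest tokens" policy. It is illustrative only: `truncate_oldest` is a hypothetical helper, whitespace-separated words stand in for real tokens, and actual APIs may truncate differently or reject the request outright.

```python
def truncate_oldest(tokens: list[str], window: int) -> list[str]:
    """Keep only the most recent `window` tokens, dropping the oldest ones."""
    if window <= 0:
        return []
    return tokens[-window:]


# Words stand in for tokens here; a real tokenizer splits text differently.
prompt = ["SYSTEM:", "answer", "in", "French.", "User:", "what", "is", "a", "token?"]

kept = truncate_oldest(prompt, window=5)
print(kept)  # the system instruction at the start is gone
```

With a window of 5, only the user's question survives and the instruction to answer in French is lost, with no error to signal it.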
Do the prompt and the response share the window?
Yes. If a model has a 128K window and your prompt is 120K tokens, only about 8K remain for the answer. Always reserve space for the expected output.
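The arithmetic in this answer can be sketched as two small helpers. The function names are hypothetical, and the 4,000-token reservation in the second call is just an example value.

```python
def remaining_for_output(window: int, prompt_tokens: int) -> int:
    """Tokens left for the response once the prompt is counted."""
    return max(window - prompt_tokens, 0)


def max_prompt_tokens(window: int, reserved_output: int) -> int:
    """Largest prompt that still leaves `reserved_output` tokens for the answer."""
    if reserved_output > window:
        raise ValueError("reserved output exceeds the context window")
    return window - reserved_output


# The example from the answer above: a 128K window with a 120K-token prompt.
print(remaining_for_output(128_000, 120_000))  # 8000

# Working backwards: reserve 4,000 tokens for the answer (example value).
print(max_prompt_tokens(128_000, 4_000))  # 124000
```

Working backwards from the expected output length is often the more useful direction, since it gives a hard cap on the prompt before the request is sent.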
Which models have the largest context windows?
Several recent models from OpenAI and Google offer roughly one million tokens, while many production models sit at 128K–200K. The reference table in this tool lists the limit for each model.