Context window checker

Context window reference (updated June 2026)

Model	Provider	Context window
Gemini 2.5 Pro	Google	~1M tokens
Gemini 2.5 Flash	Google	~1M tokens
Gemini 2.0 Flash	Google	~1M tokens
GPT-4.1	OpenAI	~1M tokens
GPT-4.1 mini	OpenAI	~1M tokens
GPT-4.1 nano	OpenAI	~1M tokens
o3	OpenAI	200,000 tokens
o4-mini	OpenAI	200,000 tokens
Claude Opus 4	Anthropic	200,000 tokens
Claude Sonnet 4	Anthropic	200,000 tokens
Claude Haiku 4.5	Anthropic	200,000 tokens
Llama 3.3 70B	Meta	131,072 tokens
Llama 3.1 405B	Meta	131,072 tokens
Llama 3.1 8B	Meta	131,072 tokens
GPT-4o	OpenAI	128,000 tokens
GPT-4o mini	OpenAI	128,000 tokens
GPT-3.5 Turbo	OpenAI	16,385 tokens

About this tool

A model's context window is the maximum number of tokens it can consider at once: prompt plus completion combined. Exceed it and the request is either rejected or silently truncated, which usually drops the start of your prompt. This checker counts your text against the selected model's limit and shows the result as a progress bar with tokens used, the percentage of the window consumed, and tokens remaining for the response.
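The figures the checker reports can be reproduced with a few lines of arithmetic. The sketch below is only an illustration, not the tool's actual code: the `WindowUsage` class and its field names are hypothetical, and it assumes the percentage counts both the prompt and the space reserved for the response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowUsage:
    """Hypothetical container for one context-window check."""
    prompt_tokens: int    # tokens counted (or estimated) in the prompt
    reserved_tokens: int  # tokens set aside for the expected response
    window: int           # the model's context window, in tokens

    @property
    def used(self) -> int:
        # Assumption: the reserve counts toward usage, because prompt
        # and response share the same window.
        return self.prompt_tokens + self.reserved_tokens

    @property
    def percent_used(self) -> float:
        return 100.0 * self.used / self.window

    @property
    def remaining(self) -> int:
        # Negative when prompt plus reserve exceed the window.
        return self.window - self.used

    @property
    def fits(self) -> bool:
        return self.remaining >= 0


usage = WindowUsage(prompt_tokens=90_000, reserved_tokens=6_000, window=128_000)
print(f"{usage.used:,} / {usage.window:,} tokens")   # 96,000 / 128,000 tokens
print(f"{usage.percent_used:.0f}% of window used")   # 75% of window used
print(f"{usage.remaining:,} tokens remaining")       # 32,000 tokens remaining
```

A negative `remaining` value means the request would not fit as configured, so either the prompt or the reserve has to shrink.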

Context windows range from a few thousand tokens on older models to over a million on the latest. Remember the window is shared: every token spent on the prompt is one fewer available for the answer, so leave headroom for the output you expect. Counts for OpenAI models are exact, computed with tiktoken; counts for other providers are character-based estimates.
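A character-based estimate can be as simple as dividing the text length by an average characters-per-token ratio. The sketch below assumes a ratio of 4 characters per token, a common rule of thumb for English text; the ratio this tool actually uses is not stated, and the function name is hypothetical.

```python
import math

# Assumed heuristic: roughly 4 characters per token for English text.
# Real tokenizers vary by model, language, and content (code, CJK, etc.).
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Return a rough token estimate from the character count alone."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Round up so the estimate errs toward using more of the window.
    return math.ceil(len(text) / chars_per_token)


prompt = "Summarize the following report in three bullet points."
print(estimate_tokens(prompt))
```

Rounding up is a deliberate choice here: an estimate that runs slightly high leaves a safety margin, whereas one that runs low can push a request over the limit.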

The window is shared between input and output — budget room for the response, not just the prompt.

Frequently asked questions

What is a context window?

It is the total number of tokens a model can hold in working memory for a single request, covering both your prompt and the generated response. Think of it as the model's short-term attention span, measured in tokens.

What happens if I exceed it?

Depending on the API, the request either errors out or is truncated — usually the oldest tokens are dropped — which can quietly remove instructions at the start of your prompt. It is best to stay comfortably under the limit.
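To see why dropping the oldest tokens is risky, consider a toy truncation that keeps only the most recent tokens. This is a simplified sketch: whole words stand in for tokens, and the `truncate_oldest` helper is hypothetical rather than any provider's actual behavior.

```python
def truncate_oldest(tokens: list[str], limit: int) -> list[str]:
    """Keep only the most recent `limit` tokens, dropping the oldest first."""
    if limit <= 0:
        return []
    # A negative slice keeps the tail; if the list is already short
    # enough, it is returned unchanged.
    return tokens[-limit:]


# Words stand in for tokens; the instruction sits at the start of the prompt.
prompt = ["SYSTEM:", "answer", "in", "French.", "USER:", "What", "is", "a", "token?"]
kept = truncate_oldest(prompt, limit=5)
print(kept)  # ['USER:', 'What', 'is', 'a', 'token?']
```

The question survives but the instruction to answer in French is gone, and nothing in the truncated prompt signals that anything was lost.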

Do the prompt and the response share the window?

Yes. If a model has a 128K window and your prompt is 120K tokens, only about 8K remain for the answer. Always reserve space for the expected output.

Which models have the largest context windows?

Several recent models from OpenAI and Google offer roughly one million tokens, while many production models sit at 128K–200K. The reference table in this tool lists the limit for each model.
