LLM API cost calculator

Estimate and compare LLM API costs across GPT, Claude, Gemini and Llama. Paste a prompt to auto-count tokens or type the numbers, then see the cost per call and at scale.

Processed locally

Compare all models

Prices updated June 2026
Model | Input $/1M | Output $/1M | Cost / call | Cost / N req

List prices in USD per 1 million tokens. Providers change pricing and ship new models often, so treat these as a starting point and confirm against the provider's pricing page before budgeting. Token counts are exact for OpenAI models (tiktoken) and approximate for other providers, which have no official public client-side tokenizer.

About this tool

Every LLM API charges per token, with separate input (prompt) and output (completion) rates, so the cost of a feature depends on how many tokens flow in each direction and how often you call the model. This calculator turns those numbers into a per-call cost and projects them across thousands or millions of requests, then lays every model side by side so you can see what the same workload costs on each.
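The side-by-side comparison can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual code: the function name `compare_models`, the model names, and every price in `EXAMPLE_PRICES` are hypothetical placeholders, so substitute current list prices from each provider.

```python
# Hypothetical (input $/1M tokens, output $/1M tokens) pairs.
# These are placeholders for illustration, not real provider prices.
EXAMPLE_PRICES = {
    "model-a": (2.50, 10.00),
    "model-b": (0.15, 0.60),
    "model-c": (3.00, 15.00),
}


def compare_models(prices, input_tokens, output_tokens, requests):
    """Return (model, cost per call, cost for `requests` calls), cheapest first."""
    rows = []
    for name, (input_price, output_price) in prices.items():
        # Prices are quoted per million tokens, so divide by 1,000,000.
        per_call = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
        rows.append((name, per_call, per_call * requests))
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Same workload on every model: 1,000 input and 500 output tokens per call.
    for name, per_call, at_scale in compare_models(EXAMPLE_PRICES, 1_000, 500, 1_000_000):
        print(f"{name:<10} ${per_call:.5f} per call   ${at_scale:,.2f} per 1M requests")
```

Sorting by per-call cost keeps the cheapest option for a given workload at the top, which is usually the first thing a comparison is read for.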

Enter input and output token counts directly, or paste your prompt to count its tokens with the shared tokenizer (exact for OpenAI, estimated for other providers). Prices are list rates per million tokens, and the date they were last updated is shown above the comparison table. Because providers change pricing and ship new models often, treat the totals as a well-informed estimate and confirm against the provider before committing a budget.
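For a rough idea of how a pasted prompt becomes a token count, here is a minimal Python sketch. It makes two assumptions that do not come from this page: the estimate uses the common rule of thumb of about four characters per token for English text (not necessarily the heuristic this tool applies), and the exact count relies on the third-party `tiktoken` package with the `o200k_base` encoding used by GPT-4o. The function names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate token count from character length.

    Assumption: roughly 4 characters per token, a rule of thumb for
    English text. Real tokenizers vary by language and content.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def count_tokens_openai(text: str, encoding_name: str = "o200k_base") -> int:
    """Exact count for OpenAI models; requires `pip install tiktoken`."""
    import tiktoken  # imported lazily so the estimate works without it

    return len(tiktoken.get_encoding(encoding_name).encode(text))


if __name__ == "__main__":
    prompt = "Summarize the following support ticket in two sentences."
    print("estimated tokens:", estimate_tokens(prompt))
```

The estimate is adequate for budgeting in round numbers; for anything billed close to a limit, count with the provider's own tokenizer or read the usage figures returned by the API.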

List prices may exclude volume discounts, cached-input pricing and batch tiers. Everything is computed locally in your browser.

Frequently asked questions

How is the cost calculated?

The cost of a call is the input token count times the model's input price, plus the output token count times its output price, with each price prorated from the per-million-token list rate (that is, divided by 1,000,000). The per-call total is that sum, and the projections multiply it by your request volume.
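The calculation described above can be written out as a short Python sketch. The names `CallCost`, `cost_per_call` and `project` are hypothetical, and the prices in the usage example are placeholders rather than any provider's real rates.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class CallCost:
    input_cost: float   # USD for the prompt tokens of one call
    output_cost: float  # USD for the completion tokens of one call

    @property
    def total(self) -> float:
        return self.input_cost + self.output_cost


def cost_per_call(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> CallCost:
    """Prorate per-million-token list prices down to a single call."""
    return CallCost(
        input_cost=input_tokens * input_price_per_m / TOKENS_PER_MILLION,
        output_cost=output_tokens * output_price_per_m / TOKENS_PER_MILLION,
    )


def project(total_per_call: float, requests: int) -> float:
    """Scale a per-call total to a request volume."""
    return total_per_call * requests


if __name__ == "__main__":
    # Placeholder prices: $2.50 per 1M input tokens, $10.00 per 1M output tokens.
    call = cost_per_call(1_200, 400, 2.50, 10.00)
    print(f"input  ${call.input_cost:.4f}")
    print(f"output ${call.output_cost:.4f}")
    print(f"total  ${call.total:.4f}")
    for volume in (1_000, 1_000_000):
        print(f"{volume:>9,} requests: ${project(call.total, volume):,.2f}")
```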

Why are input and output priced differently?

Generating output is more compute-intensive than reading input, so most providers charge more — often several times more — per output token than per input token. That is why a model that returns long answers can cost far more than its input price alone suggests.
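To see how much the output rate can dominate, this small Python sketch computes the share of a call's cost that comes from output tokens. The function name `output_share` is hypothetical, and the prices are placeholders chosen so that output costs four times as much as input.

```python
def output_share(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Fraction of a call's cost attributable to output tokens (0.0 to 1.0)."""
    input_cost = input_tokens * input_price_per_m
    output_cost = output_tokens * output_price_per_m
    total = input_cost + output_cost
    # The per-million divisor cancels out of the ratio, so it is omitted.
    return output_cost / total if total else 0.0


if __name__ == "__main__":
    # Equal token counts, placeholder prices of $2.50 (input) and $10.00 (output) per 1M.
    share = output_share(1_000, 1_000, 2.50, 10.00)
    print(f"output accounts for {share:.0%} of the call cost")
```

With equal token counts and a fourfold price gap, output already accounts for most of the bill, so trimming response length often saves more than trimming the prompt.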

Are these prices current?

They reflect published list prices as of the date shown above the table, stored in one place we update periodically. Providers adjust pricing and release new models frequently, and discounts, cached-input rates and batch tiers are not included, so verify against the provider before relying on a figure.

What about cached or batch pricing?

Many providers offer cheaper rates for cached prompt prefixes or asynchronous batch jobs. This calculator uses standard real-time rates, so if you use those tiers your actual cost will be lower than shown.
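If you want to estimate the effect of those tiers yourself, one possible adjustment looks like the Python sketch below. Everything here is an assumption for illustration: the function name `discounted_cost`, the idea of modelling each tier as a simple multiplier, and the 0.5 multipliers in the example. Actual discount structures differ by provider, so take the real figures from the provider's pricing page.

```python
def discounted_cost(input_tokens: int, output_tokens: int,
                    input_price_per_m: float, output_price_per_m: float,
                    cached_fraction: float = 0.0,
                    cached_multiplier: float = 1.0,
                    batch_multiplier: float = 1.0) -> float:
    """Per-call cost in USD with optional cached-input and batch discounts.

    cached_fraction:   share of input tokens served from a cached prefix (0 to 1)
    cached_multiplier: price factor for cached input tokens (hypothetical, e.g. 0.5)
    batch_multiplier:  price factor for the whole call when sent as a batch job
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_tokens = input_tokens * cached_fraction
    uncached_tokens = input_tokens - cached_tokens
    input_cost = (uncached_tokens + cached_tokens * cached_multiplier) * input_price_per_m
    output_cost = output_tokens * output_price_per_m
    return (input_cost + output_cost) / 1_000_000 * batch_multiplier


if __name__ == "__main__":
    # Placeholder prices: $2.00 input, $8.00 output per 1M tokens.
    standard = discounted_cost(1_000, 500, 2.00, 8.00)
    cached = discounted_cost(1_000, 500, 2.00, 8.00,
                             cached_fraction=0.5, cached_multiplier=0.5)
    print(f"standard ${standard:.4f}   with half the prompt cached ${cached:.4f}")
```

With the default arguments the function reduces to the standard real-time calculation, so it can stand in for the plain formula when no discount applies.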