LLM API Pricing Comparison (2026)

The per-million-token API price of every major large language model in one place: Claude, GPT-5, Gemini, Grok and DeepSeek. Type your monthly token volume below and the table ranks every model by your estimated bill. Nothing is uploaded or saved.

The fast answer

  • Output tokens cost more than input on every model here (2 to 8 times in this table, most often 5 or 6), so a model that writes long answers can be the expensive one even with a low headline price.
  • Cheapest on raw list price: open-weight models like DeepSeek, then budget tiers (Grok 4.1 Fast, Gemini 3.1 Flash-Lite, GPT-5 mini, Claude Haiku 4.5).
  • Frontier models cost more but reason better, so compare cost per solved task, not just cost per token. Enter your tokens below to see your real number.
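One way to make "cost per solved task" concrete is the minimal Python sketch below. The prices come from the table on this page; the token counts and success rates are made-up placeholders, not benchmark results, and the function name is hypothetical. It assumes failed attempts are retried independently until one succeeds, so the expected number of attempts is 1 divided by the success rate.

```python
def cost_per_solved_task(input_tokens, output_tokens,
                         input_price, output_price, success_rate):
    """Expected USD spend per solved task; prices are USD per 1,000,000 tokens.

    Assumption: each attempt succeeds independently with `success_rate`,
    and failed attempts are retried until one succeeds.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    attempt_cost = (input_tokens * input_price
                    + output_tokens * output_price) / 1_000_000
    return attempt_cost / success_rate


# List prices from the table: GPT-5 mini ($0.25 / $2.00), GPT-5.5 ($5.00 / $30.00).
# The 2,000 / 1,000 token counts and both success rates are hypothetical.
budget = cost_per_solved_task(2_000, 1_000, 0.25, 2.00, success_rate=0.50)
frontier = cost_per_solved_task(2_000, 1_000, 5.00, 30.00, success_rate=0.95)

print(f"budget:   ${budget:.4f} per solved task")
print(f"frontier: ${frontier:.4f} per solved task")
```

Swap in success rates measured on your own tasks; the comparison is only as good as those numbers.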

Estimate your monthly API bill

A token is roughly 4 characters (about 750 words per 1,000 tokens). The "Your cost" column and ranking update as you type. Estimate only; caching and batch discounts can lower it.
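The arithmetic behind the estimate can be sketched in a few lines of Python. This is an illustration of the rule of thumb above, not the page's own estimator code; the function names and the example workload are assumptions.

```python
CHARS_PER_TOKEN = 4        # rough rule of thumb for English text
WORDS_PER_1K_TOKENS = 750  # rough rule of thumb for English text


def tokens_from_chars(char_count):
    """Approximate token count from a character count."""
    return char_count / CHARS_PER_TOKEN


def tokens_from_words(word_count):
    """Approximate token count from a word count."""
    return word_count * 1000 / WORDS_PER_1K_TOKENS


def monthly_cost(input_tokens, output_tokens, input_price, output_price):
    """List-price bill in USD; prices are USD per 1,000,000 tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 50M input and 10M output tokens per month
# on Claude Sonnet 4.6 ($3.00 in / $15.00 out, from the table).
print(monthly_cost(50_000_000, 10_000_000, 3.00, 15.00))  # 300.0
```

For exact counts, use the tokenizer of the provider you plan to call; the 4-characters rule drifts for code and non-English text.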

| Provider | Model | Notes | Input ($/1M) | Output ($/1M) | Context | Your cost (/mo) |
|---|---|---|---|---|---|---|
| Anthropic | Claude Haiku 4.5 | Fastest, cheapest Claude. | $1.00 | $5.00 | 200K | |
| Anthropic | Claude Sonnet 4.6 | Best speed/intelligence balance. | $3.00 | $15.00 | 1M | |
| Anthropic | Claude Opus 4.8 | Flagship Opus; 1M context at standard price. | $5.00 | $25.00 | 1M | |
| Anthropic | Claude Fable 5 | Most capable Claude; thinking always on. | $10.00 | $50.00 | 1M | |
| OpenAI | GPT-5 mini | Low-cost workhorse. | $0.25 | $2.00 | 400K | |
| OpenAI | GPT-5.4 mini | Cheaper mini tier. | $0.75 | $4.50 | 400K | |
| OpenAI | GPT-5 | Original GPT-5. | $1.25 | $10.00 | 400K | |
| OpenAI | GPT-5.4 | Previous flagship. | $2.50 | $15.00 | 400K | |
| OpenAI | GPT-5.5 | Current flagship; cached input ~90% off. | $5.00 | $30.00 | 400K | |
| Google | Gemini 3.1 Flash-Lite | Cheapest Gemini. | $0.25 | $1.50 | 1M | |
| Google | Gemini 3 Flash (Preview) | Preview tier. | $0.50 | $3.00 | 1M | |
| Google | Gemini 3.5 Flash | Cached input ~$0.15/M. | $1.50 | $9.00 | 1M | |
| Google | Gemini 3.1 Pro | Tiered: 2x in / 1.5x out above 200K tokens. | $2.00 | $12.00 | 1M | |
| xAI | Grok 4.1 Fast | Budget tier; cached input ~$0.20/M. | $0.20 | $0.50 | 1M | |
| xAI | Grok 4.3 | Very low output price. | $1.25 | $2.50 | 1M | |
| DeepSeek | DeepSeek V4 Flash | Cheapest here; cache hits ~98% off. | $0.14 | $0.28 | 128K | |
| DeepSeek | DeepSeek V4 Pro | Open-weight; aggressive caching. | $0.43 | $0.87 | 128K | |

Standard list prices per 1,000,000 tokens, verified June 2026. Some models have tiered pricing (e.g. Gemini Pro charges more above 200K tokens) and most offer batch (~50% off) and cached-input (often 90%+ off) discounts not reflected here. Always confirm current pricing on the provider's own page.
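Tiered pricing is easy to miss in a flat table, so here is a hedged Python sketch of how it could change the cost of one request. The defaults use the Gemini 3.1 Pro row ($2.00 / $12.00, 2x input and 1.5x output above 200K tokens). The function name is hypothetical, and the sketch assumes the higher rates apply to the whole request once the prompt crosses the threshold; confirm the exact rule on the provider's pricing page.

```python
def tiered_request_cost(prompt_tokens, output_tokens,
                        input_price=2.00, output_price=12.00,
                        threshold=200_000,
                        input_multiplier=2.0, output_multiplier=1.5):
    """Cost in USD of one request under a simple two-tier price model.

    Assumption: once the prompt exceeds `threshold` tokens, the higher
    rates apply to every token in the request, not just the excess.
    Prices are USD per 1,000,000 tokens.
    """
    if prompt_tokens > threshold:
        input_price *= input_multiplier
        output_price *= output_multiplier
    return (prompt_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical requests with 2,000 output tokens each.
print(tiered_request_cost(100_000, 2_000))  # prompt below the threshold
print(tiered_request_cost(300_000, 2_000))  # prompt above the threshold
```

Under this assumption, tripling the prompt from 100K to 300K tokens raises the cost by much more than three times, which matters for long-context workloads.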

Cite this table: AI Tools Insider, "LLM API Pricing Comparison (2026)," https://aitoolsinsiderhq.com/llm-api-pricing.html. Data CC BY 4.0 — reuse it with a link back.

Put this comparison on your own site

Free to embed. Paste this snippet into any blog post or docs page and the table plus estimator drops right in.

<iframe src="https://aitoolsinsiderhq.com/llm-api-pricing.html?embed=1" width="100%" height="900" style="border:1px solid #e8e1d2;border-radius:14px;max-width:760px" loading="lazy" title="LLM API Pricing Comparison by AI Tools Insider"></iframe>

Questions about LLM API pricing

How is LLM API pricing actually charged?

Almost every large language model API bills per token, split into input tokens (your prompt, system message, and any context you send) and output tokens (what the model generates). Output costs 2 to 8 times as much as input for the models in the table above, so a chatty model that writes long answers can cost far more than its input price suggests. A token is roughly 4 characters of English, so 1,000 tokens is about 750 words.

Which LLM API is the cheapest in 2026?

On raw list price, open-weight models like DeepSeek V4 Flash are the cheapest by a wide margin, followed by budget tiers like Grok 4.1 Fast, Gemini 3.1 Flash-Lite, and GPT-5 mini. The frontier models (Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, Claude Fable 5) cost more but reason far better, so the real question is cost per solved task, not cost per token. Use the estimator above with your own token volume.
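The ranking itself is a sort by estimated bill. The Python sketch below shows the idea with a handful of rows copied from the table; the function name and the monthly volume are illustrative assumptions, and it ignores caching, batch, and tiered pricing.

```python
# (provider, model, input $/1M, output $/1M): a few rows copied from the table.
PRICES = [
    ("Anthropic", "Claude Haiku 4.5", 1.00, 5.00),
    ("OpenAI", "GPT-5 mini", 0.25, 2.00),
    ("Google", "Gemini 3.1 Flash-Lite", 0.25, 1.50),
    ("xAI", "Grok 4.1 Fast", 0.20, 0.50),
    ("DeepSeek", "DeepSeek V4 Flash", 0.14, 0.28),
]


def rank_by_bill(prices, input_tokens, output_tokens):
    """Return (model, monthly USD) pairs sorted from cheapest to most expensive."""
    bills = [
        (model, (input_tokens * p_in + output_tokens * p_out) / 1_000_000)
        for _provider, model, p_in, p_out in prices
    ]
    return sorted(bills, key=lambda pair: pair[1])


# Hypothetical volume: 100M input and 20M output tokens per month.
for model, bill in rank_by_bill(PRICES, 100_000_000, 20_000_000):
    print(f"{model:<24} ${bill:,.2f}")
```

Because output is priced higher than input, the order can change when the input-to-output mix changes, so rerun it with your own split.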

Why are input and output prices different?

Generating output is more compute-intensive than reading input, so providers price the two separately, with output costing more. This is why prompt design matters: a long system prompt is cheap input, but asking the model to repeat or pad its answer burns the expensive output side. For repeated context, prompt caching (offered by Anthropic, OpenAI, Google, and others) can cut the cost of the cached input by about 90%, and by more on some models.

Do these prices include caching or batch discounts?

No. The table shows each model's standard list price per million tokens. Most providers offer a roughly 50% discount for asynchronous batch jobs and a large discount (often 90% or more) on cached input tokens for repeated context. If your workload reuses a big fixed prompt, your effective cost can be well below these numbers. Always confirm current pricing on the provider's own page.
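To see how far discounts can move the number, here is a rough Python sketch of an effective-cost calculation. The 90% cache discount and 50% batch discount are the approximate figures quoted above, not any provider's exact terms, and the function and parameter names are hypothetical.

```python
def effective_cost(input_tokens, output_tokens, input_price, output_price,
                   cached_fraction=0.0, cache_discount=0.90,
                   batch=False, batch_discount=0.50):
    """Estimated USD cost after cached-input and batch discounts.

    Assumptions: `cached_fraction` of input tokens are billed at the cached
    rate, and the batch discount multiplies the whole bill. Whether the two
    discounts stack depends on the provider. Prices are USD per 1M tokens.
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    input_cost = (fresh + cached * (1 - cache_discount)) * input_price
    total = (input_cost + output_tokens * output_price) / 1_000_000
    return total * (1 - batch_discount) if batch else total


# Hypothetical workload on GPT-5.5 ($5.00 in / $30.00 out): 100M input tokens,
# 80% of them a reused prompt, and 5M output tokens.
print(effective_cost(100_000_000, 5_000_000, 5.00, 30.00))
print(effective_cost(100_000_000, 5_000_000, 5.00, 30.00, cached_fraction=0.8))
```

In this made-up workload, caching alone takes the bill from about $650 to about $290, because input dominates the cost.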

How do I put this comparison on my own site?

Copy the embed snippet from the "Put this comparison on your own site" section and paste it into any blog post or docs page. It drops in a clean, chrome-free version of this table and estimator with a small credit link back to AI Tools Insider. It is free to use.

