Question 1

How is LLM API pricing actually charged?

Accepted Answer

Almost every large language model API bills per token, split into input tokens (your prompt, system message, and any context you send) and output tokens (what the model generates). Output usually costs 2 to 8 times as much as input (every model in the table on this page falls in that range), so a chatty model that writes long answers can cost far more than its input price suggests. A token is roughly 4 characters of English, so 1,000 tokens is about 750 words.
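As a rough illustration of that arithmetic, here is a minimal Python sketch. The function names are made up for this example, and the 4-characters-per-token figure is only the rule of thumb from above, not a real tokenizer, so treat the result as an estimate.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    return round(len(text) / chars_per_token)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, given prices per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Example: 2,000 input tokens and 500 output tokens at $3.00 / $15.00 per 1M.
cost = request_cost(2_000, 500, 3.00, 15.00)
print(f"${cost:.4f}")  # $0.0135
```

Note that the 500 output tokens cost more ($0.0075) than the 2,000 input tokens ($0.0060), which is the effect described above.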

Question 2

Which LLM API is the cheapest in 2026?

Accepted Answer

On raw list price, open-weight models like DeepSeek V4 Flash are the cheapest by a wide margin, followed by budget tiers like Grok 4.1 Fast, Gemini 3.1 Flash-Lite, and GPT-5 mini. The frontier models (Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, Claude Fable 5) cost more but reason far better, so the real question is cost per solved task, not cost per token. Use the estimator above with your own token volume.
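One way to think about cost per solved task is to divide the cost of an attempt by how often the model gets the task right. The sketch below assumes failed attempts are simply retried and that each attempt succeeds independently; the per-attempt costs and success rates are hypothetical placeholders, not measurements of any model.

```python
def cost_per_solved_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per solved task when failures are retried.

    With independent attempts, the expected number of tries is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical numbers for illustration only.
budget = cost_per_solved_task(cost_per_attempt=0.002, success_rate=0.40)
frontier = cost_per_solved_task(cost_per_attempt=0.040, success_rate=0.95)
print(f"budget:   ${budget:.4f} per solved task")    # $0.0050
print(f"frontier: ${frontier:.4f} per solved task")  # $0.0421
```

The comparison can go either way depending on your own success rates, and it ignores the cost of detecting a failed answer, which is often the larger expense.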

Question 3

Why are input and output prices different?

Accepted Answer

Generating output is more compute-intensive than reading input, so providers price the two separately, with output costing more. This is why prompt design matters: a long system prompt is cheap input, but asking the model to repeat or pad its answer burns the expensive output side. For repeated context, prompt caching (offered by Anthropic, OpenAI, Google, and others) can cut the input cost by up to 90%.
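To make the asymmetry concrete, the sketch below compares adding 1,000 tokens to the prompt with adding 1,000 tokens to the answer, across 100,000 requests. The prices are the Claude Sonnet 4.6 list prices from the table; the request volume and the helper name are assumptions for the example.

```python
INPUT_PRICE_PER_M = 3.00    # $ per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # $ per 1M output tokens


def extra_cost(extra_tokens: int, price_per_m: float, requests: int) -> float:
    """Added spend from extra tokens on every request."""
    return extra_tokens * price_per_m * requests / 1_000_000


requests = 100_000  # assumed volume
longer_prompt = extra_cost(1_000, INPUT_PRICE_PER_M, requests)
longer_answer = extra_cost(1_000, OUTPUT_PRICE_PER_M, requests)
print(f"1,000 extra input tokens:  ${longer_prompt:,.2f}")   # $300.00
print(f"1,000 extra output tokens: ${longer_answer:,.2f}")   # $1,500.00
```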

Question 4

Do these prices include caching or batch discounts?

Accepted Answer

No. The table shows each model's standard list price per million tokens. Most providers offer a roughly 50% discount for asynchronous batch jobs and a large discount (often 90% or more) on cached input tokens for repeated context. If your workload reuses a big fixed prompt, your effective cost can be well below these numbers. Always confirm current pricing on the provider's own page.
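A minimal sketch of how those discounts change the bill, assuming a 90% discount on cached input and a 50% batch discount applied to the whole job. Whether the two discounts stack, and whether writing to the cache carries its own fee, varies by provider; this example ignores cache-write fees and treats stacking as an assumption. The function and parameter names are illustrative.

```python
def effective_cost(input_tokens_m: float, output_tokens_m: float,
                   input_price: float, output_price: float,
                   cached_fraction: float = 0.0, cache_discount: float = 0.90,
                   batch: bool = False, batch_discount: float = 0.50) -> float:
    """Dollar cost for token volumes given in millions, after discounts."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    fresh_input = input_tokens_m * (1 - cached_fraction) * input_price
    cached_input = (input_tokens_m * cached_fraction
                    * input_price * (1 - cache_discount))
    total = fresh_input + cached_input + output_tokens_m * output_price
    if batch:
        # Assumption: the batch discount applies to the whole bill.
        total *= 1 - batch_discount
    return total


# 10M input + 1M output tokens at $5.00 / $25.00 per 1M.
list_price = effective_cost(10, 1, 5.00, 25.00)
with_cache = effective_cost(10, 1, 5.00, 25.00, cached_fraction=0.8)
with_batch = effective_cost(10, 1, 5.00, 25.00, batch=True)
print(f"list: ${list_price:.2f}")        # $75.00
print(f"80% cached: ${with_cache:.2f}")  # $39.00
print(f"batch: ${with_batch:.2f}")       # $37.50
```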

Question 5

How do I put this comparison on my own site?

Accepted Answer

Copy the embed snippet near the bottom of the page and paste it into any blog post or docs page. It drops in a clean, chrome-free version of this table and estimator with a small credit link back to AI Tools Insider. It is free to use.

Provider	Model	Input $/1M	Output $/1M	Context (tokens)	Notes
Anthropic	Claude Haiku 4.5	$1.00	$5.00	200K	Fastest, cheapest Claude.
Anthropic	Claude Sonnet 4.6	$3.00	$15.00	1M	Best speed/intelligence balance.
Anthropic	Claude Opus 4.8	$5.00	$25.00	1M	Flagship Opus; 1M context at standard price.
Anthropic	Claude Fable 5	$10.00	$50.00	1M	Most capable Claude; thinking always on.
OpenAI	GPT-5 mini	$0.25	$2.00	400K	Low-cost workhorse.
OpenAI	GPT-5.4 mini	$0.75	$4.50	400K	Cheaper mini tier.
OpenAI	GPT-5	$1.25	$10.00	400K	Original GPT-5.
OpenAI	GPT-5.4	$2.50	$15.00	400K	Previous flagship.
OpenAI	GPT-5.5	$5.00	$30.00	400K	Current flagship; cached input ~90% off.
Google	Gemini 3.1 Flash-Lite	$0.25	$1.50	1M	Cheapest Gemini.
Google	Gemini 3 Flash (Preview)	$0.50	$3.00	1M	Preview tier.
Google	Gemini 3.5 Flash	$1.50	$9.00	1M	Cached input ~$0.15/M.
Google	Gemini 3.1 Pro	$2.00	$12.00	1M	Tiered: 2x in / 1.5x out above 200K tokens.
xAI	Grok 4.1 Fast	$0.20	$0.50	1M	Budget tier; cached input ~$0.20/M.
xAI	Grok 4.3	$1.25	$2.50	1M	Very low output price.
DeepSeek	DeepSeek V4 Flash	$0.14	$0.28	128K	Cheapest here; cache hits ~98% off.
DeepSeek	DeepSeek V4 Pro	$0.43	$0.87	128K	Open-weight; aggressive caching.
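For readers who want to run the comparison offline, the sketch below ranks the models in the table by monthly cost for a given token volume. The prices are copied from the table; the 50M input / 10M output workload is an assumed example, and the calculation uses list prices only, so it ignores caching, batch discounts, and the tiered Gemini 3.1 Pro pricing above 200K tokens.

```python
# (input $/1M, output $/1M) list prices copied from the table above.
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Fable 5": (10.00, 50.00),
    "GPT-5 mini": (0.25, 2.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "GPT-5": (1.25, 10.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 3 Flash (Preview)": (0.50, 3.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Grok 4.1 Fast": (0.20, 0.50),
    "Grok 4.3": (1.25, 2.50),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "DeepSeek V4 Pro": (0.43, 0.87),
}


def monthly_cost(model: str, input_tokens_m: float, output_tokens_m: float) -> float:
    """Monthly list-price cost in dollars; token volumes are in millions."""
    input_price, output_price = PRICES[model]
    return input_tokens_m * input_price + output_tokens_m * output_price


def rank_models(input_tokens_m: float, output_tokens_m: float) -> list[tuple[str, float]]:
    """All models sorted from cheapest to most expensive for this workload."""
    costs = [(m, monthly_cost(m, input_tokens_m, output_tokens_m)) for m in PRICES]
    return sorted(costs, key=lambda pair: pair[1])


# Assumed workload: 50M input tokens and 10M output tokens per month.
ranking = rank_models(50, 10)
for model, cost in ranking:
    print(f"{model:<26} ${cost:>9,.2f}")
```

For that workload the ranking starts with DeepSeek V4 Flash at about $9.80 per month and ends with Claude Fable 5 at $1,000.00.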

LLM API Pricing Comparison (2026)

