Developer Tools

LLM API Cost Calculator

Estimate and compare API pricing for GPT-4o, Claude, Gemini, Llama, DeepSeek, Mistral, and Grok. Editable rates, monthly and yearly cost.

Per-request usage

Enter the tokens you expect to send and receive on a typical request. The calculator multiplies by the request volume on the right.

Input tokens

Prompt plus system message tokens you send.

Output tokens

Tokens the model returns. Used to bill at the higher output rate.

Cached input tokens (optional)

Tokens read from a prompt cache. Models without a cached price bill these as normal input.

Requests per month

Roughly 333 per day, 120,000 per year.

Cheapest pick

Meta (via Groq)

Llama 3.1 8B

Small open model. 128K context.

Per request: $0.000090
Per day: $0.0300
Per month: $0.9000
Per year: $10.80

Versus the most expensive option

Anthropic Claude Opus 4 would cost $525.00 per month, about 583 x more than the cheapest pick.

Per-model comparison

List prices in USD per million tokens. Every cell is editable. Click a row to toggle it on or off in the totals.

ProviderSort

Model	Cached $/1M	Context	Per request	Per month	Per year
GPT-4o OpenAI Flagship multimodal model. 128K context.		128,000 Usage 1%	$0.007500	$75.00	$900.00
GPT-4o mini OpenAI Small, fast, low-cost. 128K context.		128,000 Usage 1%	$0.000450	$4.50	$54.00
GPT-4.1 OpenAI Long-context reasoning. 1M context.		1,000,000 Usage 0.1%	$0.006000	$60.00	$720.00
GPT-4.1 mini OpenAI Smaller GPT-4.1 variant. 1M context.		1,000,000 Usage 0.1%	$0.001200	$12.00	$144.00
o3 OpenAI High-end reasoning model. 200K context.		200,000 Usage 0.8%	$0.006000	$60.00	$720.00
o4-mini OpenAI Affordable reasoning model. 200K context.		200,000 Usage 0.8%	$0.003300	$33.00	$396.00
GPT-3.5 Turbo OpenAI Legacy chat model. 16K context.	n/a	16,385 Usage 9%	$0.001250	$12.50	$150.00
Claude Opus 4 Anthropic Top-tier reasoning. 200K context.		200,000 Usage 0.8%	$0.0525	$525.00	$6,300.00
Claude Sonnet 4.5 Anthropic Balanced flagship. 200K context.		200,000 Usage 0.8%	$0.0105	$105.00	$1,260.00
Claude Haiku 4.5 Anthropic Fastest Claude. 200K context.		200,000 Usage 0.8%	$0.003500	$35.00	$420.00
Claude 3.5 Haiku Anthropic Previous-generation fast model. 200K context.		200,000 Usage 0.8%	$0.002800	$28.00	$336.00
Gemini 2.5 Pro Google Flagship Google model. 2M context.		2,000,000 Usage 0.07%	$0.006250	$62.50	$750.00
Gemini 2.5 Flash Google Fast Google model. 1M context.		1,000,000 Usage 0.1%	$0.001550	$15.50	$186.00
Gemini 2.5 Flash Lite Google Cheapest Gemini. 1M context.		1,000,000 Usage 0.1%	$0.000300	$3.00	$36.00
Llama 3.3 70B Meta (via Groq) Open-weights flagship. 128K context.	n/a	128,000 Usage 1%	$0.000985	$9.85	$118.20
Llama 3.1 8B Meta (via Groq) Small open model. 128K context.	n/a	128,000 Usage 1%	$0.000090	$0.9000	$10.80
DeepSeek V3 DeepSeek Frontier open model from DeepSeek. 64K context.		64,000 Usage 2%	$0.000820	$8.20	$98.40
DeepSeek R1 DeepSeek Open reasoning model. 64K context.		64,000 Usage 2%	$0.001645	$16.45	$197.40
Mistral Large 2 Mistral Mistral flagship. 128K context.	n/a	128,000 Usage 1%	$0.005000	$50.00	$600.00
Mistral Small 3 Mistral Compact Mistral. 32K context.	n/a	32,000 Usage 5%	$0.000500	$5.00	$60.00
Grok 3 xAI xAI flagship. 1M context.		1,000,000 Usage 0.1%	$0.0105	$105.00	$1,260.00
Grok 3 mini xAI Cheap xAI option. 1M context.		1,000,000 Usage 0.1%	$0.000550	$5.50	$66.00

Default prices are public list-price snapshots from each vendor's pricing page. Replace any cell with your contracted price and the totals update instantly. Volume discounts, batch API discounts, and private rates are not modelled by the defaults.

How the math works

per_request = (input_tokens - cached_tokens) * input_price / 1,000,000
            + cached_tokens * cached_price / 1,000,000
            + output_tokens * output_price / 1,000,000

per_day   = per_request * (requests_per_month / 30)
per_month = per_request * requests_per_month
per_year  = per_request * requests_per_month * 12

Models without a separate cached input price bill cached tokens at the regular input rate. Prices are USD per million tokens.

Common gotchas when budgeting

Output tokens cost the most. Reasoning models can return long hidden chains-of-thought that still count as output; budget for the worst case.
Cached input only counts where the provider offers prompt caching. The default cached column is blank for models that do not advertise a separate cached rate.
Context window is a hard cap. If input plus output exceeds the window, the request will fail before any cost is incurred. The table flags rows where the configured usage would not fit.
Vendor pricing changes. The defaults are list-price snapshots and should be confirmed against the official pricing page before signing off on a budget.

How to use

Enter the input tokens, output tokens, optional cached input tokens, and the number of requests you expect per month for a typical workload. Presets cover short Q&A, long-form generation, RAG with prompt caching, and reasoning-heavy traffic.
Optionally open the prompt token estimator to paste a real prompt and use its heuristic token count as the input value. For exact OpenAI tokenization, use the linked GPT Token Counter instead.
Pick a provider filter or leave it on All, choose a sort order (cheapest first, most expensive first, or by provider or name), and toggle individual models on or off in the table to focus the comparison.
Edit any price cell to match your contract rate. Defaults are public list-price snapshots; volume, batch, and committed-use discounts are not modelled and should be entered directly.
Read the cheapest pick card on the right for the per-request, per-day, per-month, and per-year cost of the lowest option, then compare against the most expensive option in the same selection.
Click Copy summary to grab a plain-text report with your inputs and the monthly cost for every selected model, sorted cheapest to most expensive.

About this tool

LLM API Cost Calculator estimates and compares per-token API pricing across the major frontier model vendors in one place: OpenAI (GPT-4o, GPT-4o mini, GPT-4.1, GPT-4.1 mini, o3, o4-mini, GPT-3.5 Turbo), Anthropic (Claude Opus 4, Claude Sonnet 4.5, Claude Haiku 4.5, Claude 3.5 Haiku), Google (Gemini 2.5 Pro, 2.5 Flash, 2.5 Flash Lite), Meta on Groq (Llama 3.3 70B, Llama 3.1 8B), DeepSeek (V3, R1), Mistral (Large 2, Small 3), and xAI (Grok 3, Grok 3 mini). Enter how many input tokens, optional cached tokens, and output tokens a typical request consumes, set a request volume, and the calculator multiplies through to per-request, daily, monthly, and yearly cost for every model side by side. Every price cell is editable so you can replace the public list-price defaults with your contracted rate, batch discount, or self-hosted estimate; volume discounts and batch API pricing are not modelled by the defaults, so the totals follow whatever rate you choose. Models with separate prompt-cache pricing have an explicit cached input column; models without it bill cached tokens at the regular input rate. A context-window column flags rows where input plus output would not fit, so an unrealistic configuration cannot quietly produce a tempting low number. Useful when sizing a chatbot, RAG pipeline, batch summarizer, or agent workflow, when comparing OpenAI versus Anthropic versus Google before picking a default, when justifying a vendor switch in an engineering review, or when sanity-checking a finance forecast. A built-in prompt token estimator turns pasted text into a rough token count using the ~4-chars-per-token heuristic (marked clearly as an estimate; use the GPT Token Counter for exact OpenAI tokenization), and the Copy summary button produces a plain-text report you can paste into a doc or a Slack message. All math runs in your browser; no prompts, token counts, or settings are uploaded or stored.

Free to use. Works in your browser. No signup, no login.

Related tools

LLM API Cost Calculator

How to use

About this tool

You may also like

GPT Token Counter

LLM Text Chunker

Payment Fee Calculator

CPM Calculator