Zero Signup ToolsFree browser tools

Developer Tools

LLM API Cost Calculator

Estimate and compare API pricing for GPT-4o, Claude, Gemini, Llama, DeepSeek, Mistral, and Grok. Editable rates, monthly and yearly cost.

Per-request usage

Enter the tokens you expect to send and receive on a typical request. The calculator multiplies by the request volume on the right.

Prompt plus system message tokens you send.

Tokens the model returns. Used to bill at the higher output rate.

Tokens read from a prompt cache. Models without a cached price bill these as normal input.

Roughly 333 per day, 120,000 per year.

Cheapest pick

Meta (via Groq)

Llama 3.1 8B

Small open model. 128K context.

Per request
$0.000090
Per day
$0.0300
Per month
$0.9000
Per year
$10.80

Versus the most expensive option

Anthropic Claude Opus 4 would cost $525.00 per month, about 583 x more than the cheapest pick.

Per-model comparison

List prices in USD per million tokens. Every cell is editable. Click a row to toggle it on or off in the totals.

OnModelInput $/1MCached $/1MOutput $/1MContextPer requestPer monthPer year

GPT-4o

OpenAI

Flagship multimodal model. 128K context.

128,000

Usage 1%

$0.007500$75.00$900.00

GPT-4o mini

OpenAI

Small, fast, low-cost. 128K context.

128,000

Usage 1%

$0.000450$4.50$54.00

GPT-4.1

OpenAI

Long-context reasoning. 1M context.

1,000,000

Usage 0.1%

$0.006000$60.00$720.00

GPT-4.1 mini

OpenAI

Smaller GPT-4.1 variant. 1M context.

1,000,000

Usage 0.1%

$0.001200$12.00$144.00

o3

OpenAI

High-end reasoning model. 200K context.

200,000

Usage 0.8%

$0.006000$60.00$720.00

o4-mini

OpenAI

Affordable reasoning model. 200K context.

200,000

Usage 0.8%

$0.003300$33.00$396.00

GPT-3.5 Turbo

OpenAI

Legacy chat model. 16K context.

n/a

16,385

Usage 9%

$0.001250$12.50$150.00

Claude Opus 4

Anthropic

Top-tier reasoning. 200K context.

200,000

Usage 0.8%

$0.0525$525.00$6,300.00

Claude Sonnet 4.5

Anthropic

Balanced flagship. 200K context.

200,000

Usage 0.8%

$0.0105$105.00$1,260.00

Claude Haiku 4.5

Anthropic

Fastest Claude. 200K context.

200,000

Usage 0.8%

$0.003500$35.00$420.00

Claude 3.5 Haiku

Anthropic

Previous-generation fast model. 200K context.

200,000

Usage 0.8%

$0.002800$28.00$336.00

Gemini 2.5 Pro

Google

Flagship Google model. 2M context.

2,000,000

Usage 0.07%

$0.006250$62.50$750.00

Gemini 2.5 Flash

Google

Fast Google model. 1M context.

1,000,000

Usage 0.1%

$0.001550$15.50$186.00

Gemini 2.5 Flash Lite

Google

Cheapest Gemini. 1M context.

1,000,000

Usage 0.1%

$0.000300$3.00$36.00

Llama 3.3 70B

Meta (via Groq)

Open-weights flagship. 128K context.

n/a

128,000

Usage 1%

$0.000985$9.85$118.20

Llama 3.1 8B

Meta (via Groq)

Small open model. 128K context.

n/a

128,000

Usage 1%

$0.000090$0.9000$10.80

DeepSeek V3

DeepSeek

Frontier open model from DeepSeek. 64K context.

64,000

Usage 2%

$0.000820$8.20$98.40

DeepSeek R1

DeepSeek

Open reasoning model. 64K context.

64,000

Usage 2%

$0.001645$16.45$197.40

Mistral Large 2

Mistral

Mistral flagship. 128K context.

n/a

128,000

Usage 1%

$0.005000$50.00$600.00

Mistral Small 3

Mistral

Compact Mistral. 32K context.

n/a

32,000

Usage 5%

$0.000500$5.00$60.00

Grok 3

xAI

xAI flagship. 1M context.

1,000,000

Usage 0.1%

$0.0105$105.00$1,260.00

Grok 3 mini

xAI

Cheap xAI option. 1M context.

1,000,000

Usage 0.1%

$0.000550$5.50$66.00

Default prices are public list-price snapshots from each vendor's pricing page. Replace any cell with your contracted price and the totals update instantly. Volume discounts, batch API discounts, and private rates are not modelled by the defaults.

How the math works

per_request = (input_tokens - cached_tokens) * input_price / 1,000,000
            + cached_tokens * cached_price / 1,000,000
            + output_tokens * output_price / 1,000,000

per_day   = per_request * (requests_per_month / 30)
per_month = per_request * requests_per_month
per_year  = per_request * requests_per_month * 12

Models without a separate cached input price bill cached tokens at the regular input rate. Prices are USD per million tokens.

Common gotchas when budgeting

  • Output tokens cost the most. Reasoning models can return long hidden chains-of-thought that still count as output; budget for the worst case.
  • Cached input only counts where the provider offers prompt caching. The default cached column is blank for models that do not advertise a separate cached rate.
  • Context window is a hard cap. If input plus output exceeds the window, the request will fail before any cost is incurred. The table flags rows where the configured usage would not fit.
  • Vendor pricing changes. The defaults are list-price snapshots and should be confirmed against the official pricing page before signing off on a budget.

How to use

  1. Enter the input tokens, output tokens, optional cached input tokens, and the number of requests you expect per month for a typical workload. Presets cover short Q&A, long-form generation, RAG with prompt caching, and reasoning-heavy traffic.
  2. Optionally open the prompt token estimator to paste a real prompt and use its heuristic token count as the input value. For exact OpenAI tokenization, use the linked GPT Token Counter instead.
  3. Pick a provider filter or leave it on All, choose a sort order (cheapest first, most expensive first, or by provider or name), and toggle individual models on or off in the table to focus the comparison.
  4. Edit any price cell to match your contract rate. Defaults are public list-price snapshots; volume, batch, and committed-use discounts are not modelled and should be entered directly.
  5. Read the cheapest pick card on the right for the per-request, per-day, per-month, and per-year cost of the lowest option, then compare against the most expensive option in the same selection.
  6. Click Copy summary to grab a plain-text report with your inputs and the monthly cost for every selected model, sorted cheapest to most expensive.

About this tool

LLM API Cost Calculator estimates and compares per-token API pricing across the major frontier model vendors in one place: OpenAI (GPT-4o, GPT-4o mini, GPT-4.1, GPT-4.1 mini, o3, o4-mini, GPT-3.5 Turbo), Anthropic (Claude Opus 4, Claude Sonnet 4.5, Claude Haiku 4.5, Claude 3.5 Haiku), Google (Gemini 2.5 Pro, 2.5 Flash, 2.5 Flash Lite), Meta on Groq (Llama 3.3 70B, Llama 3.1 8B), DeepSeek (V3, R1), Mistral (Large 2, Small 3), and xAI (Grok 3, Grok 3 mini). Enter how many input tokens, optional cached tokens, and output tokens a typical request consumes, set a request volume, and the calculator multiplies through to per-request, daily, monthly, and yearly cost for every model side by side. Every price cell is editable so you can replace the public list-price defaults with your contracted rate, batch discount, or self-hosted estimate; volume discounts and batch API pricing are not modelled by the defaults, so the totals follow whatever rate you choose. Models with separate prompt-cache pricing have an explicit cached input column; models without it bill cached tokens at the regular input rate. A context-window column flags rows where input plus output would not fit, so an unrealistic configuration cannot quietly produce a tempting low number. Useful when sizing a chatbot, RAG pipeline, batch summarizer, or agent workflow, when comparing OpenAI versus Anthropic versus Google before picking a default, when justifying a vendor switch in an engineering review, or when sanity-checking a finance forecast. A built-in prompt token estimator turns pasted text into a rough token count using the ~4-chars-per-token heuristic (marked clearly as an estimate; use the GPT Token Counter for exact OpenAI tokenization), and the Copy summary button produces a plain-text report you can paste into a doc or a Slack message. All math runs in your browser; no prompts, token counts, or settings are uploaded or stored.

Free to use. Works in your browser. No signup, no login.

Related tools

You may also like

All tools
All toolsDeveloper Tools