// all accessible via one api key

Model Catalog

Browse every model available through LLMAI. Filter by provider, compare context windows and capabilities, and copy the slug directly into your code.

OpenAI

GPT-5.4

gpt-5.4

1M ctx

Mid-tier GPT-5 with strong general-purpose reasoning, multimodal input, and tool-calling support — the entry GPT-5 tier.

ChatReasoningVision

Input / 1M tokens

$0.787

Output / 1M tokens

$5.250

GPT-5.5

gpt-5.5

1M ctx

Flagship GPT-5.5. The most capable OpenAI model on the platform — deep reasoning, long-context comprehension, and frontier multimodal performance.

ChatReasoningVisionLong Context

Input / 1M tokens

$1.575

Output / 1M tokens

$10.500

GPT-5.6 Sol

gpt-5.6-sol

1M ctx

Complex professional work, reasoning and coding. The flagship GPT-5.6 model — strongest reasoning and code performance in the family.

ChatReasoningCodingVision

Input / 1M tokens

$2.500

Output / 1M tokens

$15.000

GPT-5.6 Terra

gpt-5.6-terra

1M ctx

A balanced option for intelligence and cost. Solid general-purpose reasoning at a lower price than Sol.

ChatReasoningVision

Input / 1M tokens

$1.250

Output / 1M tokens

$7.500

GPT-5.6 Luna

gpt-5.6-luna

1M ctx

Optimized for cost-sensitive, high-volume workloads. The cheapest GPT-5.6 model for fast everyday tasks.

ChatVision

Input / 1M tokens

$0.500

Output / 1M tokens

$3.000

Anthropic

Claude Opus 4.8NEW

claude-opus-4.8

1M ctx

Anthropic's newest frontier Opus model. State-of-the-art coding, agentic reasoning, and long-horizon task execution.

ChatReasoningCodeVision

Input / 1M tokens

$2.625

Output / 1M tokens

$10.500

Claude Opus 4.7

claude-opus-4.7

1M ctx

Previous-generation flagship Opus. Excellent for complex multi-step engineering, research synthesis, and high-stakes reasoning.

ChatReasoningCodeVision

Input / 1M tokens

$2.625

Output / 1M tokens

$10.500

Claude Opus 4.6

claude-opus-4.6

1M ctx

The cost-efficient Opus tier. Strong reasoning at a meaningfully lower per-token price than 4.7/4.8.

ChatReasoningCodeVision

Input / 1M tokens

$1.575

Output / 1M tokens

$7.875

Claude Sonnet 4.6

claude-sonnet-4.6

1M ctx

Balanced Sonnet — high-volume daily-driver for chat, code, and structured output. The recommended starting point in the Claude family.

ChatCodeVision

Input / 1M tokens

$1.050

Output / 1M tokens

$5.250

Claude Sonnet 5NEW

claude-sonnet-5

1M ctx

Anthropic's newest Sonnet — upgraded reasoning, code, and agentic performance over Sonnet 4.6 at the same price. The new default Claude for daily work.

ChatReasoningCodeVision

Input / 1M tokens

$1.050

Output / 1M tokens

$5.250

Claude Fable 5NEW

claude-fable-5

1M ctx

Anthropic's flagship reasoning model with configurable effort controls (low/medium/high/xhigh/max). Built for hard multi-step reasoning, research synthesis, and agentic workflows that benefit from extended thinking — at 1M-token context.

ChatReasoningCodeVisionLong Context

Input / 1M tokens

$4.500

Output / 1M tokens

$22.500

Claude Haiku 4.5

claude-haiku-4.5

1M ctx

Anthropic's fastest and most cost-efficient Claude model. Ideal for high-volume chat, classification, summarisation, and lightweight agent loops where latency and cost matter most.

ChatFastVision

Input / 1M tokens

$0.525

Output / 1M tokens

$2.625

Google

Gemini 3.5 Flash

gemini-3.5-flash

1M ctx

The latest production Gemini Flash. Sub-second latency, native multimodal input, and a 1M-token context window.

ChatFastVisionLong Context

Input / 1M tokens

$0.787

Output / 1M tokens

$4.725

Gemini 3.1 Pro (Preview)

gemini-3.1-pro-preview

1M ctx

Preview build of Gemini 3.1 Pro. Frontier reasoning and multimodal understanding ahead of GA.

ChatReasoningVisionLong Context

Input / 1M tokens

$1.050

Output / 1M tokens

$4.200

Gemini 3.1 Flash Lite (Preview)

gemini-3.1-flash-lite-preview

1M ctx

Ultra-cheap Gemini tier for high-volume routing, classification, and lightweight chat.

ChatFastLong Context

Input / 1M tokens

$0.137

Output / 1M tokens

$0.787

Gemini 3 Flash (Preview)

gemini-3-flash-preview

1M ctx

Preview Gemini 3 Flash. Solid multimodal Flash-tier model with a generous context window at a budget price.

ChatFastVision

Input / 1M tokens

$0.210

Output / 1M tokens

$1.260

Gemma 4

gemma-4

128K ctx

Google's open-weight Gemma 4 (31B) served via Ollama Cloud. Excellent value for everyday chat and code-completion tasks.

ChatFastCode

Input / 1M tokens

$0.046

Output / 1M tokens

$0.130

Veo 3.1

veo-3.1

— ctx

Google's text-to-video generation model. Produces short, high-quality video clips from natural-language prompts.

Video

Per generation

$0.21

Flat fee per successful video — no token billing.

Nano Banana 2

nano-banana-2

— ctx

Fast image-generation Gemini variant (gemini-3.1-flash-image-preview). Quick, cost-efficient image synthesis from text prompts.

Image

Per generation

$0.21

Flat fee per successful image — no token billing.

Nano Banana Pro

nano-banana-pro

— ctx

Higher-fidelity image model (gemini-3-pro-image-preview). Better detail and prompt-following for production-grade visuals.

Image

Per generation

$0.21

Flat fee per successful image — no token billing.

Kimi

Kimi K2.7 Code

kimi-k2.7-code

256K ctx

Moonshot AI's code-specialized K2.7 variant. Tuned for code generation, refactoring, and agentic coding workflows — with native long thinking and deep reasoning.

ChatReasoningCodeLong Context

Input / 1M tokens

$0.665

Output / 1M tokens

$2.800

Kimi K2.6

kimi-k2.6

256K ctx

Moonshot AI's flagship reasoning model. Excellent at complex problem-solving, multi-step reasoning, and long-document understanding.

ChatReasoningLong Context

Input / 1M tokens

$0.620

Output / 1M tokens

$2.600

Kimi K2.5

kimi-k2.5

256K ctx

The cost-efficient predecessor to K2.6, retaining strong long-context handling at a fraction of the price.

ChatReasoningLong Context

Input / 1M tokens

$0.210

Output / 1M tokens

$1.050

DeepSeek

DeepSeek V4 Pro

deepseek-v4-pro

128K ctx

DeepSeek's strongest model. Tuned for advanced coding tasks, technical reasoning, and structured output generation.

ChatCodeReasoning

Input / 1M tokens

$0.610

Output / 1M tokens

$1.220

DeepSeek V4 Flash

deepseek-v4-flash

128K ctx

The latency-optimised variant. Sub-second responses for interactive code, chat, and agent loops at near-zero cost.

ChatFastCode

Input / 1M tokens

$0.050

Output / 1M tokens

$0.100

DeepSeek V3.2

deepseek-v3.2

128K ctx

Cost-effective general-purpose model. Strong everyday performance with excellent value per token.

ChatReasoningCode

Input / 1M tokens

$0.088

Output / 1M tokens

$0.132

MiniMax

MiniMax M3

minimax-m3

1M ctx

MiniMax's flagship long-context model with a 1M-token window. Ideal for whole-codebase analysis and large-document tasks.

ChatReasoningLong Context

Input / 1M tokens

$0.420

Output / 1M tokens

$1.680

MiniMax M2.7

minimax-m2.7

1M ctx

The efficient long-context model. 1M-token window at a budget price — great for retrieval-heavy and document workflows.

ChatLong ContextFast

Input / 1M tokens

$0.200

Output / 1M tokens

$0.850

Alibaba

Qwen 3.6 Plus

qwen3.6-plus

128K ctx

Alibaba's latest Qwen-Plus tier. Strong multilingual performance, capable code generation, and reliable reasoning.

ChatReasoningCode

Input / 1M tokens

$0.120

Output / 1M tokens

$0.680

Qwen 3.5 Plus

qwen3.5-plus

128K ctx

The previous-generation Qwen-Plus model. Solid baseline for general chat and structured tasks.

ChatReasoningCode

Input / 1M tokens

$0.140

Output / 1M tokens

$0.810

Xiaomi

MiMo V2.5 Pro

mimo-v2.5-pro

128K ctx

Xiaomi's flagship MiMo model. Tuned for complex agent workflows, code-focused tasks, and structured reasoning.

ChatReasoningCode

Input / 1M tokens

$0.650

Output / 1M tokens

$1.950

MiMo V2.5

mimo-v2.5

128K ctx

The cost-efficient MiMo tier. Quick, capable, and great for high-volume general-purpose calls.

ChatFastReasoning

Input / 1M tokens

$0.260

Output / 1M tokens

$1.300

Z.AI

GLM 5.2

glm-5.2

128K ctx

Z.AI's flagship GLM model for long-horizon agentic tasks. Served via Ollama Cloud with native prefix caching.

ChatReasoningCode

Input / 1M tokens

$0.700

Output / 1M tokens

$2.200

GLM 5.1

glm-5.1

128K ctx

The latest iteration of Z.AI's GLM family, served via Ollama Cloud. Strong instruction-following and bilingual reasoning.

ChatReasoningCode

Input / 1M tokens

$0.550

Output / 1M tokens

$1.300