XAI Router
Text
$21 in | $168 out
OpenAI

gpt-5.2-pro

OpenAI's most powerful professional model of 2025, excelling at complex reasoning and code generation

256K tokens | Large Language Model (LLM)
Text
$2.5 in | $15 out | $0.25 cache hit
OpenAI

gpt-5.4

OpenAI's 2026 flagship model with 400K context and cached-input pricing for reasoning, coding, and multimodal tasks

400K tokens | Large Language Model (LLM)
Text
$30 in | $180 out
OpenAI

gpt-5.4-pro

OpenAI's 2026 most powerful professional model for advanced reasoning, complex analysis, and production-grade workflows

400K tokens | Large Language Model (LLM, Pro)
Text
$1.75 in | $14 out | $0.175 cache hit
OpenAI

gpt-5.3-chat

OpenAI's latest chat model of 2026, strong in reasoning, coding, and creative writing, with caching support

256K tokens | Large Language Model (LLM)
Text
$0.75 in | $4.5 out | $0.075 cache hit
OpenAI

gpt-5.4-mini

OpenAI's lightweight GPT-5.4 variant balancing cost, quality, and cached-input support for both API and Codex workflows

Lightweight Large Language Model | Balanced cost and quality for general development, automation, and everyday reasoning
Text
$5 in | $25 out
Anthropic

claude-opus-4-6

Anthropic's most intelligent model for building agents and coding

200K tokens / 1M tokens (beta) | Yes
Text
$2.0 in | $8.0 out
OpenAI

gpt-4.1

OpenAI's upgraded GPT-4 series model with excellent performance in reasoning and creative tasks

128K tokens | Large Language Model (LLM)
Text
$1.25 in | $10 out | $0.125 cache hit
OpenAI

gpt-5.1-codex-max

OpenAI's 2025 flagship code-specialized model with the most powerful code understanding and generation capabilities, supporting massive context and complex code tasks

512K tokens | Flagship Code-Specialized Model
Text
$1.75 in | $14 out | $0.175 cache hit
OpenAI

gpt-5.2

OpenAI's latest flagship model of 2025, with comprehensive upgrades in reasoning, coding, and creative writing, plus caching support

256K tokens | Large Language Model (LLM)
Text
$3 in | $15 out | $3.75 cache write | $0.30 cache hit
Anthropic

claude-sonnet-4-6

Anthropic's latest flagship model, excelling in code generation, analysis, and writing tasks with prompt caching support

200K tokens | Large Language Model (LLM)
Text
$1.25 in | $10 out | $0.125 cache hit
OpenAI

gpt-5.1-codex

OpenAI's 2025 code-specialized model, focused on code understanding, generation, and optimization with caching support

256K tokens | Code-Specialized Model
Text
$3 in | $15 out
Anthropic

claude-sonnet-4-5-20250929

Anthropic's 2025 flagship model, excelling in code generation, analysis, and writing tasks

200K tokens | Large Language Model (LLM)
Text
$1.75 in | $14 out | $0.175 cache hit
OpenAI

gpt-5.3-codex

OpenAI's 2025 code-specialized model, focused on code understanding, generation, and optimization with caching support

256K tokens | Code-Specialized Model
Text
$0.2 in | $1.25 out | $0.02 cache hit
OpenAI

gpt-5.4-nano

OpenAI's ultra-light GPT-5.4 variant for simple low-cost workloads, currently available via API only

Ultra-Lightweight Large Language Model | Low-cost Q&A, classification, extraction, and other simple workloads
Text
$0.4 in | $1.6 out
OpenAI

gpt-4.1-mini

Lightweight version of GPT-4.1, offering excellent performance while being more cost-effective

128K tokens | Large Language Model (LLM)
Image
$2 in | $12 out | $0.2 cache hit
Google AI Studio

gemini-3-pro-image-preview

Google AI Studio text-to-image preview model with 1K/2K/4K output, multi-image reference, Thinking + Search Grounding

65K input / 32K output tokens | Default 1K, optional 2K / 4K, multiple aspect ratios
Text
$0.1 in | $0.4 out
OpenAI

gpt-4.1-nano

Ultra-lightweight version of GPT-4.1, offering extreme cost-effectiveness for simple and fast tasks

128K tokens | Large Language Model (LLM)
Text
$1.75 in | $14.0 out | $0.175 cache hit
OpenAI

gpt-5.3-codex-spark

OpenAI's ultra-low-latency coding model released in 2026, built for real-time coding collaboration and rapid iteration with caching support

128K tokens | Ultra-Low-Latency Code Model (Small)
Text
¥3.2 in | ¥16 out | ¥0.64 cache hit
ByteDance

doubao-seed-2-0-pro-260215

Top Pick

ByteDance Doubao Seed 2.0 Pro, optimized for long-chain reasoning and stability on complex real-world tasks

256K tokens | 256K tokens
Text
$0.375 in | $3 out | $0.0375 cache hit
OpenAI

gpt-5.1-codex-mini

OpenAI's 2025 lightweight code model, offering faster response times and lower costs while maintaining high-quality code capabilities

256K tokens | Lightweight Code Model
Text
¥3.2 in | ¥16 out | ¥0.64 cache hit
ByteDance

doubao-seed-2-0-code-preview-260215

Top Pick

Coding-enhanced Doubao Seed 2.0 variant optimized for Agentic Coding workflows

256K tokens | 256K tokens
Text
$1.2 in | $8 out
ByteDance

ark-code-latest

ByteDance Doubao code-specialized model, focused on code understanding, generation, and optimization

256K tokens | Code-Specialized Model
Text
$1.5 in | $10 out
Google AI Studio

gemini-2.5-pro

Google AI Studio's 2025 flagship multimodal model with ultra-long context support and powerful multimodal understanding capabilities

2M tokens | Multimodal Large Language Model
Text
$2 in | $12 out
Google AI Studio

gemini-3.1-pro-preview

Google AI Studio preview multimodal model with a 1M context window and 64K output for advanced reasoning and high-quality generation

1M input / 64K output tokens | January 2025
Text
$1 in | $2 out
Moonshot AI

kimi-for-coding

Moonshot AI's Kimi code-specialized model, focused on code understanding, generation, and optimization

128K tokens | Code-Specialized Model
Text
$2 in | $3 out | $0.4 cache hit
DeepSeek

deepseek-v3

DeepSeek's latest flagship model V3.2, 685B parameters, reasoning capabilities rivaling GPT-5, 128K context

128K tokens | 685B
Text
$0.8 in | $3.2 out
AWS Bedrock

nova-pro

AWS's high-performance multimodal model supporting text and image understanding

300K tokens | Multimodal Large Language Model
Text
¥0.6 in | ¥3.6 out | ¥0.12 cache hit
ByteDance

doubao-seed-2-0-lite-260215

Top Pick

ByteDance Doubao Seed 2.0 Lite balances generation quality and response speed for general production workloads

256K tokens | 224K tokens
Text
$1.2 in | $3.6 out
ByteDance

doubao-seed-translation-250915

ByteDance Doubao translation-specialized model, providing high-quality multilingual translation services

128K tokens | Translation-Specialized Model
Text
¥0.2 in | ¥2 out | ¥0.04 cache hit
ByteDance

doubao-seed-2-0-mini-260215

Top Pick

ByteDance Doubao Seed 2.0 Mini targets low-latency, high-concurrency, and cost-sensitive deployments with four-level thinking modes

256K tokens | 224K tokens
Image
$0.25 in | $1.50 out
Google AI Studio

gemini-3.1-flash-image-preview

Fast Preview

Google AI Studio preview image generation model optimized for speed and efficiency, ideal for fast interactive responses and high throughput

Google AI Studio | Designed for speed and efficiency in interactive and high-throughput image generation
Text
$0.06 in | $0.24 out
AWS Bedrock

nova-lite

AWS Nova lightweight version, providing fast and economical multimodal capabilities

300K tokens | Multimodal Large Language Model
Text
$1.5 in | $10 out
Google Vertex AI

google/gemini-2.5-pro

Google Vertex AI flagship multimodal model with ultra-long context support and powerful multimodal understanding capabilities

2M tokens | Multimodal Large Language Model
Text
$1.4 in | $2.8 out
DeepSeek

deepseek-r1

Chinese open-source reasoning model, rivaling o1 in mathematics, coding, and scientific reasoning with exceptional cost-effectiveness

64K tokens | Reasoning Model
Text
$0.035 in | $0.14 out
AWS Bedrock

nova-micro

AWS Nova ultra-lightweight version, providing extreme cost-effectiveness for text processing

128K tokens | Large Language Model (LLM)
Text
$5 in | $15 out
xAI

grok-4

xAI's latest flagship model with real-time internet search capabilities and timely knowledge updates

128K tokens | Large Language Model (LLM)
Text
$0.5 in | $3 out | $0.05 cache hit
Google AI Studio

gemini-3-flash-preview

Fast Multimodal

Google AI Studio high-throughput multimodal preview model with low latency and strong cost efficiency

1M tokens | 64K tokens
Text
$0.2 in | $1.5 out
xAI

grok-code-fast

xAI's code-optimized model designed for rapid code generation and understanding

128K tokens | Code Generation Model
Text
$0 in | $0 out
Alibaba Cloud

qwen3-32b

Free

Alibaba Cloud's Qwen large language model with 32B parameters, a capable free AI assistant

Large Language Model | 32.8B
Text
$0 in | $0 out
Tencent

tencent/Hunyuan-MT-7B

Free

Tencent Hunyuan machine translation model with ultra-low cost multilingual translation

Machine Translation | 7B
Text
$1 in | $3 out
Moonshot AI

moonshotai/kimi-k2-instruct-0905

Ultra Fast

Chinese ultra-long context model supporting 2 million characters input, excelling at long document analysis and processing

2M Chinese characters | Large Language Model (LLM)
Text
¥0.8 in | ¥4.8 out | ¥1 cache write | ¥0.8 cache hit
Alibaba Cloud

qwen3.5-plus

Qwen3.5 native vision-language Plus model with a hybrid linear-attention + sparse MoE architecture for strong reasoning and multimodal efficiency

128K tokens | Native vision-language Plus (linear attention + sparse MoE)
Text
$0.2 in | $0.5 out
xAI

grok-4-fast

xAI's Grok-4 fast version, providing faster response times while maintaining powerful capabilities

128K tokens | Large Language Model (LLM)
Text
$1 in | $10 out | $0.2 cache hit
Alibaba Cloud

qwen3-vl-plus

Alibaba Qwen 3.0 vision-language model for strong multimodal understanding

Vision-Language Model (VLM) | Text + Image
Text
$0.15 in | $1.5 out | $0.03 cache hit
Alibaba Cloud

qwen3-vl-flash

Alibaba Qwen 3.0 lightweight vision-language model optimized for low latency

Vision-Language Model (VLM) | Text + Image
Rerank
$0.5 in
Alibaba Cloud

qwen3-rerank

Alibaba Qwen 3.0 text rerank model for relevance scoring and search result reordering

Rerank Model | Text
Text
$2 in | $6 out
Alibaba Cloud

qwen3-max

Alibaba Qwen 3.0 flagship model with strong Chinese capabilities and high cost-effectiveness

128K tokens | Large Language Model (LLM)
Embedding
$0.7 in | $0 out
ByteDance

doubao-embedding-vision

ByteDance Doubao vision embedding model, supporting vectorization of images and multimodal content

Vision Embedding Model | Images and Multimodal
Embedding
$0.7 in | $0 out
ByteDance

doubao-embedding-large-text

ByteDance Doubao large text embedding model, providing higher quality text vectorization capabilities

Large Text Embedding Model | 2048 dimensions
Text
$4 in | $16 out | $1 cache hit
Moonshot AI

kimi-k2-thinking

Deep Reasoning Open Source

Moonshot AI's reasoning-enhanced model with interleaved thinking and tool-use capabilities, excelling at complex reasoning and agentic tasks

256K tokens | Reasoning-Enhanced MoE Model
Image
$0.05 / image
OpenAI

gpt-image-1

OpenAI's latest 2025 image generation model with comprehensively improved understanding capabilities and image quality

Up to 4K (4096x4096) | Image Generation
Text
$0.5 in | $1.5 out
Mistral AI

mistral-large-latest

Mistral AI's flagship MoE open-source model with 675B total parameters, multimodal capabilities and 256K context

256K tokens | MoE (41B/675B)
Rerank
$1.8 in
Alibaba Cloud

qwen3-vl-rerank

Alibaba Qwen 3.0 multimodal rerank model for text-image retrieval reranking

Multimodal Rerank Model | Text + Image
Embedding
$0.5 in | $0 out
ByteDance

doubao-embedding-text

ByteDance's Doubao text embedding model for text vectorization and semantic retrieval

Text Embedding Model | 1024 dimensions
Audio
$0.006 / minute
OpenAI

whisper-1

Powerful speech recognition model supporting multilingual transcription and translation

99+ languages | Automatic Speech Recognition (ASR)
Embedding
$0.00013 / 1K tokens
OpenAI

text-embedding-3-large

High-performance text embedding model for semantic search and similarity calculation

3072 dimensions | Text Embedding
Text
$0.3 in | $2.5 out
Google AI Studio

gemini-2.5-flash

Multimodal

Google AI Studio's fast multimodal model with ultra-long context support

1M tokens | Multimodal Large Language Model
Text
$0.3 in | $2.5 out
Google Vertex AI

google/gemini-2.5-flash

Google Vertex AI fast multimodal model with ultra-long context support and enterprise-grade reliability

1M tokens | Multimodal Large Language Model
Text
$0.1 in | $0.4 out
Google AI Studio

gemini-2.5-flash-lite

Ultra Fast

Google AI Studio's ultra-lightweight multimodal model with ultra-fast response

1M tokens | Lightweight Multimodal Model
Text
$0.1 in | $0.4 out
Google Vertex AI

google/gemini-2.5-flash-lite

Google Vertex AI ultra-lightweight multimodal model with ultra-fast response and enterprise deployment

1M tokens | Lightweight Multimodal Model
Text
$0.2 in | $0.2 out
Mistral AI

ministral-14b-latest

Mistral AI's most capable edge model with 14B parameters, vision capabilities and reasoning variants

128K tokens | 14B Parameters
Text
$0.15 in | $0.15 out
Mistral AI

ministral-8b-latest

Mistral AI's edge-optimized medium model with 8B parameters, vision capabilities and sliding window attention

128K tokens | 8B Parameters
Text
$0.1 in | $0.1 out
Mistral AI

ministral-3b-latest

Mistral AI's edge-optimized small model with 3B parameters, vision capabilities and 128K context

128K tokens | 3B Parameters
Text
¥2.1 in | ¥8.4 out | ¥0.21 cache hit
MiniMax

MiniMax-M2.5

MiniMax's 2026 reasoning model, optimized for coding, tool use and search, and office productivity workflows

Reasoning Model (M2 Series) | ~50 TPS
Text
$1 in | $1 out
Perplexity

sonar

Perplexity online search model with real-time internet access

Online Search Model | Real-time Web Search
Text
¥4.2 in | ¥33.6 out | ¥0.42 cache hit
MiniMax

MiniMax-M2.5-highspeed

High-speed MiniMax-M2.5 variant (M2.5-Lightning) with aligned core capabilities, tuned for low-latency and high-throughput agent workloads

High-Speed Reasoning Model (M2.5-Lightning) | ~100 TPS
Text
$3 in | $15 out
Perplexity

sonar-pro

Perplexity high-performance online search model with enhanced reasoning capabilities

High-Performance Search Model | Advanced Reasoning + Real-time Search
Text
$2 in | $8 out | $0.4 cache hit
Zhipu AI

glm-4.7

Zhipu AI's latest flagship model with coding capabilities matching Claude Sonnet 4, supporting 200K ultra-long context, deep reasoning and tool calling

200K tokens | Large Language Model (LLM)
Text
$0 in | $0 out
Zhipu AI

glm-4.7-flash

Free

Zhipu AI GLM-4.7 Flash, a low-latency high-throughput model for real-time chat and lightweight tasks, free to use

200K tokens | Large Language Model (LLM)


©2022-2026 XABC Labs