AI Models | XAI Router

$1.75 in | $14 out | $0.175 cache hit

256K tokens | Large Language Model (LLM)

gpt-5.2

OpenAI's latest 2025 flagship model, with comprehensive upgrades in reasoning, coding, and creative writing, plus caching support

$3 in | $15 out | $3.75 cache write | $0.30 cache hit

Anthropic

claude-sonnet-4-6

Anthropic's latest flagship model, excelling in code generation, analysis, and writing tasks with prompt caching support

$1.25 in | $10 out | $0.125 cache hit

256K tokens | Code-Specialized Model

gpt-5.1-codex

OpenAI's 2025 code-specialized model, focused on code understanding, generation, and optimization with caching support

$3 in | $15 out

Anthropic

claude-sonnet-4-5-20250929

Anthropic's 2025 flagship model, excelling in code generation, analysis, and writing tasks

$1.75 in | $14 out | $0.175 cache hit

256K tokens | Code-Specialized Model

gpt-5.3-codex

OpenAI's 2025 code-specialized model, focused on code understanding, generation, and optimization with caching support

$0.2 in | $1.25 out | $0.02 cache hit

Ultra-Lightweight Large Language Model | Low-cost Q&A, classification, extraction, and other simple workloads

gpt-5.4-nano

OpenAI's ultra-light GPT-5.4 variant for simple low-cost workloads, currently available via API only

$0.4 in | $1.6 out

gpt-4.1-mini

Lightweight version of GPT-4.1, offering excellent performance while being more cost-effective

$2 in | $12 out | $0.2 cache hit

Image

65K input / 32K output tokens | Default 1K, optional 2K / 4K, multiple aspect ratios

gemini-3-pro-image-preview

Google AI Studio text-to-image preview model with 1K/2K/4K output, multi-image reference, Thinking + Search Grounding

$0.1 in | $0.4 out

gpt-4.1-nano

Ultra-lightweight version of GPT-4.1, offering extreme cost-effectiveness for simple and fast tasks

$1.75 in | $14.0 out | $0.175 cache hit

128K tokens | Ultra-Low-Latency Code Model (Small)

gpt-5.3-codex-spark

OpenAI's ultra-low-latency coding model released in 2026, built for real-time coding collaboration and rapid iteration with caching support

¥3.2 in | ¥16 out | ¥0.64 cache hit

doubao-seed-2-0-pro-260215

ByteDance Doubao Seed 2.0 Pro, optimized for long-chain reasoning and stability on complex real-world tasks

256K tokens | 256K tokens

$0.375 in | $3 out | $0.0375 cache hit

256K tokens | Lightweight Code Model

gpt-5.1-codex-mini

OpenAI's 2025 lightweight code model, offering faster response times and lower costs while maintaining high-quality code capabilities

¥3.2 in | ¥16 out | ¥0.64 cache hit

doubao-seed-2-0-code-preview-260215

Coding-enhanced Doubao Seed 2.0 variant optimized for Agentic Coding workflows

256K tokens | 256K tokens

$1.2 in | $8 out

256K tokens | Code-Specialized Model

ark-code-latest

ByteDance Doubao code-specialized model, focused on code understanding, generation, and optimization

$1.5 in | $10 out

2M tokens | Multimodal Large Language Model

gemini-2.5-pro

Google AI Studio's 2025 flagship multimodal model with ultra-long context support and powerful multimodal understanding capabilities

$2 in | $12 out

1M input / 64K output tokens | January 2025

gemini-3.1-pro-preview

Google AI Studio preview multimodal model with a 1M context window and 64K output for advanced reasoning and high-quality generation

128K tokens | Code-Specialized Model

$1 in | $2 out

Moonshot AI

kimi-for-coding

Moonshot AI's Kimi code-specialized model, focused on code understanding, generation, and optimization

$2 in | $3 out | $0.4 cache hit

DeepSeek

deepseek-v3

DeepSeek's latest flagship model V3.2, 685B parameters, reasoning capabilities rivaling GPT-5, 128K context

nova-pro

AWS's high-performance multimodal model supporting text and image understanding

300K tokens | Multimodal Large Language Model

¥0.6 in | ¥3.6 out | ¥0.12 cache hit

doubao-seed-2-0-lite-260215

ByteDance Doubao Seed 2.0 Lite balances generation quality and response speed for general production workloads

256K tokens | 224K tokens

$1.2 in | $3.6 out

128K tokens | Translation-Specialized Model

doubao-seed-translation-250915

ByteDance Doubao translation-specialized model, providing high-quality multilingual translation services

¥0.2 in | ¥2 out | ¥0.04 cache hit

doubao-seed-2-0-mini-260215

ByteDance Doubao Seed 2.0 Mini targets low-latency, high-concurrency, and cost-sensitive deployments with four-level thinking modes

256K tokens | 224K tokens

Image

$0.25 in | $1.50 out

Google AI Studio | Designed for speed and efficiency in interactive and high-throughput image generation

gemini-3.1-flash-image-preview

Fast Preview

Google AI Studio preview image generation model optimized for speed and efficiency, ideal for fast interactive responses and high throughput

300K tokens | Multimodal Large Language Model

$0.06 in | $0.24 out

AWS Bedrock

nova-lite

AWS Nova lightweight version, providing fast and economical multimodal capabilities

2M tokens | Multimodal Large Language Model

$1.5 in | $10 out

Google Vertex AI

google/gemini-2.5-pro

Google Vertex AI flagship multimodal model with ultra-long context support and powerful multimodal understanding capabilities

64K tokens | Reasoning Model

$1.4 in | $2.8 out

DeepSeek

deepseek-r1

Chinese open-source reasoning model, rivaling o1 in mathematics, coding, and scientific reasoning with exceptional cost-effectiveness

$0.035 in | $0.14 out

AWS Bedrock

nova-micro

AWS Nova ultra-lightweight version, providing extreme cost-effectiveness for text processing

$5 in | $15 out

xAI

grok-4

xAI's latest flagship model with real-time internet search capabilities and timely knowledge updates

$0.5 in | $3 out | $0.05 cache hit

128K tokens | Code Generation Model

gemini-3-flash-preview

Fast Multimodal

Google AI Studio high-throughput multimodal preview model with low latency and strong cost efficiency

grok-code-fast

xAI's code-optimized model designed for rapid code generation and understanding

$0 in | $0 out

Large Language Model | 32.8B

qwen3-32b

Free

Alibaba Cloud's Qwen 32B-parameter large language model, a capable free AI assistant

$0 in | $0 out

Tencent

tencent/Hunyuan-MT-7B

Free

Tencent Hunyuan machine translation model with ultra-low cost multilingual translation

Machine Translation | 7B

2M Chinese characters | Large Language Model (LLM)

$1 in | $3 out

Moonshot AI

moonshotai/kimi-k2-instruct-0905

Ultra Fast

Chinese ultra-long context model supporting 2 million characters input, excelling at long document analysis and processing

¥0.8 in | ¥4.8 out | ¥1 cache write | ¥0.8 cache hit

128K tokens | Native vision-language Plus (linear attention + sparse MoE)

qwen3.5-plus

Qwen3.5 native vision-language Plus model with a hybrid linear-attention + sparse MoE architecture for strong reasoning and multimodal efficiency

$0.2 in | $0.5 out

xAI

grok-4-fast

xAI's Grok-4 fast version, providing faster response times while maintaining powerful capabilities

$1 in | $10 out | $0.2 cache hit

Vision-Language Model (VLM) | Text + Image

qwen3-vl-plus

Alibaba Qwen 3.0 vision-language model for strong multimodal understanding

$0.15 in | $1.5 out | $0.03 cache hit

Vision-Language Model (VLM) | Text + Image

qwen3-vl-flash

Alibaba Qwen 3.0 lightweight vision-language model optimized for low latency

Rerank

$0.5 in

qwen3-rerank

Alibaba Qwen 3.0 text rerank model for relevance scoring and search result reordering

qwen3-max

Alibaba Qwen 3.0 flagship model with strong Chinese capabilities and high cost-effectiveness

$0.7 in | $0 out

Vision Embedding Model | Images and Multimodal

doubao-embedding-vision

ByteDance Doubao vision embedding model, supporting vectorization of images and multimodal content

$0.7 in | $0 out

Large Text Embedding Model | 2048 dimensions

doubao-embedding-large-text

ByteDance Doubao large text embedding model, providing higher quality text vectorization capabilities

Deep Reasoning | Open Source

$4 in | $16 out | $1 cache hit

Moonshot AI

kimi-k2-thinking

Moonshot AI's reasoning-enhanced model with interleaved thinking and tool-use capabilities, excelling at complex reasoning and agentic tasks

256K tokens | Reasoning-Enhanced MoE Model

Image

$0.05 / image

Up to 4K (4096x4096) | Image Generation

gpt-image-1

OpenAI's latest 2025 image generation model with comprehensively improved understanding capabilities and image quality

$0.5 in | $1.5 out

256K tokens | MoE (41B/675B)

mistral-large-latest

Mistral AI's flagship MoE open-source model with 675B total parameters, multimodal capabilities and 256K context

Rerank

$1.8 in

Multimodal Rerank Model | Text + Image

qwen3-vl-rerank

Alibaba Qwen 3.0 multimodal rerank model for text-image retrieval reranking

$0.5 in | $0 out

Text Embedding Model | 1024 dimensions

doubao-embedding-text

ByteDance's Doubao text embedding model for text vectorization and semantic retrieval

Audio

$0.006 / minute

99+ languages | Automatic Speech Recognition (ASR)

whisper-1

Powerful speech recognition model supporting multilingual transcription and translation

$0.00013 / 1K tokens

1M tokens | Multimodal Large Language Model

text-embedding-3-large

High-performance text embedding model for semantic search and similarity calculation

gemini-2.5-flash

Multimodal

Google AI Studio's fast multimodal model with ultra-long context support

1M tokens | Multimodal Large Language Model

$0.3 in | $2.5 out

Google Vertex AI

google/gemini-2.5-flash

Google Vertex AI fast multimodal model with ultra-long context support and enterprise-grade reliability

$0.1 in | $0.4 out

1M tokens | Lightweight Multimodal Model

gemini-2.5-flash-lite

Ultra Fast

Google AI Studio's ultra-lightweight multimodal model with ultra-fast response

1M tokens | Lightweight Multimodal Model

$0.1 in | $0.4 out

Google Vertex AI

google/gemini-2.5-flash-lite

Google Vertex AI ultra-lightweight multimodal model with ultra-fast response and enterprise deployment

$0.2 in | $0.2 out

128K tokens | 14B Parameters

ministral-14b-latest

Mistral AI's most capable edge model with 14B parameters, vision capabilities and reasoning variants

$0.15 in | $0.15 out

ministral-8b-latest

Mistral AI's edge-optimized medium model with 8B parameters, vision capabilities and sliding window attention

128K tokens | 8B Parameters

$0.1 in | $0.1 out

ministral-3b-latest

Mistral AI's edge-optimized small model with 3B parameters, vision capabilities and 128K context

128K tokens | 3B Parameters

¥2.1 in | ¥8.4 out | ¥0.21 cache hit

MiniMax

MiniMax-M2.5

MiniMax's 2026 reasoning model, optimized for coding, tool use and search, and office productivity workflows

Reasoning Model (M2 Series) | ~50 TPS

Online Search Model | Real-time Web Search

$1 in | $1 out

Perplexity

sonar

Perplexity online search model with real-time internet access

¥4.2 in | ¥33.6 out | ¥0.42 cache hit

MiniMax

MiniMax-M2.5-highspeed

High-speed MiniMax-M2.5 variant (M2.5-Lightning) with aligned core capabilities, tuned for low-latency and high-throughput agent workloads

High-Speed Reasoning Model (M2.5-Lightning) | ~100 TPS

High-Performance Search Model | Advanced Reasoning + Real-time Search

$3 in | $15 out

Perplexity

sonar-pro

Perplexity high-performance online search model with enhanced reasoning capabilities

$2 in | $8 out | $0.4 cache hit

Zhipu AI

glm-4.7

Zhipu AI's latest flagship model with coding capabilities matching Claude Sonnet 4, supporting 200K ultra-long context, deep reasoning and tool calling

$0 in | $0 out

Zhipu AI

glm-4.7-flash

Free

Zhipu AI GLM-4.7 Flash, a low-latency high-throughput model for real-time chat and lightweight tasks, free to use