AI Models | 𝐗𝐀𝐈

gpt-5.4-pro

OpenAI's most powerful professional model of 2026, built for advanced reasoning, complex analysis, and production-grade workflows

400K tokens | Large Language Model (LLM, Pro)

gpt-5.5

$0.50 cache hit

OpenAI's latest frontier flagship model for complex professional work, coding, and agentic workflows, with 1M+ context, text and image input, and text output

1,050,000 tokens | 128K tokens

claude-opus-4-7

5m: $6.25 / MTok; 1h: $10 / MTok cache write | $0.50 / MTok cache hit

Anthropic's most capable generally available model for complex reasoning, agentic coding, and long-horizon work

1M tokens | Large Language Model (LLM)

gpt-5.4

$0.25 cache hit

OpenAI's 2026 flagship model with 400K context and cached-input pricing for reasoning, coding, and multimodal tasks

1M tokens | Large Language Model (LLM)

claude-sonnet-4-6

$3.75 cache write | $0.30 cache hit

Anthropic's latest flagship model, excelling in code generation, analysis, and writing tasks with prompt caching support

200K tokens | Large Language Model (LLM)

$0.75 in | $4.5 out

gpt-5.4-mini

$0.075 cache hit

OpenAI's lightweight GPT-5.4 variant balancing cost, quality, and cached-input support for both API and Codex workflows

Lightweight Large Language Model | Balanced cost and quality for general development, automation, and everyday reasoning
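The separate input, output, and cache-hit rates listed for models such as gpt-5.4-mini combine into a per-request cost. The sketch below is a minimal illustration, assuming the listed prices are USD per 1M tokens and that cached input tokens are billed at the cache-hit rate instead of the regular input rate. The helper `request_cost` and the token counts are hypothetical and not part of any provider's API.

```python
# Hypothetical helper: estimate the cost of one request from per-1M-token rates.
# Assumption: prices are per 1M tokens, and cached input tokens are billed at the
# cache-hit rate instead of the normal input rate.
TOKENS_PER_UNIT = 1_000_000


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                 in_rate: float, cache_rate: float, out_rate: float) -> float:
    """Return the estimated cost of a single request in the listed currency."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens  # input tokens billed at the full rate
    total = (uncached * in_rate
             + cached_tokens * cache_rate
             + output_tokens * out_rate)
    return total / TOKENS_PER_UNIT


# gpt-5.4-mini rates as listed: $0.75 in, $0.075 cache hit, $4.5 out.
# Example request: 100K input tokens (80K served from cache), 10K output tokens.
cost = request_cost(100_000, 80_000, 10_000,
                    in_rate=0.75, cache_rate=0.075, out_rate=4.5)
no_cache = request_cost(100_000, 0, 10_000,
                        in_rate=0.75, cache_rate=0.075, out_rate=4.5)
print(f"with cache: ${cost:.4f}, without cache: ${no_cache:.4f}")
```

Under these assumptions, serving 80% of the prompt from cache brings the estimate down from $0.12 to $0.066, because the output tokens ($0.045) then dominate the bill.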

claude-opus-4-6

Anthropic's most intelligent model for building agents and coding

200K tokens / 1M tokens (beta) | Yes

$30 / M image output tokens

gpt-image-2

OpenAI frontier image generation and editing model with text and image input, flexible sizes, and high-quality image output

Image generation and editing | Text and image input, image output

$1.75 in | $14 out

gpt-5.3-codex

$0.175 cache hit

OpenAI's 2025 code-specialized model, focused on code understanding, generation, and optimization with caching support

256K tokens | Code-Specialized Model

¥3.2 in | ¥16 out

doubao-seed-2-0-pro-260215

¥0.64 cache hit

ByteDance Doubao Seed 2.0 Pro, optimized for long-chain reasoning and stability on complex real-world tasks

256K tokens | 256K tokens

$1.75 in | $14.0 out

gpt-5.3-codex-spark

$0.175 cache hit

OpenAI's ultra-low-latency coding model released in 2026, built for real-time coding collaboration and rapid iteration with caching support

128K tokens | Ultra-Low-Latency Code Model (Small)

Google AI Studio

gemini-3-pro-image-preview

Google AI Studio text-to-image preview model with 1K/2K/4K output, multi-image reference, Thinking + Search Grounding

65K input / 32K output tokens | Default 1K, optional 2K / 4K, multiple aspect ratios

claude-haiku-4-5

5m: $1.25 / MTok; 1h: $2 / MTok cache write | $0.10 / MTok cache hit

Anthropic's latest Haiku model for low-latency, cost-efficient, high-throughput workloads

200K tokens | Large Language Model (LLM)

¥0.6 in | ¥3.6 out

doubao-seed-2-0-lite-260215

¥0.12 cache hit

ByteDance Doubao Seed 2.0 Lite balances generation quality and response speed for general production workloads

256K tokens | 224K tokens

Google AI Studio

gemini-2.5-pro

Google AI Studio's 2025 flagship multimodal model with ultra-long context support and powerful multimodal understanding capabilities

2M tokens | Multimodal Large Language Model

$0.8 in | $3.2 out

nova-pro

AWS's high-performance multimodal model supporting text and image understanding

300K tokens | Multimodal Large Language Model

kimi-for-coding

Moonshot AI's Kimi code-specialized model, focused on code understanding, generation, and optimization

128K tokens | Code-Specialized Model

Google AI Studio

gemini-3.1-pro-preview

Google AI Studio preview multimodal model with a 1M context window and 64K output for advanced reasoning and high-quality generation

1M input / 64K output tokens | January 2025

ark-code-latest

ByteDance Doubao code-specialized model, focused on code understanding, generation, and optimization

256K tokens | Code-Specialized Model

$1.2 in | $3.6 out

doubao-seed-translation-250915

ByteDance Doubao translation-specialized model, providing high-quality multilingual translation services

128K tokens | Translation-Specialized Model

deepseek-v3

DeepSeek's latest flagship model, V3.2, with 685B parameters, 128K context, and reasoning capabilities rivaling GPT-5

128K tokens | 685B

Google Vertex AI

google/gemini-2.5-pro

Google Vertex AI flagship multimodal model with ultra-long context support and powerful multimodal understanding capabilities

2M tokens | Multimodal Large Language Model

¥0.2 in | ¥2 out

doubao-seed-2-0-mini-260215

¥0.04 cache hit

ByteDance Doubao Seed 2.0 Mini targets low-latency, high-concurrency, and cost-sensitive deployments with four-level thinking modes

256K tokens | 224K tokens

$0.25 in | $1.50 out

Google AI Studio

gemini-3.1-flash-image-preview

Google AI Studio preview image generation model optimized for speed and efficiency, ideal for fast interactive responses and high throughput

Google AI Studio | Designed for speed and efficiency in interactive and high-throughput image generation

$1.4 in | $2.8 out

deepseek-r1

Chinese open-source reasoning model, rivaling o1 in mathematics, coding, and scientific reasoning with exceptional cost-effectiveness

64K tokens | Reasoning Model

$0.06 in | $0.24 out

nova-lite

AWS Nova lightweight version, providing fast and economical multimodal capabilities

300K tokens | Multimodal Large Language Model

$0.035 in | $0.14 out

nova-micro

AWS Nova ultra-lightweight version, providing extreme cost-effectiveness for text processing

128K tokens | Large Language Model (LLM)

Google AI Studio

gemini-3-flash-preview

Fast Multimodal

$0.05 cache hit

Google AI Studio high-throughput multimodal preview model with low latency and strong cost efficiency

1M tokens | 64K tokens

grok-4

xAI's latest flagship model with real-time internet search capabilities and timely knowledge updates

128K tokens | Large Language Model (LLM)

$0.2 in | $0.2 out

qwen3-32b

Alibaba Cloud's 32B-parameter Qwen large language model, a powerful and cost-effective AI assistant

Large Language Model | 32.8B

qwen3-vl-plus

Alibaba Qwen 3.0 vision-language model for strong multimodal understanding

Vision-Language Model (VLM) | Text + Image

moonshotai/kimi-k2-instruct-0905

Chinese ultra-long-context model supporting up to 2 million characters of input, excelling at long-document analysis and processing

2M Chinese characters | Large Language Model (LLM)

$0.2 in | $1.5 out

grok-code-fast

xAI's code-optimized model designed for rapid code generation and understanding

128K tokens | Code Generation Model

$0.2 in | $0.5 out

grok-4-fast

A faster version of xAI's Grok-4 that shortens response times while maintaining strong capabilities

128K tokens | Large Language Model (LLM)

qwen3.6-plus

Alibaba Cloud Qwen3.6-Plus general-purpose LLM with 256K tiered pricing for text generation and reasoning workloads

Large Language Model (LLM) | Input ¥2 / M tokens; Output ¥12 / M tokens

$0.2 in | $0.2 out

tencent/Hunyuan-MT-7B

Tencent Hunyuan machine translation model offering ultra-low-cost multilingual translation

Machine Translation | 7B

qwen3-rerank

Alibaba Qwen 3.0 text rerank model for relevance scoring and search result reordering

Rerank Model | Text

qwen3-max

Alibaba Qwen 3.0 flagship model with strong Chinese capabilities and high cost-effectiveness

128K tokens | Large Language Model (LLM)

$0.15 in | $1.5 out

qwen3-vl-flash

$0.03 cache hit

Alibaba Qwen 3.0 lightweight vision-language model optimized for low latency

Vision-Language Model (VLM) | Text + Image

doubao-embedding-large-text

ByteDance Doubao large text embedding model, providing higher-quality text vectorization

Large Text Embedding Model | 2048 dimensions

doubao-embedding-vision

ByteDance Doubao vision embedding model, supporting vectorization of images and multimodal content

Vision Embedding Model | Images and Multimodal

kimi-k2-thinking

Deep Reasoning Open Source

Moonshot AI's reasoning-enhanced model with interleaved thinking and tool-use capabilities, excelling at complex reasoning and agentic tasks

256K tokens | Reasoning-Enhanced MoE Model

$0.5 in | $1.5 out

mistral-large-latest

Mistral AI's flagship MoE open-source model with 675B total parameters, multimodal capabilities and 256K context

256K tokens | MoE (41B/675B)

doubao-embedding-text

ByteDance's Doubao text embedding model for text vectorization and semantic retrieval

Text Embedding Model | 1024 dimensions

qwen3-vl-rerank

Alibaba Qwen 3.0 multimodal rerank model for text-image retrieval reranking

Multimodal Rerank Model | Text + Image

whisper-1

Powerful speech recognition model supporting multilingual transcription and translation

99+ languages | Automatic Speech Recognition (ASR)

$0.00013 / 1K tokens

text-embedding-3-large

High-performance text embedding model for semantic search and similarity calculation

3072 dimensions | Text Embedding

$0.3 in | $2.5 out

Google AI Studio

gemini-2.5-flash

Google AI Studio's fast multimodal model with ultra-long context support

1M tokens | Multimodal Large Language Model

$0.3 in | $2.5 out

Google Vertex AI

google/gemini-2.5-flash

Google Vertex AI fast multimodal model with ultra-long context support and enterprise-grade reliability

1M tokens | Multimodal Large Language Model

$0.1 in | $0.4 out

Google AI Studio

gemini-2.5-flash-lite

Google AI Studio's ultra-lightweight multimodal model with ultra-fast response

1M tokens | Lightweight Multimodal Model

$0.1 in | $0.4 out

Google Vertex AI

google/gemini-2.5-flash-lite

Google Vertex AI ultra-lightweight multimodal model with ultra-fast response and enterprise deployment

1M tokens | Lightweight Multimodal Model

sonar

Perplexity online search model with real-time internet access

Online Search Model | Real-time Web Search

sonar-pro

Perplexity high-performance online search model with enhanced reasoning capabilities

High-Performance Search Model | Advanced Reasoning + Real-time Search