Model Catalog

Complete specifications for every model in the workbench.

The Model Catalog is your reference for model capabilities, context windows, pricing, and feature flags. Before selecting a model in the workbench, check the catalog to confirm it supports your use case.

Catalog Structure

Each model entry contains:

Field	Description
Model ID	The exact API identifier (e.g., `claude-3-5-sonnet-20241022`)
Context Window	Maximum input + output tokens in a single request
Knowledge Cutoff	Training data date — answers about events after this date may hallucinate
Pricing	Per-million-token rates for input and output
Features	Vision, tool use, JSON mode, extended thinking, streaming
Regions	Geographic availability and latency baselines

Provider Model Families

Anthropic Claude

Claude 3.5 Sonnet: Best for coding, analysis, and complex reasoning. 200K context.
Claude 3.5 Haiku: Fast, cheap, good for classification and extraction. 200K context.
Claude 3 Opus: Highest reasoning depth. 200K context. Most expensive.

OpenAI

GPT-4o: Flagship multimodal model. 128K context. Excellent vision and tool use.
GPT-4o-mini: 80% cheaper than GPT-4o. Good for prototyping and high-volume tasks. 128K context.
o1 / o3-mini: Reasoning models. No temperature control. Best for math and logic.

Google Gemini

Gemini 1.5 Pro: 2M token context. Best for massive documents and video.
Gemini 1.5 Flash: 1M token context. Fast and cheap. Good for chat and summarization.

DeepSeek

DeepSeek V3: 671B MoE model. Extremely cheap. 64K context. Good for prototyping.

AWS Bedrock

Claude via Bedrock: Same models as Anthropic, with AWS IAM authentication and VPC isolation.
Llama 3: Open-weight models. Good for on-premise or compliance-sensitive deployments.

Feature Matrix

Feature	Claude 3.5	GPT-4o	Gemini 1.5 Pro	DeepSeek V3
Vision	Yes	Yes	Yes	No
Tool Use	Yes	Yes	Yes	No
JSON Mode	Yes	Yes	Yes	Yes
Streaming	Yes	Yes	Yes	Yes
Extended Thinking	Yes	No	No	No
1M+ Context	No	No	Yes	No

Key Takeaway

The catalog is not just a price list. Match the model's feature set to your task. Need vision? Skip DeepSeek. Need 1M context? Use Gemini. Need reasoning? Use Claude with extended thinking or OpenAI o-series.