Model Catalog

Complete specifications for every model in the workbench.

The Model Catalog is your reference for model capabilities, context windows, pricing, and feature flags. Before selecting a model in the workbench, check the catalog to confirm it supports your use case.

Catalog Structure

Each model entry contains:

Field             Description
Model ID          The exact API identifier (e.g., claude-3-5-sonnet-20241022)
Context Window    Maximum input + output tokens in a single request
Knowledge Cutoff  End date of the training data; the model may hallucinate when asked about later events
Pricing           Per-million-token rates for input and output
Features          Vision, tool use, JSON mode, extended thinking, streaming
Regions           Geographic availability and latency baselines
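
The Context Window and Pricing fields lend themselves to two quick checks: whether a request fits, and roughly what it would cost. The following is a minimal sketch, not the workbench's actual schema. The class name `CatalogEntry`, its attribute names, and every value in the example entry are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical in-memory form of one catalog entry (not an official schema)."""
    model_id: str
    context_window: int            # max input + output tokens per request
    input_price_per_mtok: float    # USD per million input tokens
    output_price_per_mtok: float   # USD per million output tokens
    features: frozenset = field(default_factory=frozenset)

    def fits(self, input_tokens: int, max_output_tokens: int) -> bool:
        # The context window covers input and output together.
        return input_tokens + max_output_tokens <= self.context_window

    def estimate_cost(self, input_tokens: int, output_tokens: int) -> float:
        # Pricing is quoted per million tokens, so divide the total by 1,000,000.
        total = (input_tokens * self.input_price_per_mtok
                 + output_tokens * self.output_price_per_mtok)
        return total / 1_000_000


# Placeholder values for illustration only; read real numbers from the catalog.
example = CatalogEntry(
    model_id="example-model",
    context_window=200_000,
    input_price_per_mtok=3.0,
    output_price_per_mtok=15.0,
    features=frozenset({"vision", "tool_use"}),
)
```

With these placeholder numbers, `example.fits(150_000, 4_096)` is true, while a 199,000-token prompt with the same output budget no longer fits, because input and output share one window.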

Provider Model Families

Anthropic Claude

  • Claude 3.5 Sonnet: Best for coding, analysis, and complex reasoning. 200K context.
  • Claude 3.5 Haiku: Fast, cheap, good for classification and extraction. 200K context.
  • Claude 3 Opus: Highest reasoning depth. 200K context. Most expensive.

OpenAI

  • GPT-4o: Flagship multimodal model. 128K context. Excellent vision and tool use.
  • GPT-4o-mini: Much cheaper than GPT-4o; confirm the exact rates in each entry's Pricing field. Good for prototyping and high-volume tasks. 128K context.
  • o1 / o3-mini: Reasoning models. No temperature control. Best for math and logic.

Google Gemini

  • Gemini 1.5 Pro: 2M token context. Best for massive documents and video.
  • Gemini 1.5 Flash: 1M token context. Fast and cheap. Good for chat and summarization.

DeepSeek

  • DeepSeek V3: 671B MoE model. Extremely cheap. 64K context. Good for prototyping.

AWS Bedrock

  • Claude via Bedrock: Same models as Anthropic, with AWS IAM authentication and VPC isolation.
  • Llama 3: Open-weight models served through Bedrock. Because the weights are open, the same models can also be self-hosted for on-premise or compliance-sensitive deployments.
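
As a rough illustration of using the context figures listed above, the sketch below filters models by how many tokens a request needs. The dictionary keys are display names taken from this page, not API identifiers, and `models_that_fit` is a hypothetical helper. Reading "200K" as 200,000 tokens (and likewise for the others) is an assumption, since providers may quote slightly different exact limits. Entries with no context figure on this page (o1 / o3-mini and the Bedrock models) are left out.

```python
# Context windows as listed on this page, read as round numbers (an assumption).
CONTEXT_WINDOWS = {
    "Claude 3.5 Sonnet": 200_000,
    "Claude 3.5 Haiku": 200_000,
    "Claude 3 Opus": 200_000,
    "GPT-4o": 128_000,
    "GPT-4o-mini": 128_000,
    "Gemini 1.5 Pro": 2_000_000,
    "Gemini 1.5 Flash": 1_000_000,
    "DeepSeek V3": 64_000,
}


def models_that_fit(input_tokens: int, max_output_tokens: int) -> list[str]:
    """Return models whose window holds input plus output, smallest window first."""
    needed = input_tokens + max_output_tokens
    fitting = [name for name, window in CONTEXT_WINDOWS.items() if window >= needed]
    # sorted() is stable, so models with equal windows keep their listed order.
    return sorted(fitting, key=lambda name: CONTEXT_WINDOWS[name])
```

Sorting by window size puts the tightest fit first, which is a reasonable starting point when larger windows cost more; the final choice still depends on the Pricing and Features fields.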

Feature Matrix

Feature            Claude 3.5  GPT-4o  Gemini 1.5 Pro  DeepSeek V3
Vision             Yes         Yes     Yes             No
Tool Use           Yes         Yes     Yes             No
JSON Mode          Yes         Yes     Yes             Yes
Streaming          Yes         Yes     Yes             Yes
Extended Thinking  Yes         No      No              No
1M+ Context        No          No      Yes             No
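
The matrix above can be encoded as sets so that a task's requirements are checked mechanically. This is a sketch under stated assumptions: the feature labels (`"vision"`, `"tool_use"`, and so on) and the helper `supporting` are hypothetical names introduced here, and the data is transcribed from the table as given rather than fetched from any provider API.

```python
# Feature matrix transcribed from the table above; labels are hypothetical.
FEATURES = {
    "Claude 3.5": {"vision", "tool_use", "json_mode", "streaming", "extended_thinking"},
    "GPT-4o": {"vision", "tool_use", "json_mode", "streaming"},
    "Gemini 1.5 Pro": {"vision", "tool_use", "json_mode", "streaming", "1m_context"},
    "DeepSeek V3": {"json_mode", "streaming"},
}


def supporting(required: set[str]) -> list[str]:
    """Return the models whose feature set includes every required feature."""
    known = set().union(*FEATURES.values())
    unknown = required - known
    if unknown:
        # Fail loudly on typos instead of silently returning no models.
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    # `required <= feats` is the subset test: every required feature is present.
    return [model for model, feats in FEATURES.items() if required <= feats]
```

For example, `supporting({"vision", "tool_use"})` leaves out DeepSeek V3, and `supporting({"1m_context"})` returns only Gemini 1.5 Pro, matching the rows of the matrix.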

Key Takeaway

The catalog is not just a price list. Match the model's feature set to your task. Need vision? Skip DeepSeek. Need a 1M+ token context? Use Gemini. Need deep reasoning? Use Claude with extended thinking or an OpenAI o-series model.