Model Catalog
Complete specifications for every model in the workbench.
The Model Catalog is your reference for model capabilities, context windows, pricing, and feature flags. Before selecting a model in the workbench, check the catalog to confirm it supports your use case.
Catalog Structure
Each model entry contains:
| Field | Description |
|---|---|
| Model ID | The exact API identifier (e.g., claude-3-5-sonnet-20241022) |
| Context Window | Maximum input + output tokens in a single request |
| Knowledge Cutoff | End date of the training data; the model may hallucinate when asked about events after this date |
| Pricing | Per-million-token rates for input and output |
| Features | Vision, tool use, JSON mode, extended thinking, streaming |
| Regions | Geographic availability and latency baselines |
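
As a rough illustration of how these fields fit together, the sketch below models one catalog entry as a Python dataclass and uses the per-million-token rates to estimate the cost of a single request. The class name, field names, model ID, prices, and region are all hypothetical placeholders chosen for this example; they are not the workbench's actual schema or real pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One catalog row. Field names are illustrative, not the workbench schema."""
    model_id: str                 # exact API identifier
    context_window: int           # max input + output tokens in one request
    knowledge_cutoff: str         # training data end date, e.g. "2024-04"
    input_usd_per_mtok: float     # USD per million input tokens
    output_usd_per_mtok: float    # USD per million output tokens
    features: frozenset = frozenset()
    regions: tuple = ()


def estimate_cost_usd(entry, input_tokens, output_tokens):
    """Estimate the cost of one request from per-million-token rates."""
    if input_tokens + output_tokens > entry.context_window:
        raise ValueError("request exceeds the model's context window")
    total = (input_tokens * entry.input_usd_per_mtok
             + output_tokens * entry.output_usd_per_mtok)
    return total / 1_000_000


# Placeholder values only: not a real model and not real prices.
example = CatalogEntry(
    model_id="example-model-001",
    context_window=200_000,
    knowledge_cutoff="2024-04",
    input_usd_per_mtok=3.00,
    output_usd_per_mtok=15.00,
    features=frozenset({"vision", "tool_use", "streaming"}),
    regions=("us-east",),
)

print(f"{estimate_cost_usd(example, 10_000, 2_000):.4f}")  # prints 0.0600
```

Because input and output are billed at different rates, the same total token count can cost very different amounts depending on how it splits between prompt and completion.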
Provider Model Families
Anthropic Claude
- Claude 3.5 Sonnet: Best for coding, analysis, and complex reasoning. 200K context.
- Claude 3.5 Haiku: Fast, cheap, good for classification and extraction. 200K context.
- Claude 3 Opus: Highest reasoning depth. 200K context. Most expensive.
OpenAI
- GPT-4o: Flagship multimodal model. 128K context. Excellent vision and tool use.
- GPT-4o-mini: A small fraction of GPT-4o's per-token price (over 90% cheaper at OpenAI's list prices at the time of writing). Good for prototyping and high-volume tasks. 128K context.
- o1 / o3-mini: Reasoning models. No temperature control. Best for math and logic.
Google Gemini
- Gemini 1.5 Pro: 2M token context. Best for massive documents and video.
- Gemini 1.5 Flash: 1M token context. Fast and cheap. Good for chat and summarization.
DeepSeek
- DeepSeek V3: 671B MoE model. Extremely cheap. 64K context. Good for prototyping.
AWS Bedrock
- Claude via Bedrock: Same models as Anthropic, with AWS IAM authentication and VPC isolation.
- Llama 3: Open-weight models. Good for compliance-sensitive deployments; because the weights are openly available, the same models can also be self-hosted on-premise outside Bedrock.
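
The context windows listed for the families above can be turned into a quick pre-flight check. The sketch below is a minimal example under two assumptions: the sizes are the rounded figures from this page (200K taken as 200,000 tokens, and so on), and the dictionary keys are display names rather than API model IDs. The function names are hypothetical. Token counts are expected to come from whichever tokenizer matches the target model.

```python
# Rounded context windows as listed on this page (display names, not API IDs).
CONTEXT_WINDOWS = {
    "Claude 3.5 Sonnet": 200_000,
    "Claude 3.5 Haiku": 200_000,
    "Claude 3 Opus": 200_000,
    "GPT-4o": 128_000,
    "GPT-4o-mini": 128_000,
    "Gemini 1.5 Pro": 2_000_000,
    "Gemini 1.5 Flash": 1_000_000,
    "DeepSeek V3": 64_000,
}


def fits_context(model, input_tokens, max_output_tokens):
    """True if input plus reserved output fits in the model's window."""
    return input_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]


def models_that_fit(input_tokens, max_output_tokens):
    """Names of all listed models whose window can hold the request."""
    return sorted(
        name for name in CONTEXT_WINDOWS
        if fits_context(name, input_tokens, max_output_tokens)
    )


# A 150K-token document with 4K tokens reserved for the answer.
print(models_that_fit(150_000, 4_000))
```

Since the window covers input and output together, the check reserves room for the response instead of comparing the prompt length alone.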
Feature Matrix
| Feature | Claude 3.5 | GPT-4o | Gemini 1.5 Pro | DeepSeek V3 |
|---|---|---|---|---|
| Vision | Yes | Yes | Yes | No |
| Tool Use | Yes | Yes | Yes | No |
| JSON Mode | Yes | Yes | Yes | Yes |
| Streaming | Yes | Yes | Yes | Yes |
| Extended Thinking | Yes | No | No | No |
| 1M+ Context | No | No | Yes | No |
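
One way to use the matrix programmatically is to encode each column as a set of feature flags and filter by the features a task requires. The sketch below transcribes the table above as written; the flag names (`vision`, `tool_use`, `context_1m`, and so on) and the `supporting` helper are hypothetical names chosen for this example, not workbench identifiers.

```python
# Feature matrix transcribed from the table above; flag names are illustrative.
FEATURES = {
    "Claude 3.5": {"vision", "tool_use", "json_mode", "streaming",
                   "extended_thinking"},
    "GPT-4o": {"vision", "tool_use", "json_mode", "streaming"},
    "Gemini 1.5 Pro": {"vision", "tool_use", "json_mode", "streaming",
                       "context_1m"},
    "DeepSeek V3": {"json_mode", "streaming"},
}

# Every flag that appears anywhere in the matrix, used to catch typos.
KNOWN_FLAGS = set().union(*FEATURES.values())


def supporting(required):
    """Return the models (in table order) that support every required flag."""
    required = set(required)
    unknown = required - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown feature flags: {sorted(unknown)}")
    return [name for name, flags in FEATURES.items() if required <= flags]


print(supporting({"vision", "tool_use"}))
# prints ['Claude 3.5', 'GPT-4o', 'Gemini 1.5 Pro']
print(supporting({"context_1m"}))
# prints ['Gemini 1.5 Pro']
```

Rejecting unknown flags keeps a misspelled requirement from silently returning an empty list, which could otherwise be misread as "no model supports this".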
Key Takeaway
The catalog is more than a price list: match the model's feature set to your task. Need vision? Skip DeepSeek. Need 1M+ context? Use Gemini. Need reasoning? Use Claude with extended thinking or an OpenAI o-series model.