# Model Selection
Not every task needs the most powerful model. Choosing the right model for each use case saves money, reduces latency, and often produces better results: simple, well-scoped tasks don't benefit from a flagship model's extra capability, and smaller models respond faster and more consistently on them.
## Available Models
| Model | Provider | Context Window | Best For |
|---|---|---|---|
| GPT-4.1 | Azure OpenAI | 128k tokens | Complex reasoning, long documents, investigation narratives, proposals |
| GPT-4.1-mini | Azure OpenAI | 128k tokens | High-volume operations (ticket triage, summaries, suggestions), cost-sensitive features |
| GPT-4.1-nano | Azure OpenAI | 32k tokens | Ultra-low cost operations, simple classification, short text extraction |
| GPT-4o | Azure OpenAI | 128k tokens | Multimodal (when images are relevant), call recaps, nuanced content |
| GPT-4o-mini | Azure OpenAI | 128k tokens | Low-cost alternative to GPT-4o for simpler tasks |
| Claude Opus | Anthropic | 200k tokens | Deep code analysis, large document processing, complex reasoning with very long context |
| Claude Sonnet | Azure AI Foundry | 200k tokens | Code review, structured output, nuanced writing tasks |
## Choosing the Right Model
### By Task Complexity
| Task | Recommended Model | Reason |
|---|---|---|
| Ticket field suggestions | GPT-4.1-mini | Fast, cheap, good enough for short suggestions |
| Auto-triage and priority | GPT-4.1-mini | Pattern classification — no deep reasoning needed |
| RCA report generation | GPT-4.1 | Long output, technical accuracy required |
| QBR summary | GPT-4.1 | Professional document, client-facing quality |
| Quick KB article from notes | GPT-4.1-mini | Structured output from provided notes — minimal creativity |
| Proposal drafting | GPT-4.1 | Long, persuasive document requiring good writing |
| Attack narrative | GPT-4.1 | Complex threat chain analysis with large context window |
| Code review | Claude Sonnet | Strong code understanding, structured critique |
| Call recap | GPT-4o | Handles nuanced speech patterns well |
| Embedding / semantic search | text-embedding-3-large | Not a chat model — specifically for vector embeddings |
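The task recommendations above amount to a simple lookup. The sketch below captures them as a routing map; the task keys, model identifiers, and the `pick_model` helper are illustrative assumptions, not part of the platform API.

```python
# Illustrative task-to-model routing map based on the recommendations above.
# Task keys, model ID strings, and the fallback default are assumptions.
TASK_MODEL_MAP = {
    "ticket_field_suggestions": "gpt-4.1-mini",
    "auto_triage": "gpt-4.1-mini",
    "rca_report": "gpt-4.1",
    "qbr_summary": "gpt-4.1",
    "kb_article_from_notes": "gpt-4.1-mini",
    "proposal_draft": "gpt-4.1",
    "attack_narrative": "gpt-4.1",
    "code_review": "claude-sonnet",
    "call_recap": "gpt-4o",
    "semantic_search": "text-embedding-3-large",  # embedding model, not a chat model
}

def pick_model(task: str, default: str = "gpt-4.1-mini") -> str:
    """Return the recommended model for a task, falling back to a cheap default."""
    return TASK_MODEL_MAP.get(task, default)
```

Defaulting unknown tasks to the cheap tier keeps costs predictable; new features then opt in to a larger model explicitly.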
### By Cost Sensitivity
If cost is a primary concern, use the cheapest model that produces acceptable quality:
| Budget priority | Strategy |
|---|---|
| Minimize cost | Default to GPT-4.1-mini for all background operations |
| Balance cost and quality | GPT-4.1-mini for triage/suggestions; GPT-4.1 for client-facing output |
| Maximize quality | GPT-4.1 for all features |
GPT-4.1-mini costs roughly one-fifth as much as GPT-4.1 for the same token volume. For high-frequency operations such as ticket suggestions (potentially thousands of requests per day across an MSP), that difference adds up quickly.
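To make the cost difference concrete, here is a rough calculation using the approximate per-1,000-request prices from the ticket-suggestion comparison later on this page (~$0.20 for GPT-4.1-mini vs. ~$1.00 for GPT-4.1); the 5,000 requests/day volume is an assumed example, not a platform figure.

```python
# Rough cost comparison for a high-frequency background feature.
# Per-1000-request prices are the approximate figures from the
# ticket-suggestion table on this page; the daily volume is assumed.
COST_PER_1K_REQUESTS = {"gpt-4.1-mini": 0.20, "gpt-4.1": 1.00}

def daily_cost(model: str, requests_per_day: int) -> float:
    """Estimated daily cost in dollars for a given request volume."""
    return COST_PER_1K_REQUESTS[model] * requests_per_day / 1000

mini = daily_cost("gpt-4.1-mini", 5000)  # $1.00/day
full = daily_cost("gpt-4.1", 5000)       # $5.00/day
annual_savings = (full - mini) * 365     # $1,460/year
```

At this volume the absolute numbers are small, but the same 5x ratio applies to every high-frequency feature routed to the larger model.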
### By Output Length
| Expected output length | Recommended model |
|---|---|
| < 100 tokens (short answer, classification) | GPT-4.1-nano or GPT-4.1-mini |
| 100–500 tokens (summaries, emails) | GPT-4.1-mini |
| 500–2000 tokens (reports, proposals) | GPT-4.1 |
| 2000+ tokens (large documents, deep analysis) | GPT-4.1 or Claude Opus |
### By Latency Requirements
| Latency requirement | Best model |
|---|---|
| Real-time (< 2s response) | GPT-4.1-mini or GPT-4.1-nano |
| Interactive (< 5s response) | GPT-4.1-mini |
| Background processing (no latency constraint) | Any model |
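The output-length and latency tables above reduce to a simple decision rule. The sketch below is one way to express it; the function name, model ID strings, and the fallback behavior are illustrative assumptions.

```python
from typing import Optional

def select_model(expected_output_tokens: int,
                 max_latency_s: Optional[float] = None) -> str:
    """Pick a model tier from expected output length, then tighten for latency.
    Thresholds mirror the tables above; this helper is illustrative only."""
    if expected_output_tokens < 100:
        model = "gpt-4.1-nano"      # short answers, classification
    elif expected_output_tokens <= 500:
        model = "gpt-4.1-mini"      # summaries, emails
    else:
        model = "gpt-4.1"           # reports, proposals, deep analysis
    # Real-time requirements (< 2 s) rule out the larger model.
    if max_latency_s is not None and max_latency_s < 2 and model == "gpt-4.1":
        model = "gpt-4.1-mini"
    return model
```

For 2000+ token outputs with very long context, Claude Opus is the alternative tier per the table above; a real selector would also take context size into account.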
## Selecting a Model in the Playground
In the Playground, use the Model dropdown in the conversation header to choose which model handles your session. The dropdown shows all models available on your tier.
## Model Comparison for Key Tasks
### Ticket Suggestions (PSA)
| Model | Response quality | Latency | Cost per 1000 requests |
|---|---|---|---|
| GPT-4.1-nano | Good for simple tickets | ~0.3s | ~$0.05 |
| GPT-4.1-mini | Better for complex tickets | ~0.5s | ~$0.20 |
| GPT-4.1 | Best, overkill for most tickets | ~1.2s | ~$1.00 |
Recommendation: GPT-4.1-mini is the default and usually the right choice. Move to GPT-4.1 only if technicians report the suggestions are frequently wrong.
### Proposal Generation
| Model | Output quality | Avg output length | Cost per proposal |
|---|---|---|---|
| GPT-4.1-mini | Acceptable, may miss nuance | ~600 tokens | ~$0.12 |
| GPT-4.1 | Professional, well-structured | ~1200 tokens | ~$0.96 |
Recommendation: GPT-4.1 for client-facing proposals. The cost difference per document is small, and quality matters.
### Code Review (Code Platform)
| Model | Accuracy | Notes |
|---|---|---|
| Claude Sonnet | Excellent | Strong code understanding, structured critique format |
| GPT-4.1 | Very good | Slightly less precise on edge cases |
| GPT-4.1-mini | Acceptable | Misses subtle bugs in complex code |
Recommendation: Claude Sonnet is the platform default for code review.
## Temperature Settings
Temperature controls how deterministic or creative the model's responses are:
| Temperature | Behavior | Use for |
|---|---|---|
| 0.0 – 0.3 | Very deterministic, consistent | Triage classification, data extraction, structured output |
| 0.4 – 0.7 | Balanced | Summaries, reports, documentation |
| 0.8 – 1.2 | More creative | Proposals, marketing copy, brainstorming |
| 1.3 – 2.0 | Highly creative, potentially inconsistent | Creative writing only — avoid for business content |
The platform's default temperature is 0.7 for most features. Jarvis uses 0.6. Code review uses 0.2.
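These defaults can be expressed as a small lookup with the platform default as the fallback. The feature names and the specific within-band values below are illustrative assumptions; only the 0.7 platform default, 0.6 for Jarvis, and 0.2 for code review are documented above.

```python
# Temperature defaults per the table above. Feature keys and the
# within-band values are illustrative; the platform default (0.7),
# Jarvis (0.6), and code review (0.2) are the documented values.
TEMPERATURE_DEFAULTS = {
    "triage_classification": 0.2,  # deterministic band (0.0-0.3)
    "data_extraction": 0.2,
    "summary": 0.6,                # balanced band (0.4-0.7)
    "proposal": 0.9,               # creative band (0.8-1.2)
    "code_review": 0.2,            # documented platform value
    "jarvis": 0.6,                 # documented platform value
}

def temperature_for(feature: str, platform_default: float = 0.7) -> float:
    """Return the temperature for a feature, falling back to the platform default."""
    return TEMPERATURE_DEFAULTS.get(feature, platform_default)
```

Keeping structured-output features at the bottom of the deterministic band makes their results reproducible, which matters for triage and extraction pipelines.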
## Overriding Models (Enterprise)
Enterprise tier allows per-route model overrides via Settings → Feature Routing. See Feature Routing for details.
Standard tier users can override the model in the Playground only. All other features use platform defaults.