
Model Selection

Not every task needs the most powerful model. Choosing the right model for each use case saves money, reduces latency, and often produces better results, since smaller models tend to stay focused on simpler tasks.

Available Models

| Model | Provider | Context Window | Best For |
| --- | --- | --- | --- |
| GPT-4.1 | Azure OpenAI | 128k tokens | Complex reasoning, long documents, investigation narratives, proposals |
| GPT-4.1-mini | Azure OpenAI | 128k tokens | High-volume operations (ticket triage, summaries, suggestions), cost-sensitive features |
| GPT-4.1-nano | Azure OpenAI | 32k tokens | Ultra-low-cost operations, simple classification, short text extraction |
| GPT-4o | Azure OpenAI | 128k tokens | Multimodal (when images are relevant), call recaps, nuanced content |
| GPT-4o-mini | Azure OpenAI | 128k tokens | Low-cost alternative to GPT-4o for simpler tasks |
| Claude Opus | Anthropic | 200k tokens | Deep code analysis, large document processing, complex reasoning with very long context |
| Claude Sonnet | Azure AI Foundry | 200k tokens | Code review, structured output, nuanced writing tasks |

Choosing the Right Model

By Task Complexity

| Task | Recommended Model | Reason |
| --- | --- | --- |
| Ticket field suggestions | GPT-4.1-mini | Fast, cheap, good enough for short suggestions |
| Auto-triage and priority | GPT-4.1-mini | Pattern classification; no deep reasoning needed |
| RCA report generation | GPT-4.1 | Long output, technical accuracy required |
| QBR summary | GPT-4.1 | Professional document, client-facing quality |
| Quick KB article from notes | GPT-4.1-mini | Structured output from provided notes; minimal creativity |
| Proposal drafting | GPT-4.1 | Long, persuasive document requiring good writing |
| Attack narrative | GPT-4.1 | Complex threat-chain analysis with large context window |
| Code review | Claude Sonnet | Strong code understanding, structured critique |
| Call recap | GPT-4o | Handles nuanced speech patterns well |
| Embedding / semantic search | text-embedding-3-large | Not a chat model; built specifically for vector embeddings |
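The task-to-model recommendations above amount to a simple routing table. A minimal sketch, assuming illustrative task keys and lowercase model identifiers (these are not the platform's actual API names):

```python
# Task-to-model routing based on the recommendations above.
# Task keys and model identifiers are illustrative, not platform API names.
TASK_MODEL_ROUTES = {
    "ticket_field_suggestions": "gpt-4.1-mini",
    "auto_triage": "gpt-4.1-mini",
    "rca_report": "gpt-4.1",
    "qbr_summary": "gpt-4.1",
    "kb_article_from_notes": "gpt-4.1-mini",
    "proposal_draft": "gpt-4.1",
    "attack_narrative": "gpt-4.1",
    "code_review": "claude-sonnet",
    "call_recap": "gpt-4o",
    "semantic_search": "text-embedding-3-large",
}

def route_model(task: str, default: str = "gpt-4.1-mini") -> str:
    """Return the recommended model for a task, falling back to a cheap default."""
    return TASK_MODEL_ROUTES.get(task, default)
```

Defaulting unknown tasks to the cheapest capable model keeps costs bounded when new features are added before a route is configured.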

By Cost Sensitivity

If cost is a primary concern, use the cheapest model that produces acceptable quality:

| Budget priority | Strategy |
| --- | --- |
| Minimize cost | Default to GPT-4.1-mini for all background operations |
| Balance cost and quality | GPT-4.1-mini for triage/suggestions; GPT-4.1 for client-facing output |
| Maximize quality | GPT-4.1 for all features |

GPT-4.1-mini costs roughly one-fifth as much as GPT-4.1 for the same token volume. For high-frequency operations like ticket suggestions (potentially thousands per day across an MSP), this difference is significant.
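A back-of-the-envelope calculation makes the difference concrete. This sketch uses the illustrative per-1000-request prices from the ticket-suggestion comparison later on this page; actual pricing varies:

```python
# Rough daily cost comparison for a high-frequency operation such as
# ticket suggestions. Prices are illustrative ($ per 1000 requests).
PRICE_PER_1K_REQUESTS = {"gpt-4.1-mini": 0.20, "gpt-4.1": 1.00}

def daily_cost(model: str, requests_per_day: int) -> float:
    """Estimated daily spend for a model at a given request volume."""
    return PRICE_PER_1K_REQUESTS[model] * requests_per_day / 1000

volume = 5000  # suggestions per day across an MSP's client base
mini_cost = daily_cost("gpt-4.1-mini", volume)  # ≈ $1.00/day
full_cost = daily_cost("gpt-4.1", volume)       # ≈ $5.00/day
```

At this volume the absolute difference is a few dollars a day, but it scales linearly with request count, which is why background operations default to the mini tier.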

By Output Length

| Expected output length | Recommended model |
| --- | --- |
| < 100 tokens (short answer, classification) | GPT-4.1-nano or GPT-4.1-mini |
| 100–500 tokens (summaries, emails) | GPT-4.1-mini |
| 500–2000 tokens (reports, proposals) | GPT-4.1 |
| 2000+ tokens (large documents, deep analysis) | GPT-4.1 or Claude Opus |

By Latency Requirements

| Latency requirement | Best model |
| --- | --- |
| Real-time (< 2s response) | GPT-4.1-mini or GPT-4.1-nano |
| Interactive (< 5s response) | GPT-4.1-mini |
| Background processing (no latency constraint) | Any model |
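The output-length and latency tables can be combined into a single selection helper. A minimal sketch, with thresholds mirroring the tables and illustrative model names:

```python
def select_model(expected_output_tokens: int, realtime: bool = False) -> str:
    """Pick a model from expected output length, tightening for real-time use.

    Thresholds follow the output-length table above; real-time requests
    (< 2s budget) are capped at the nano/mini tier regardless of length.
    """
    if realtime:
        return "gpt-4.1-nano" if expected_output_tokens < 100 else "gpt-4.1-mini"
    if expected_output_tokens < 100:
        return "gpt-4.1-nano"
    if expected_output_tokens < 500:
        return "gpt-4.1-mini"
    # 500+ tokens: reports, proposals, deep analysis
    return "gpt-4.1"
```

Treating latency as a cap rather than a separate dimension keeps the rule simple: length picks the candidate, and a tight latency budget can only push the choice cheaper and faster.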

Selecting a Model in the Playground

In the Playground, use the Model dropdown in the conversation header to choose which model handles your session. The dropdown shows all models available on your tier.

ℹ️ Model selection in the Playground applies only to Playground sessions. Jarvis, Custom GPTs, and product features use their configured routes; they do not inherit your Playground model selection.

Model Comparison for Key Tasks

Ticket Suggestions (PSA)

| Model | Response quality | Latency | Cost per 1000 requests |
| --- | --- | --- | --- |
| GPT-4.1-nano | Good for simple tickets | ~0.3s | ~$0.05 |
| GPT-4.1-mini | Better for complex tickets | ~0.5s | ~$0.20 |
| GPT-4.1 | Best, but overkill for most tickets | ~1.2s | ~$1.00 |

Recommendation: GPT-4.1-mini is the default and usually the right choice. Move to GPT-4.1 only if technicians report the suggestions are frequently wrong.

Proposal Generation

| Model | Output quality | Avg output length | Cost per proposal |
| --- | --- | --- | --- |
| GPT-4.1-mini | Acceptable; may miss nuance | ~600 tokens | ~$0.12 |
| GPT-4.1 | Professional, well-structured | ~1200 tokens | ~$0.96 |

Recommendation: GPT-4.1 for client-facing proposals. The cost difference per document is small, and quality matters.

Code Review (Code Platform)

| Model | Accuracy | Notes |
| --- | --- | --- |
| Claude Sonnet | Excellent | Strong code understanding, structured critique format |
| GPT-4.1 | Very good | Slightly less precise on edge cases |
| GPT-4.1-mini | Acceptable | Misses subtle bugs in complex code |

Recommendation: Claude Sonnet is the platform default for code review.

Temperature Settings

Temperature controls how creative vs. deterministic the model's responses are:

| Temperature | Behavior | Use for |
| --- | --- | --- |
| 0.0 – 0.3 | Very deterministic, consistent | Triage classification, data extraction, structured output |
| 0.4 – 0.7 | Balanced | Summaries, reports, documentation |
| 0.8 – 1.2 | More creative | Proposals, marketing copy, brainstorming |
| 1.3 – 2.0 | Highly creative, potentially inconsistent | Creative writing only; avoid for business content |

The platform's default temperature is 0.7 for most features. Jarvis uses 0.6. Code review uses 0.2.
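These defaults could be expressed as a per-feature lookup applied when assembling a request. A sketch, assuming a generic chat-completion request shape and illustrative feature keys (not the platform's actual API):

```python
# Per-feature temperature defaults described above.
FEATURE_TEMPERATURE = {
    "jarvis": 0.6,
    "code_review": 0.2,
}
DEFAULT_TEMPERATURE = 0.7  # platform-wide fallback

def build_request(feature: str, model: str, prompt: str) -> dict:
    """Assemble chat-completion parameters with the feature's default temperature."""
    return {
        "model": model,
        "temperature": FEATURE_TEMPERATURE.get(feature, DEFAULT_TEMPERATURE),
        "messages": [{"role": "user", "content": prompt}],
    }
```

Centralizing temperatures in one table means a feature's determinism can be tuned without touching its prompt or model choice.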

Overriding Models (Enterprise)

Enterprise tier allows per-route model overrides via Settings → Feature Routing. See Feature Routing for details.

Standard tier users can override the model in the Playground only. All other features use platform defaults.