# Model Selection
Not every task needs the most powerful model. Choosing the right model for each use case saves money, reduces latency, and often produces better results: simple, well-scoped tasks don't benefit from a flagship model's extra capability, and smaller models respond faster and more consistently on them.
## Available Models
| Model | Provider | Context Window | Best For |
|---|---|---|---|
| GPT-4.1 | Azure OpenAI | 128k tokens | Complex reasoning, long documents, investigation narratives, proposals |
| GPT-4.1-mini | Azure OpenAI | 128k tokens | High-volume operations (ticket triage, summaries, suggestions), cost-sensitive features |
| GPT-4.1-nano | Azure OpenAI | 32k tokens | Ultra-low cost operations, simple classification, short text extraction |
| GPT-4o | Azure OpenAI | 128k tokens | Multimodal (when images are relevant), call recaps, nuanced content |
| GPT-4o-mini | Azure OpenAI | 128k tokens | Low-cost alternative to GPT-4o for simpler tasks |
| Claude Opus | Anthropic | 200k tokens | Deep code analysis, large document processing, complex reasoning with very long context |
| Claude Sonnet | Azure AI Foundry | 200k tokens | Code review, structured output, nuanced writing tasks |
## Choosing the Right Model
### By Task Complexity
| Task | Recommended Model | Reason |
|---|---|---|
| Ticket field suggestions | GPT-4.1-mini | Fast, cheap, good enough for short suggestions |
| Auto-triage and priority | GPT-4.1-mini | Pattern classification — no deep reasoning needed |
| RCA report generation | GPT-4.1 | Long output, technical accuracy required |
| QBR summary | GPT-4.1 | Professional document, client-facing quality |
| Quick KB article from notes | GPT-4.1-mini | Structured output from provided notes — minimal creativity |
| Proposal drafting | GPT-4.1 | Long, persuasive document requiring good writing |
| Attack narrative | GPT-4.1 | Complex threat chain analysis with large context window |
| Code review | Claude Sonnet | Strong code understanding, structured critique |
| Call recap | GPT-4o | Handles nuanced speech patterns well |
| Embedding / semantic search | text-embedding-3-large | Not a chat model — specifically for vector embeddings |
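The task recommendations above amount to a simple lookup. The sketch below captures them as a routing map; the task keys, model identifiers, and the `pick_model` helper are illustrative assumptions, not part of the platform API.

```python
# Illustrative task-to-model routing map based on the recommendations above.
# Task keys, model ID strings, and the fallback default are assumptions.
TASK_MODEL_MAP = {
    "ticket_field_suggestions": "gpt-4.1-mini",
    "auto_triage": "gpt-4.1-mini",
    "rca_report": "gpt-4.1",
    "qbr_summary": "gpt-4.1",
    "kb_article_from_notes": "gpt-4.1-mini",
    "proposal_draft": "gpt-4.1",
    "attack_narrative": "gpt-4.1",
    "code_review": "claude-sonnet",
    "call_recap": "gpt-4o",
    "semantic_search": "text-embedding-3-large",  # embedding model, not a chat model
}

def pick_model(task: str, default: str = "gpt-4.1-mini") -> str:
    """Return the recommended model for a task, falling back to a cheap default."""
    return TASK_MODEL_MAP.get(task, default)
```

Defaulting unknown tasks to the cheap tier keeps costs predictable; new features then opt in to a larger model explicitly.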
### By Cost Sensitivity
If cost is a primary concern, use the cheapest model that produces acceptable quality:
| Budget priority | Strategy |
|---|---|
| Minimize cost | Default to GPT-4.1-mini for all background operations |
| Balance cost and quality | GPT-4.1-mini for triage/suggestions; GPT-4.1 for client-facing output |
| Maximize quality | GPT-4.1 for all features |
GPT-4.1-mini costs roughly one-fifth as much as GPT-4.1 for the same token volume. For high-frequency operations such as ticket suggestions (potentially thousands of requests per day across an MSP), that difference adds up quickly.
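To make the cost difference concrete, here is a rough calculation using the approximate per-1,000-request prices from the ticket-suggestion comparison later on this page (~$0.20 for GPT-4.1-mini vs. ~$1.00 for GPT-4.1); the 5,000 requests/day volume is an assumed example, not a platform figure.

```python
# Rough cost comparison for a high-frequency background feature.
# Per-1000-request prices are the approximate figures from the
# ticket-suggestion table on this page; the daily volume is assumed.
COST_PER_1K_REQUESTS = {"gpt-4.1-mini": 0.20, "gpt-4.1": 1.00}

def daily_cost(model: str, requests_per_day: int) -> float:
    """Estimated daily cost in dollars for a given request volume."""
    return COST_PER_1K_REQUESTS[model] * requests_per_day / 1000

mini = daily_cost("gpt-4.1-mini", 5000)  # $1.00/day
full = daily_cost("gpt-4.1", 5000)       # $5.00/day
annual_savings = (full - mini) * 365     # $1,460/year
```

At this volume the absolute numbers are small, but the same 5x ratio applies to every high-frequency feature routed to the larger model.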
### By Output Length
| Expected output length | Recommended model |
|---|---|
| < 100 tokens (short answer, classification) | GPT-4.1-nano or GPT-4.1-mini |
| 100–500 tokens (summaries, emails) | GPT-4.1-mini |
| 500–2000 tokens (reports, proposals) | GPT-4.1 |
| 2000+ tokens (large documents, deep analysis) | GPT-4.1 or Claude Opus |
### By Latency Requirements
| Latency requirement | Best model |
|---|---|
| Real-time (< 2s response) | GPT-4.1-mini or GPT-4.1-nano |
| Interactive (< 5s response) | GPT-4.1-mini |
| Background processing (no latency constraint) | Any model |
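The output-length and latency tables above reduce to a simple decision rule. The sketch below is one way to express it; the function name, model ID strings, and the fallback behavior are illustrative assumptions.

```python
from typing import Optional

def select_model(expected_output_tokens: int,
                 max_latency_s: Optional[float] = None) -> str:
    """Pick a model tier from expected output length, then tighten for latency.
    Thresholds mirror the tables above; this helper is illustrative only."""
    if expected_output_tokens < 100:
        model = "gpt-4.1-nano"      # short answers, classification
    elif expected_output_tokens <= 500:
        model = "gpt-4.1-mini"      # summaries, emails
    else:
        model = "gpt-4.1"           # reports, proposals, deep analysis
    # Real-time requirements (< 2 s) rule out the larger model.
    if max_latency_s is not None and max_latency_s < 2 and model == "gpt-4.1":
        model = "gpt-4.1-mini"
    return model
```

For 2000+ token outputs with very long context, Claude Opus is the alternative tier per the table above; a real selector would also take context size into account.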
## Selecting a Model in the Playground
In the Playground, use the Model dropdown in the conversation header to choose which model handles your session. The dropdown shows all models available on your tier.
## Model Comparison for Key Tasks
### Ticket Suggestions (PSA)
| Model | Response quality | Latency | Cost per 1000 requests |
|---|---|---|---|
| GPT-4.1-nano | Good for simple tickets | ~0.3s | ~$0.05 |
| GPT-4.1-mini | Better for complex tickets | ~0.5s | ~$0.20 |
| GPT-4.1 | Best, overkill for most tickets | ~1.2s | ~$1.00 |
Recommendation: GPT-4.1-mini is the default and usually the right choice. Move to GPT-4.1 only if technicians report the suggestions are frequently wrong.
### Proposal Generation
| Model | Output quality | Avg output length | Cost per proposal |
|---|---|---|---|
| GPT-4.1-mini | Acceptable, may miss nuance | ~600 tokens | ~$0.12 |
| GPT-4.1 | Professional, well-structured | ~1200 tokens | ~$0.96 |
Recommendation: GPT-4.1 for client-facing proposals. The cost difference per document is small, and quality matters.
### Code Review (Code Platform)
| Model | Accuracy | Notes |
|---|---|---|
| Claude Sonnet | Excellent | Strong code understanding, structured critique format |
| GPT-4.1 | Very good | Slightly less precise on edge cases |
| GPT-4.1-mini | Acceptable | Misses subtle bugs in complex code |
Recommendation: Claude Sonnet is the platform default for code review.
## Temperature Settings
Temperature controls how deterministic or creative the model's responses are:
| Temperature | Behavior | Use for |
|---|---|---|
| 0.0 – 0.3 | Very deterministic, consistent | Triage classification, data extraction, structured output |
| 0.4 – 0.7 | Balanced | Summaries, reports, documentation |
| 0.8 – 1.2 | More creative | Proposals, marketing copy, brainstorming |
| 1.3 – 2.0 | Highly creative, potentially inconsistent | Creative writing only — avoid for business content |
The platform's default temperature is 0.7 for most features. Jarvis uses 0.6. Code review uses 0.2.
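These defaults can be expressed as a small lookup with the platform default as the fallback. The feature names and the specific within-band values below are illustrative assumptions; only the 0.7 platform default, 0.6 for Jarvis, and 0.2 for code review are documented above.

```python
# Temperature defaults per the table above. Feature keys and the
# within-band values are illustrative; the platform default (0.7),
# Jarvis (0.6), and code review (0.2) are the documented values.
TEMPERATURE_DEFAULTS = {
    "triage_classification": 0.2,  # deterministic band (0.0-0.3)
    "data_extraction": 0.2,
    "summary": 0.6,                # balanced band (0.4-0.7)
    "proposal": 0.9,               # creative band (0.8-1.2)
    "code_review": 0.2,            # documented platform value
    "jarvis": 0.6,                 # documented platform value
}

def temperature_for(feature: str, platform_default: float = 0.7) -> float:
    """Return the temperature for a feature, falling back to the platform default."""
    return TEMPERATURE_DEFAULTS.get(feature, platform_default)
```

Keeping structured-output features at the bottom of the deterministic band makes their results reproducible, which matters for triage and extraction pipelines.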
## Overriding Models (Enterprise)
Enterprise tier allows per-route model overrides via Settings → Feature Routing. See Feature Routing for details.
Standard tier users can override the model in the Playground only. All other features use platform defaults.