AI Gateway
The AI Gateway is the governance and routing layer that sits between every AI request (whether from your technicians, a product like Defend or Voice, or your own integrations) and the underlying AI providers. It enforces policies and quotas, scrubs PII, and ensures every request is audited.
What the Gateway Does
Every AI call in The One Stack goes through the Gateway. No product makes a direct provider call. The Gateway:
- Checks tenant access and feature allowlists
- Estimates token usage and checks against quota limits
- Scrubs PII (configurable per feature: scrub, warn, or allow)
- Looks up the versioned system prompt for the requested feature
- Routes to the assigned provider and model (via feature routing config)
- Handles failover if the primary provider's circuit breaker is open
- Streams or returns the response
- Records the usage and audit event asynchronously
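This ordering puts the cheap rejections (access, quota) ahead of PII scanning and the provider call. The following minimal sketch shows that short-circuiting. It assumes hypothetical step functions, a simplified request shape, and made-up rules. None of these names come from @theone/ai-gateway.

```typescript
// Hypothetical request and step types for illustration; not the @theone/ai-gateway API.
interface GatewayRequest {
  tenantId: string;
  feature: string;
  prompt: string;
}

type StepResult = { ok: true } | { ok: false; status: number; reason: string };
type Step = (req: GatewayRequest) => StepResult;

// Runs the pre-provider checks in order and stops at the first rejection,
// so a blocked request is never scanned for PII or sent to a provider.
function runChecks(req: GatewayRequest, steps: Step[]): StepResult {
  for (const step of steps) {
    const result = step(req);
    if (!result.ok) return result;
  }
  return { ok: true };
}

// Hypothetical allowlist check with a hard-coded, made-up allowlist.
const checkAllowlist: Step = (req) =>
  req.feature === "ticket-suggest"
    ? { ok: true }
    : { ok: false, status: 403, reason: "Feature Not Enabled" };

// Hypothetical quota check: a rough ~4 characters per token estimate
// against an assumed remaining allowance of 1,000 tokens.
const checkTokenEstimate: Step = (req) =>
  Math.ceil(req.prompt.length / 4) <= 1000
    ? { ok: true }
    : { ok: false, status: 429, reason: "Quota Exceeded" };
```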
Providers
The Gateway supports three AI providers:
| Provider | Models | Notes |
|---|---|---|
| Azure OpenAI | GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, GPT-4o, GPT-4o-mini | Primary provider. Authenticates via Managed Identity in production. |
| Azure AI Foundry | Claude models | Secondary provider. Authenticates via Managed Identity in production. |
| Anthropic Direct | Claude Opus, Claude Sonnet | Tertiary provider. Requires API key. Used when Foundry quota is exhausted. |
Provider Health and Circuit Breakers
The Gateway maintains per-provider health state in Redis. If a provider returns errors above a threshold, its circuit breaker opens and requests automatically fail over to the next available provider.
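Failover can be sketched as walking an ordered chain of providers and taking the first one whose breaker is not open. The provider IDs and the chain are assumptions for illustration, as is the in-memory Map, which stands in for the Redis health state. The example chain follows the table's note that Anthropic Direct backs up Foundry.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Returns the first provider in the failover chain whose breaker is not open,
// or null when all are open. Half-open providers are treated as eligible here
// for simplicity; a real breaker admits only a limited number of test requests.
function selectProvider(
  chain: string[],
  states: Map<string, BreakerState>,
): string | null {
  for (const provider of chain) {
    // Providers with no recorded state are assumed healthy (closed).
    if ((states.get(provider) ?? "closed") !== "open") return provider;
  }
  return null;
}

// Hypothetical chain for a Claude-backed feature: Foundry first, Anthropic Direct as fallback.
const claudeChain = ["azure-ai-foundry", "anthropic-direct"];
```

A null result corresponds to the 503 No Providers Available case described later on this page.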
Circuit breaker states:
- Closed: Provider is healthy; all requests route normally
- Open: Provider is degraded; requests fail over to the next available provider
- Half-open: Test requests are sent to check recovery
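The following minimal in-memory circuit breaker shows the three states. It simplifies "errors above a threshold" to a count of consecutive failures, and the threshold and cool-down values are assumptions. The Gateway's real thresholds are listed under Settings → AI Gateway. Time is passed in explicitly so the transitions are easy to follow.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical in-memory breaker; the Gateway keeps this state in Redis.
class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  // Assumed defaults for illustration only.
  constructor(failureThreshold = 5, cooldownMs = 30_000) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // An open breaker becomes half-open once the cool-down has elapsed.
  current(now: number): BreakerState {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  // Any success (including a half-open test request) closes the breaker.
  recordSuccess(): void {
    this.state = "closed";
    this.failures = 0;
  }

  // A failed half-open test request, or too many consecutive failures, opens it.
  recordFailure(now: number): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}
```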
You can view current provider health at Settings → AI Gateway in the platform.
Quota Enforcement
Each tenant has a quota configuration applied at the Gateway level:
| Quota Type | Enforcement |
|---|---|
| Daily token soft limit | Warning returned in response metadata; requests continue |
| Daily token hard limit | Requests blocked with 429 Quota Exceeded |
| Monthly budget (USD) | Requests blocked when month-to-date cost exceeds cap |
| Feature allowlist | Requests for disabled features blocked with 403 Feature Not Enabled |
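The table can be read as a single decision function evaluated per request. The field names, the result shape, and the choice to check the feature allowlist first are hypothetical. The status codes and the soft-limit versus hard-limit behavior come from the table above.

```typescript
// Hypothetical quota and usage shapes for illustration.
interface TenantQuota {
  dailySoftLimitTokens: number;
  dailyHardLimitTokens: number;
  monthlyBudgetUsd: number;
  enabledFeatures: Set<string>;
}

interface TenantUsage {
  tokensToday: number;
  costMonthToDateUsd: number;
}

type QuotaDecision =
  | { allowed: true; warning?: string }
  | { allowed: false; status: 403 | 429; error: string };

function checkQuota(
  quota: TenantQuota,
  usage: TenantUsage,
  feature: string,
  estimatedTokens: number,
): QuotaDecision {
  if (!quota.enabledFeatures.has(feature)) {
    return { allowed: false, status: 403, error: "Feature Not Enabled" };
  }
  const projected = usage.tokensToday + estimatedTokens;
  if (projected > quota.dailyHardLimitTokens) {
    return { allowed: false, status: 429, error: "Quota Exceeded" };
  }
  // Block once month-to-date cost exceeds the cap.
  if (usage.costMonthToDateUsd > quota.monthlyBudgetUsd) {
    return { allowed: false, status: 429, error: "Quota Exceeded" };
  }
  if (projected > quota.dailySoftLimitTokens) {
    // Soft limit: the request continues, with a warning in the response metadata.
    return { allowed: true, warning: "Daily token soft limit exceeded" };
  }
  return { allowed: true };
}
```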
Quotas are stored in Redis for sub-millisecond enforcement. The metering system writes actual usage back to Cosmos DB every 60 seconds via a background flush.
PII Scrubbing
Before sending a prompt to any provider, the Gateway can scrub sensitive data. This is configurable per feature:
| Mode | Behavior |
|---|---|
| scrub | PII is redacted before the prompt leaves your tenant. The provider never sees it. |
| warn | PII is detected and logged, but the prompt is sent as-is. You receive a warning in the response. |
| allow | No PII scanning. Use only for features where PII is necessary (e.g., client portal assistants). |
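The three modes differ only in what happens after detection. The sketch below assumes a toy detector that recognizes nothing but email addresses and US-style phone numbers. A production scrubber covers far more PII types. The function and type names are hypothetical.

```typescript
type PiiMode = "scrub" | "warn" | "allow";

interface PiiResult {
  prompt: string;     // text that would be sent to the provider
  warnings: string[]; // detections reported back to the caller
}

// Toy patterns for illustration only; they miss many real PII formats.
const PII_PATTERNS: Array<[string, RegExp]> = [
  ["EMAIL", /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g],
  ["PHONE", /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g],
];

function applyPiiMode(prompt: string, mode: PiiMode): PiiResult {
  if (mode === "allow") return { prompt, warnings: [] }; // no scanning at all

  let scrubbed = prompt;
  const warnings: string[] = [];
  for (const [label, pattern] of PII_PATTERNS) {
    const matches = prompt.match(pattern) ?? [];
    if (matches.length > 0) warnings.push(`${matches.length} ${label} value(s) detected`);
    scrubbed = scrubbed.replace(pattern, `[${label}]`);
  }
  // scrub: send the redacted text; warn: send the original text and report detections.
  return mode === "scrub" ? { prompt: scrubbed, warnings } : { prompt, warnings };
}
```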
Use warn for new features and scrub for features that involve ticket data, contact records, or financial information.
Usage Buffering and Audit Trail
To avoid blocking AI responses with database writes, the Gateway records usage events using a fire-and-forget pattern:
- Usage event is pushed to a Redis buffer (ai:buffer:{platform}:usage)
- AI response is returned to the caller immediately
- A background timer in Ops Center flushes Redis → Cosmos DB every 60 seconds
- Aggregation timers compute daily summaries for the Usage Analytics dashboard
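This sketch shows the fire-and-forget pattern. An in-memory array stands in for the Redis list, and a caller-supplied sink stands in for the Cosmos DB write. Both substitutions, and the class and method names, are assumptions made to keep the example self-contained.

```typescript
// Hypothetical usage event shape.
interface UsageEvent {
  tenantId: string;
  feature: string;
  model: string;
  tokens: number;
  timestamp: number;
}

type UsageSink = (batch: UsageEvent[]) => Promise<void>;

class UsageBuffer {
  private pending: UsageEvent[] = [];

  // Request path: an in-memory push with no I/O, so the AI response is not delayed.
  record(event: UsageEvent): void {
    this.pending.push(event);
  }

  // Background path: swap out the current batch, then hand it to the sink.
  async flush(sink: UsageSink): Promise<number> {
    if (this.pending.length === 0) return 0;
    const batch = this.pending;
    this.pending = [];
    try {
      await sink(batch);
    } catch (err) {
      // Put the batch back so the next flush retries it instead of dropping it.
      this.pending = batch.concat(this.pending);
      throw err;
    }
    return batch.length;
  }

  // Periodic flush (60 seconds on this page); returns the timer so it can be cleared.
  start(sink: UsageSink, intervalMs = 60_000): ReturnType<typeof setInterval> {
    return setInterval(() => {
      this.flush(sink).catch((err) => console.error("usage flush failed", err));
    }, intervalMs);
  }
}
```

Swapping the batch out before awaiting the sink means events recorded during a slow write land in the fresh buffer. They are neither lost nor written twice.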
Every AI call also generates an immutable audit record, buffered under ai:buffer:{platform}:audit, that captures the actor, feature, model, token count, cost, and timestamp. Audit records cannot be deleted and are available for compliance evidence export.
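Here is one possible shape for the audit record, using only the fields the paragraph lists. The type and function names are hypothetical. Object.freeze only prevents in-process mutation. It is not how the Gateway guarantees that audit records cannot be deleted.

```typescript
// Hypothetical record shape built from the fields listed above.
interface AuditRecord {
  readonly actor: string;     // who made the call (user or product identity)
  readonly feature: string;
  readonly model: string;
  readonly tokenCount: number;
  readonly costUsd: number;
  readonly timestamp: string; // ISO 8601, UTC
}

// Builds a frozen record so code holding it cannot alter it after creation.
function createAuditRecord(
  fields: Omit<AuditRecord, "timestamp">,
  at: Date = new Date(),
): AuditRecord {
  return Object.freeze({ ...fields, timestamp: at.toISOString() });
}
```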
Gateway Architecture (for Developers)
The Gateway ships as the @theone/ai-gateway package, available to every platform in The One Stack:
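```typescript
import { AIGateway } from '@theone/ai-gateway';

// `gateway` is an AIGateway instance created elsewhere (its construction options
// are not covered here); userPrompt and tenantId come from the calling product.
declare const gateway: AIGateway;
declare const userPrompt: string;
declare const tenantId: string;

const result = await gateway.chat({
  feature: 'ticket-suggest',
  messages: [{ role: 'user', content: userPrompt }],
  tenant_id: tenantId,
});
```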
The feature string determines:
- Which provider and model to use (via feature routing config in the platform)
- Which PII scrubbing mode to apply
- Which system prompt version to inject
- Which quota bucket to charge
See Feature Routing for the full feature-to-model mapping.
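The following sketch shows a routing table keyed by the feature string. Every entry, field name, and value in it is hypothetical. The real mapping lives in the platform's feature routing config.

```typescript
type PiiMode = "scrub" | "warn" | "allow";

// Hypothetical route shape covering the four things the feature string selects.
interface FeatureRoute {
  provider: string;
  model: string;
  piiMode: PiiMode;
  promptVersion: string; // versioned system prompt to inject
  quotaBucket: string;   // quota the usage is charged to
}

// Hypothetical routing entry for illustration.
const FEATURE_ROUTES = new Map<string, FeatureRoute>([
  [
    "ticket-suggest",
    { provider: "azure-openai", model: "gpt-4.1-mini", piiMode: "scrub", promptVersion: "v3", quotaBucket: "tickets" },
  ],
]);

// Resolves the route for a feature; an unknown feature is rejected rather than defaulted.
function resolveRoute(feature: string): FeatureRoute {
  const route = FEATURE_ROUTES.get(feature);
  if (route === undefined) throw new Error(`No route configured for feature "${feature}"`);
  return route;
}
```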
What Happens When a Request Is Blocked
If the Gateway blocks a request, the calling product receives a structured error:
| Code | Reason | Resolution |
|---|---|---|
| 429 Quota Exceeded | Daily or monthly limit hit | Upgrade tier or wait for quota reset |
| 403 Feature Not Enabled | Feature not in your allowlist | Contact support to enable the feature |
| 503 No Providers Available | All circuit breakers open | Transient outage; retry after 30 seconds |
| 400 PII Detected | Scrub mode blocked PII | Remove sensitive data from the prompt |
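On the calling side, only one of these errors is worth retrying automatically. The sketch below assumes the structured error exposes a numeric status. The GatewayError shape, the function name, and the three-attempt cap are hypothetical.

```typescript
// Hypothetical error shape; the real structured error may differ.
interface GatewayError {
  status: number;
  code: string;
}

// Returns how long to wait before retrying, or null if the caller should not retry.
function retryDelayMs(err: GatewayError, attempt: number, maxAttempts = 3): number | null {
  if (attempt >= maxAttempts) return null;
  // 503 No Providers Available is transient; the table suggests waiting 30 seconds.
  if (err.status === 503) return 30_000;
  // 429, 403, and 400 need a change first (quota reset, feature enablement,
  // or an edited prompt), so an immediate retry would fail the same way.
  return null;
}
```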
Monitoring Gateway Health
In the platform, navigate to Settings → AI Gateway to view:
- Current provider health status (Closed/Open/Half-open for each provider)
- Requests in the last 24 hours by provider
- Error rate by provider
- Average latency by provider
- Current circuit breaker thresholds
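The per-provider figures in this list (request count, error rate, average latency) are aggregates over recent requests. The sketch below computes them from a hypothetical request log entry that records provider, success flag, latency, and timestamp.

```typescript
// Hypothetical log entry shape.
interface RequestLogEntry {
  provider: string;
  ok: boolean;
  latencyMs: number;
  timestamp: number; // epoch milliseconds
}

interface ProviderStats {
  requests: number;
  errorRate: number; // fraction of failed requests, 0..1
  avgLatencyMs: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Aggregates the last 24 hours of requests per provider.
function summarize(entries: RequestLogEntry[], now: number): Map<string, ProviderStats> {
  const totals = new Map<string, { requests: number; errors: number; latency: number }>();
  for (const e of entries) {
    if (now - e.timestamp > DAY_MS) continue; // outside the 24-hour window
    const t = totals.get(e.provider) ?? { requests: 0, errors: 0, latency: 0 };
    t.requests += 1;
    if (!e.ok) t.errors += 1;
    t.latency += e.latencyMs;
    totals.set(e.provider, t);
  }
  const stats = new Map<string, ProviderStats>();
  totals.forEach((t, provider) => {
    stats.set(provider, {
      requests: t.requests,
      errorRate: t.errors / t.requests,
      avgLatencyMs: t.latency / t.requests,
    });
  });
  return stats;
}
```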
Ops Center also displays a real-time Gateway health indicator visible to platform administrators across all tenants.