AI Gateway

The AI Gateway is the governance and routing layer that sits between every AI request (whether from your technicians, a product like Defend or Voice, or your own integrations) and the underlying AI providers. It enforces policies and quotas, scrubs PII, and ensures every request is audited.

What the Gateway Does

Every AI call in The One Stack goes through the Gateway. No product makes a direct provider call. The Gateway:

Checks tenant access and feature allowlists
Estimates token usage and checks against quota limits
Applies the feature's PII mode (configurable per feature: scrub, warn, or allow)
Looks up the versioned system prompt for the requested feature
Routes to the assigned provider and model (via feature routing config)
Handles failover if the primary provider's circuit breaker is open
Streams or returns the response
Records the usage and audit event asynchronously
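
The steps above behave like an ordered pipeline in which any stage can block the request. The following is a minimal sketch of that idea, not the Gateway's actual code: `GatewayRequest`, `StageResult`, `runPipeline`, the allowlist, and the prompt store are all hypothetical names and values invented for illustration.

```typescript
// Hypothetical types for illustration; none of these names come from @theone/ai-gateway.
type GatewayRequest = {
  tenantId: string;
  feature: string;
  prompt: string;
  systemPrompt?: string;
};

type StageResult =
  | { ok: true; request: GatewayRequest }
  | { ok: false; status: number; reason: string };

type Stage = (request: GatewayRequest) => StageResult;

// Run the stages in order; the first stage that blocks ends the pipeline.
function runPipeline(request: GatewayRequest, stages: Stage[]): StageResult {
  let current = request;
  for (const stage of stages) {
    const result = stage(current);
    if (!result.ok) return result;
    current = result.request;
  }
  return { ok: true, request: current };
}

// Example stage: feature allowlist check (made-up allowlist).
const enabledFeatures = new Set<string>(["ticket-suggest"]);
const checkAllowlist: Stage = (request) =>
  enabledFeatures.has(request.feature)
    ? { ok: true, request }
    : { ok: false, status: 403, reason: "Feature Not Enabled" };

// Example stage: attach a versioned system prompt (made-up prompt store).
const systemPrompts = new Map<string, string>([
  ["ticket-suggest", "v3: Suggest a resolution for the ticket."],
]);
const attachSystemPrompt: Stage = (request) => ({
  ok: true,
  request: { ...request, systemPrompt: systemPrompts.get(request.feature) },
});
```

Calling `runPipeline(request, [checkAllowlist, attachSystemPrompt])` either returns the enriched request or the first blocking result, which is why a disabled feature never reaches a provider.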

Providers

The Gateway supports three AI providers:

Provider	Models	Notes
Azure OpenAI	GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, GPT-4o, GPT-4o-mini	Primary provider. Authenticates via Managed Identity in production.
Azure AI Foundry	Claude models	Secondary provider. Authenticates via Managed Identity in production.
Anthropic Direct	Claude Opus, Claude Sonnet	Tertiary provider. Requires API key. Used when Foundry quota is exhausted.

ℹ️ All provider credentials are stored in Azure Key Vault and injected via environment variables at startup. No credentials exist in source code or configuration files.

Provider Health and Circuit Breakers

The Gateway maintains per-provider health state in Redis. If a provider returns errors above a threshold, its circuit breaker opens and requests automatically fail over to the next available provider.

Circuit breaker states:

Closed: Provider is healthy; all requests route normally
Open: Provider is degraded; requests bypass it and route to the next failover provider
Half-open: Test requests are sent to the provider to check whether it has recovered
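
The three states can be sketched as a small state machine. This is an illustrative sketch only: it keeps state in memory rather than Redis, counts consecutive failures instead of an error rate, and the threshold, cooldown, and provider identifiers are assumed values, not the Gateway's real configuration.

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface Breaker {
  state: BreakerState;
  failures: number; // failures recorded since the last success
  openedAt: number; // millisecond timestamp of the last transition to open
}

// Assumed values for illustration; the real thresholds are shown in the platform.
const FAILURE_THRESHOLD = 5;
const COOLDOWN_MS = 30000;

function newBreaker(): Breaker {
  return { state: "closed", failures: 0, openedAt: 0 };
}

function recordFailure(b: Breaker, now: number): void {
  b.failures += 1;
  // A failed test request in half-open, or too many failures while closed, opens the breaker.
  if (b.state === "half-open" || b.failures >= FAILURE_THRESHOLD) {
    b.state = "open";
    b.openedAt = now;
  }
}

function recordSuccess(b: Breaker): void {
  b.state = "closed";
  b.failures = 0;
}

// Returns true if a request may be sent to this provider right now.
function allowRequest(b: Breaker, now: number): boolean {
  if (b.state === "open" && now - b.openedAt >= COOLDOWN_MS) {
    b.state = "half-open"; // cooldown elapsed: let a test request through
  }
  return b.state !== "open";
}

// Pick the first provider, in priority order, whose breaker allows traffic.
function pickProvider(
  order: string[],
  breakers: Map<string, Breaker>,
  now: number,
): string | null {
  for (const name of order) {
    const b = breakers.get(name);
    if (b && allowRequest(b, now)) return name;
  }
  return null; // every breaker is open
}
```

When `pickProvider` returns `null`, every breaker is open, which corresponds to the `503 No Providers Available` case.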

You can view current provider health at Settings → AI Gateway in the platform.

Quota Enforcement

Each tenant has a quota configuration applied at the Gateway level:

Quota Type	Enforcement
Daily token soft limit	Warning returned in response metadata; requests continue
Daily token hard limit	Requests blocked with `429 Quota Exceeded`
Monthly budget (USD)	Requests blocked when month-to-date cost exceeds cap
Feature allowlist	Requests for disabled features blocked with `403 Feature Not Enabled`

Quotas are stored in Redis for sub-millisecond enforcement. The metering system writes actual usage back to Cosmos DB every 60 seconds via a background flush.
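
The enforcement rules in the table can be sketched as a single decision function. This is a hedged illustration, not the Gateway's implementation: the `TenantQuota`, `TenantUsage`, and `QuotaDecision` shapes are hypothetical, and it assumes the daily limits are compared against current usage plus the estimated tokens for the incoming request.

```typescript
// Hypothetical shapes; the real quota records live in Redis and Cosmos DB.
interface TenantQuota {
  dailySoftLimitTokens: number;
  dailyHardLimitTokens: number;
  monthlyBudgetUsd: number;
}

interface TenantUsage {
  tokensToday: number;
  costMonthToDateUsd: number;
}

type QuotaDecision =
  | { allowed: true; warning?: string }
  | { allowed: false; status: 429; reason: string };

function checkQuota(
  quota: TenantQuota,
  usage: TenantUsage,
  estimatedTokens: number,
): QuotaDecision {
  // Monthly budget: block once month-to-date cost exceeds the cap.
  if (usage.costMonthToDateUsd > quota.monthlyBudgetUsd) {
    return { allowed: false, status: 429, reason: "Monthly budget exceeded" };
  }
  // Assumption: limits apply to usage so far plus this request's estimate.
  const projected = usage.tokensToday + estimatedTokens;
  if (projected > quota.dailyHardLimitTokens) {
    return { allowed: false, status: 429, reason: "Daily token hard limit exceeded" };
  }
  if (projected > quota.dailySoftLimitTokens) {
    // Soft limit: the request continues, with a warning for the response metadata.
    return { allowed: true, warning: "Daily token soft limit exceeded" };
  }
  return { allowed: true };
}
```

Checking the hard limits before the soft limit keeps the soft-limit branch a pure warning path.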

PII Scrubbing

Before sending a prompt to any provider, the Gateway can scrub sensitive data. This is configurable per feature:

Mode	Behavior
`scrub`	PII is redacted before the prompt leaves your tenant. The provider never sees it.
`warn`	PII is detected and logged, but the prompt is sent as-is. You receive a warning in the response.
`allow`	No PII scanning. Use only for features where PII is necessary (e.g., client portal assistants).

⚠️ PII scrubbing is recommended for any feature that processes customer data. New features default to `warn`; features that involve ticket data, contact records, or financial information default to `scrub`.
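
To make the three modes concrete, here is a minimal sketch of how a mode could change what is sent to the provider. It is an assumption-laden illustration: `applyPiiMode` and `PiiResult` are hypothetical names, and the detector is deliberately naive (email addresses only), whereas a real scrubber would cover many more PII types.

```typescript
type PiiMode = "scrub" | "warn" | "allow";

interface PiiResult {
  prompt: string; // the text that would be sent to the provider
  warnings: string[]; // warnings to surface in the response
}

// Deliberately naive detector for illustration: email addresses only.
const EMAIL = /[^\s@]+@[^\s@]+\.[^\s@]+/g;

function applyPiiMode(prompt: string, mode: PiiMode): PiiResult {
  // allow: no scanning at all.
  if (mode === "allow") return { prompt, warnings: [] };

  const matches = prompt.match(EMAIL) ?? [];
  if (matches.length === 0) return { prompt, warnings: [] };

  // scrub: redact before the prompt is sent.
  if (mode === "scrub") {
    return { prompt: prompt.replace(EMAIL, "[REDACTED_EMAIL]"), warnings: [] };
  }

  // warn: send the prompt as-is, but report what was detected.
  return {
    prompt,
    warnings: [`PII detected: ${matches.length} email address(es)`],
  };
}
```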

Usage Buffering and Audit Trail

To avoid blocking AI responses with database writes, the Gateway records usage events using a fire-and-forget pattern:

Usage event is pushed to a Redis buffer (ai:buffer:{platform}:usage)
AI response is returned to the caller immediately
A background timer in Ops Center flushes Redis → Cosmos DB every 60 seconds
Aggregation timers compute daily summaries for the Usage Analytics dashboard

Every AI call also generates an immutable audit record (ai:buffer:{platform}:audit) that captures the actor, feature, model, token count, cost, and timestamp. Audit records cannot be deleted and are available for compliance evidence export.
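
The buffer-then-flush pattern can be sketched with an in-memory array standing in for the Redis list. This is only an illustration under stated assumptions: `UsageEvent`, `UsageBuffer`, and `bufferKey` are hypothetical names, and the `write` callback stands in for the Cosmos DB write. Only the key pattern comes from the text above.

```typescript
// Hypothetical event shape based on the fields the audit record is said to capture.
interface UsageEvent {
  actor: string;
  feature: string;
  model: string;
  tokens: number;
  costUsd: number;
  timestamp: string;
}

// Builds the buffer key pattern described above, e.g. ai:buffer:{platform}:usage.
function bufferKey(platform: string, kind: "usage" | "audit"): string {
  return `ai:buffer:${platform}:${kind}`;
}

// In-memory stand-in for the Redis buffer.
class UsageBuffer {
  private pending: UsageEvent[] = [];

  // Called on the request path: cheap, and never waits on the database.
  push(event: UsageEvent): void {
    this.pending.push(event);
  }

  // Called by a background timer: drain the buffer and write one batch.
  flush(write: (batch: UsageEvent[]) => void): number {
    if (this.pending.length === 0) return 0;
    const batch = this.pending;
    this.pending = []; // swap first so new events are not lost during the write
    write(batch);
    return batch.length;
  }
}
```

In a long-running service, `flush` would be driven by a timer (for example `setInterval` with a 60-second period), so the caller's response time is independent of database latency.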

Gateway Architecture (for Developers)

The Gateway is the @theone/ai-gateway package, available to all The One platforms:

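import { AIGateway } from '@theone/ai-gateway';

// Assumption: the class is constructed with no arguments; check the package for the actual setup.
const gateway = new AIGateway();

// `feature` selects the provider and model, PII mode, system prompt, and quota bucket.
const result = await gateway.chat({
  feature: 'ticket-suggest',
  messages: [{ role: 'user', content: userPrompt }],
  tenant_id: tenantId,
});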

The feature string determines:

Which provider and model to use (via feature routing config in the platform)
Which PII scrubbing mode to apply
Which system prompt version to inject
Which quota bucket to charge
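
One way to picture this is a per-feature record that bundles the four settings. The sketch below is purely illustrative: `FeatureConfig`, `featureRouting`, and `resolveFeature` are hypothetical names, and the entry's values (provider identifier, model, prompt version, bucket) are made up rather than taken from the platform's actual routing config.

```typescript
type PiiMode = "scrub" | "warn" | "allow";

// Hypothetical shape: one record per feature string.
interface FeatureConfig {
  provider: string;
  model: string;
  piiMode: PiiMode;
  systemPromptVersion: string;
  quotaBucket: string;
}

// Made-up entry for illustration; real mappings live in the platform's feature routing config.
const featureRouting: Record<string, FeatureConfig> = {
  "ticket-suggest": {
    provider: "azure-openai",
    model: "gpt-4.1-mini",
    piiMode: "scrub",
    systemPromptVersion: "v3",
    quotaBucket: "ticket-suggest",
  },
};

// Resolve everything the feature string determines in a single lookup.
function resolveFeature(feature: string): FeatureConfig {
  const config = featureRouting[feature];
  if (!config) throw new Error(`Unknown feature: ${feature}`);
  return config;
}
```

Keying all four settings off a single string means callers never choose a model or PII mode themselves; they only name the feature.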

See Feature Routing for the full feature-to-model mapping.

What Happens When a Request Is Blocked

If the Gateway blocks a request, the calling product receives a structured error:

Code	Reason	Resolution
`429 Quota Exceeded`	Daily or monthly limit hit	Upgrade tier or wait for quota reset
`403 Feature Not Enabled`	Feature not in your allowlist	Contact support to enable the feature
`503 No Providers Available`	All circuit breakers open	Transient outage; retry after 30 seconds
`400 PII Detected`	Scrub mode blocked PII	Remove sensitive data from the prompt
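
A calling product could translate these codes into a recovery action along the following lines. This is a sketch under assumptions: the `GatewayError` shape and the `planRecovery` helper are hypothetical, since the exact structure of the error object is not specified here. The resolutions mirror the table above.

```typescript
// Hypothetical error shape; only the status codes and resolutions come from the table.
interface GatewayError {
  status: number;
  message: string;
}

type RecoveryAction =
  | { kind: "retry"; afterMs: number }
  | { kind: "fix-prompt"; hint: string }
  | { kind: "escalate"; hint: string };

function planRecovery(error: GatewayError): RecoveryAction {
  switch (error.status) {
    case 503:
      // Transient outage: all circuit breakers are open, so retry after 30 seconds.
      return { kind: "retry", afterMs: 30000 };
    case 400:
      return { kind: "fix-prompt", hint: "Remove sensitive data from the prompt" };
    case 429:
      return { kind: "escalate", hint: "Upgrade tier or wait for quota reset" };
    case 403:
      return { kind: "escalate", hint: "Contact support to enable the feature" };
    default:
      return { kind: "escalate", hint: "Unexpected Gateway error" };
  }
}
```

Only the 503 case is retried automatically; retrying a 429 or 403 would fail again until the quota resets or the feature is enabled.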

Monitoring Gateway Health

In the platform, navigate to Settings → AI Gateway to view:

Current provider health status (Closed/Open/Half-open for each provider)
Requests in the last 24 hours by provider
Error rate by provider
Average latency by provider
Current circuit breaker thresholds

Ops Center also displays a real-time Gateway health indicator visible to platform administrators across all tenants.
