
AI Gateway

The AI Gateway is the governance and routing layer that sits between every AI request (from your technicians, from products such as Defend or Voice, or from your own integrations) and the underlying AI providers. It enforces policies and quotas, scrubs PII, and ensures every request is audited.

What the Gateway Does

Every AI call in The One Stack goes through the Gateway. No product makes a direct provider call. The Gateway:

  1. Checks tenant access and feature allowlists
  2. Estimates token usage and checks against quota limits
  3. Scrubs PII (configurable per feature: scrub, warn, or allow)
  4. Looks up the versioned system prompt for the requested feature
  5. Routes to the assigned provider and model (via feature routing config)
  6. Handles failover if the primary provider's circuit breaker is open
  7. Streams or returns the response
  8. Records the usage and audit event asynchronously
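The eight steps can be read as a short-circuiting pipeline: each check either passes the request along or stops it with one of the errors described under What Happens When a Request Is Blocked. The TypeScript below is a minimal sketch of that ordering only. The `PipelineDeps` interface, the `GatewayError` class, and the four-characters-per-token estimate are hypothetical stand-ins, not the @theone/ai-gateway internals.

```typescript
// Minimal sketch of the Gateway's request pipeline (steps 1-8 above).
// All names here are hypothetical; they are not the @theone/ai-gateway API.

type PiiMode = 'scrub' | 'warn' | 'allow';

interface GatewayRequest {
  tenantId: string;
  feature: string;
  prompt: string;
}

// Each dependency stands in for a Gateway subsystem (allowlist, quota store, etc.).
interface PipelineDeps {
  isFeatureAllowed(tenantId: string, feature: string): boolean;
  withinQuota(tenantId: string, estimatedTokens: number): boolean;
  piiMode(feature: string): PiiMode;
  scrub(text: string): string;
  systemPrompt(feature: string): string;
  pickProvider(feature: string): string | undefined; // undefined: every breaker is open
  callProvider(provider: string, system: string, prompt: string): string;
  recordUsage(event: { tenantId: string; feature: string; provider: string }): void;
}

class GatewayError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

function handleRequest(req: GatewayRequest, deps: PipelineDeps): string {
  // 1. Tenant access and feature allowlist
  if (!deps.isFeatureAllowed(req.tenantId, req.feature)) {
    throw new GatewayError(403, 'Feature Not Enabled');
  }
  // 2. Token estimate (assumption: roughly 4 characters per token) and quota check
  const estimatedTokens = Math.ceil(req.prompt.length / 4);
  if (!deps.withinQuota(req.tenantId, estimatedTokens)) {
    throw new GatewayError(429, 'Quota Exceeded');
  }
  // 3. PII: only scrub mode rewrites the prompt; warn and allow send it unchanged
  const prompt = deps.piiMode(req.feature) === 'scrub' ? deps.scrub(req.prompt) : req.prompt;
  // 4. Versioned system prompt for the feature
  const system = deps.systemPrompt(req.feature);
  // 5-6. Routing; pickProvider is expected to skip providers whose breaker is open
  const provider = deps.pickProvider(req.feature);
  if (provider === undefined) {
    throw new GatewayError(503, 'No Providers Available');
  }
  // 7. Return the response (streaming is omitted from this sketch)
  const response = deps.callProvider(provider, system, prompt);
  // 8. Record usage; a real implementation would not await a database write here
  deps.recordUsage({ tenantId: req.tenantId, feature: req.feature, provider });
  return response;
}
```

The ordering matters: cheap checks (allowlist, quota) run before any provider is contacted, so a blocked request costs nothing upstream.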

Providers

The Gateway supports three AI providers:

Provider | Models | Notes
--- | --- | ---
Azure OpenAI | GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, GPT-4o, GPT-4o-mini | Primary provider. Authenticates via Managed Identity in production.
Azure AI Foundry | Claude models | Secondary provider. Authenticates via Managed Identity in production.
Anthropic Direct | Claude Opus, Claude Sonnet | Tertiary provider. Requires API key. Used when Foundry quota is exhausted.
ℹ️All provider credentials are stored in Azure Key Vault and injected via environment variables at startup. No credentials exist in source code or configuration files.

Provider Health and Circuit Breakers

The Gateway maintains per-provider health state in Redis. If a provider's errors exceed the configured threshold, its circuit breaker opens and requests automatically fail over to the next available provider.
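Failover amounts to walking a feature's providers in priority order and taking the first one whose breaker is not open. The sketch below assumes the priority order from the provider table (Azure OpenAI, then Azure AI Foundry, then Anthropic Direct), uses hypothetical provider identifiers, and replaces the Redis health state with an in-memory map. It also treats a half-open provider as eligible so it can receive a test request; how the Gateway actually paces test requests, and which model substitutes for the original on failover, are not specified here.

```typescript
// Hypothetical failover selection: first provider in priority order whose
// circuit breaker is not open. The in-memory map stands in for Redis state.

type BreakerState = 'closed' | 'open' | 'half-open';

function selectProvider(
  priority: readonly string[],
  breakerState: ReadonlyMap<string, BreakerState>,
): string {
  for (const provider of priority) {
    // A provider with no recorded state is treated as healthy (closed).
    const state = breakerState.get(provider) ?? 'closed';
    // Simplification: half-open providers are eligible so they can be probed.
    if (state !== 'open') {
      return provider;
    }
  }
  // Every breaker is open: surface the documented 503 error.
  throw new Error('503 No Providers Available');
}

// Illustrative identifiers, ordered primary -> secondary -> tertiary.
const priority = ['azure-openai', 'azure-ai-foundry', 'anthropic-direct'];
```

For example, `selectProvider(priority, new Map<string, BreakerState>([['azure-openai', 'open']]))` returns `'azure-ai-foundry'`.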

Circuit breaker states:

  • Closed: the provider is healthy, and all requests route normally
  • Open: the provider is degraded, and requests skip it and fail over to the next provider
  • Half-open: test requests are sent to check whether the provider has recovered
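The three states form a small state machine. The class below is an illustrative in-memory version: the failure threshold and cooldown are constructor arguments with no documented defaults, failures are counted since the last success rather than as an error rate, and the Gateway itself keeps this state in Redis rather than in process memory.

```typescript
// Illustrative circuit breaker with closed / open / half-open states.
// Threshold and cooldown values are hypothetical inputs, not Gateway defaults.

type State = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: State = 'closed';
  private failures = 0; // failures since the last success
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // Current state; an open breaker becomes half-open once the cooldown elapses.
  current(now: number): State {
    if (this.state === 'open' && now - this.openedAt >= this.cooldownMs) {
      this.state = 'half-open';
    }
    return this.state;
  }

  // Any success, including a half-open test request, closes the breaker.
  recordSuccess(): void {
    this.state = 'closed';
    this.failures = 0;
  }

  // A failed test request, or reaching the threshold, (re)opens the breaker.
  recordFailure(now: number): void {
    this.failures += 1;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = now;
    }
  }
}
```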

You can view current provider health at Settings → AI Gateway in the platform.

Quota Enforcement

Each tenant has a quota configuration applied at the Gateway level:

Quota Type | Enforcement
--- | ---
Daily token soft limit | Warning returned in response metadata; requests continue
Daily token hard limit | Requests blocked with 429 Quota Exceeded
Monthly budget (USD) | Requests blocked with 429 Quota Exceeded once month-to-date cost exceeds the cap
Feature allowlist | Requests for disabled features blocked with 403 Feature Not Enabled
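The four quota rows can be checked in a single function. This is a hypothetical sketch: the field names in `TenantQuota` and `TenantUsage` are invented, the hard limit is compared against current usage plus the token estimate from step 2 of the pipeline, and the status codes follow the table under What Happens When a Request Is Blocked, where both daily and monthly limits return 429.

```typescript
// Hypothetical quota evaluation; the real Gateway quota schema is not documented here.

interface TenantQuota {
  dailySoftTokens: number;
  dailyHardTokens: number;
  monthlyBudgetUsd: number;
  enabledFeatures: ReadonlySet<string>;
}

interface TenantUsage {
  tokensToday: number;
  monthToDateUsd: number;
}

type QuotaDecision =
  | { allowed: true; warning?: string }
  | { allowed: false; status: 403 | 429; reason: string };

function checkQuota(
  quota: TenantQuota,
  usage: TenantUsage,
  feature: string,
  estimatedTokens: number,
): QuotaDecision {
  // Feature allowlist
  if (!quota.enabledFeatures.has(feature)) {
    return { allowed: false, status: 403, reason: 'Feature Not Enabled' };
  }
  const projected = usage.tokensToday + estimatedTokens;
  // Daily hard limit
  if (projected > quota.dailyHardTokens) {
    return { allowed: false, status: 429, reason: 'Quota Exceeded (daily tokens)' };
  }
  // Monthly budget: blocked once month-to-date cost exceeds the cap
  if (usage.monthToDateUsd > quota.monthlyBudgetUsd) {
    return { allowed: false, status: 429, reason: 'Quota Exceeded (monthly budget)' };
  }
  // Daily soft limit: the request proceeds with a warning in the metadata
  if (projected > quota.dailySoftTokens) {
    return { allowed: true, warning: 'Daily token soft limit exceeded' };
  }
  return { allowed: true };
}
```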

Quotas are stored in Redis for sub-millisecond enforcement. The metering system writes actual usage back to Cosmos DB every 60 seconds via a background flush.

PII Scrubbing

Before sending a prompt to any provider, the Gateway can scrub sensitive data. This is configurable per feature:

Mode | Behavior
--- | ---
scrub | PII is redacted before the prompt leaves your tenant. The provider never sees it.
warn | PII is detected and logged, but the prompt is sent as-is. You receive a warning in the response.
allow | No PII scanning. Use only for features where PII is necessary (e.g., client portal assistants).
⚠️PII scrubbing is recommended for any feature that processes customer data. The default is warn for new features and scrub for features that involve ticket data, contact records, or financial information.
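The three modes differ only in what happens after detection. The sketch below assumes two simple regular expressions (email addresses and US-style phone numbers) purely for demonstration; a production detector would cover far more PII types, and the Gateway's detector is not described on this page. `applyPiiMode` and `PiiResult` are hypothetical names.

```typescript
// Illustrative PII handling for scrub / warn / allow.
// The two patterns are demonstration-only assumptions, not the Gateway's detector.

type PiiMode = 'scrub' | 'warn' | 'allow';

const PII_PATTERNS: ReadonlyArray<[string, RegExp]> = [
  ['EMAIL', /[\w.+-]+@[\w-]+(\.[\w-]+)+/g],
  ['PHONE', /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g],
];

interface PiiResult {
  prompt: string; // text that would be sent to the provider
  detected: string[]; // labels of PII types found
  warning?: string; // set in warn mode when PII is present
}

function applyPiiMode(prompt: string, mode: PiiMode): PiiResult {
  if (mode === 'allow') {
    // allow: no scanning at all
    return { prompt, detected: [] };
  }
  const detected: string[] = [];
  let scrubbed = prompt;
  for (const [label, pattern] of PII_PATTERNS) {
    // String.match with a global regex resets lastIndex, so reuse is safe.
    if (prompt.match(pattern)) {
      detected.push(label);
      scrubbed = scrubbed.replace(pattern, `[${label}]`);
    }
  }
  if (mode === 'scrub') {
    // scrub: only the redacted text leaves the tenant
    return { prompt: scrubbed, detected };
  }
  // warn: the original prompt is sent unchanged; detections are reported
  const result: PiiResult = { prompt, detected };
  if (detected.length > 0) {
    result.warning = `PII detected: ${detected.join(', ')}`;
  }
  return result;
}
```

For example, `applyPiiMode('Call 555-123-4567 or mail jo@example.com', 'scrub').prompt` yields `'Call [PHONE] or mail [EMAIL]'`, while `warn` returns the original text with a warning attached.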

Usage Buffering and Audit Trail

To avoid blocking AI responses with database writes, the Gateway records usage events using a fire-and-forget pattern:

  1. Usage event is pushed to a Redis buffer (ai:buffer:{platform}:usage)
  2. AI response is returned to the caller immediately
  3. A background timer in Ops Center flushes Redis → Cosmos DB every 60 seconds
  4. Aggregation timers compute daily summaries for the Usage Analytics dashboard

Every AI call also generates an immutable audit record, buffered under ai:buffer:{platform}:audit, that captures the actor, feature, model, token count, cost, and timestamp. Audit records cannot be deleted and are available for compliance evidence export.
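An audit record carries exactly the six fields listed above. The shape below is an illustrative assumption, including the field names and the `auditKey` helper that builds the documented buffer key. `Object.freeze` only prevents in-process mutation; this page does not describe how the audit store enforces that records cannot be deleted.

```typescript
// Hypothetical audit record shape covering actor, feature, model,
// token count, cost, and timestamp.

interface AuditRecord {
  readonly actor: string;
  readonly feature: string;
  readonly model: string;
  readonly tokenCount: number;
  readonly costUsd: number;
  readonly timestamp: string; // ISO 8601, UTC
}

// Builds the buffer key pattern documented above: ai:buffer:{platform}:audit
function auditKey(platform: string): string {
  return `ai:buffer:${platform}:audit`;
}

function makeAuditRecord(
  fields: Omit<AuditRecord, 'timestamp'>,
  now: Date = new Date(),
): AuditRecord {
  // Freezing guards against accidental mutation before the record is buffered.
  return Object.freeze({ ...fields, timestamp: now.toISOString() });
}
```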

Gateway Architecture (for Developers)

The Gateway is implemented in the @theone/ai-gateway package, which is available to every platform in The One Stack:

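import { AIGateway } from '@theone/ai-gateway';

// Assumes `gateway` is an AIGateway instance created elsewhere (its constructor
// options are not covered here) and that this code runs in an async context.
// `userPrompt` and `tenantId` are supplied by the calling product.
const result = await gateway.chat({
  feature: 'ticket-suggest', // selects model, PII mode, prompt version, quota bucket
  messages: [{ role: 'user', content: userPrompt }],
  tenant_id: tenantId,
});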

The feature string determines:

  • Which provider and model to use (via feature routing config in the platform)
  • Which PII scrubbing mode to apply
  • Which system prompt version to inject
  • Which quota bucket to charge
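Because the feature string drives four separate decisions, it is natural to model one routing entry per feature. The entries below are hypothetical: only `ticket-suggest` appears on this page, and the `portal-assistant` feature, provider identifiers, model choices, prompt versions, and bucket names are invented for illustration. Refer to Feature Routing for the real mapping.

```typescript
// Hypothetical feature routing table: one entry per feature string.

type PiiMode = 'scrub' | 'warn' | 'allow';

interface FeatureRoute {
  provider: 'azure-openai' | 'azure-ai-foundry' | 'anthropic-direct';
  model: string;
  piiMode: PiiMode;
  systemPromptVersion: string;
  quotaBucket: string;
}

const routes = new Map<string, FeatureRoute>([
  [
    'ticket-suggest',
    {
      provider: 'azure-openai',
      model: 'gpt-4.1-mini',
      piiMode: 'scrub', // ticket data defaults to scrub
      systemPromptVersion: 'v3',
      quotaBucket: 'service-desk',
    },
  ],
  [
    'portal-assistant', // invented example of a feature that needs PII
    {
      provider: 'azure-openai',
      model: 'gpt-4o-mini',
      piiMode: 'allow',
      systemPromptVersion: 'v1',
      quotaBucket: 'portal',
    },
  ],
]);

function resolveRoute(feature: string): FeatureRoute {
  const route = routes.get(feature);
  if (route === undefined) {
    // Assumption: unknown features are rejected like disabled ones.
    throw new Error(`403 Feature Not Enabled: ${feature}`);
  }
  return route;
}
```

Using a `Map` rather than a plain object avoids accidentally matching inherited keys such as `toString`.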

See Feature Routing for the full feature-to-model mapping.

What Happens When a Request Is Blocked

If the Gateway blocks a request, the calling product receives a structured error:

Code | Reason | Resolution
--- | --- | ---
429 Quota Exceeded | Daily or monthly limit hit | Upgrade tier or wait for quota reset
403 Feature Not Enabled | Feature not in your allowlist | Contact support to enable the feature
503 No Providers Available | All circuit breakers open | Transient outage; retry after 30 seconds
400 PII Detected | Scrub mode blocked PII | Remove sensitive data from the prompt
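Only one of these codes is worth retrying without a change on the caller's side. A calling product might map them as follows; `resolveBlocked` and the three-attempt cap are hypothetical, while the 30-second wait comes from the table.

```typescript
// Hypothetical client-side handling of blocked-request codes.
// Only 503 is retried; 429, 403, and 400 need action before a retry can succeed.

type Resolution =
  | { kind: 'retry'; afterMs: number }
  | { kind: 'fail'; message: string };

// `attempt` is the 1-based number of the attempt that just failed.
function resolveBlocked(status: number, attempt: number, maxAttempts = 3): Resolution {
  switch (status) {
    case 503:
      return attempt < maxAttempts
        ? { kind: 'retry', afterMs: 30_000 }
        : { kind: 'fail', message: 'No providers available after retries' };
    case 429:
      return { kind: 'fail', message: 'Quota exceeded: upgrade tier or wait for reset' };
    case 403:
      return { kind: 'fail', message: 'Feature not enabled: contact support' };
    case 400:
      return { kind: 'fail', message: 'PII detected: remove sensitive data from the prompt' };
    default:
      return { kind: 'fail', message: `Unexpected status ${status}` };
  }
}
```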

Monitoring Gateway Health

In the platform, navigate to Settings → AI Gateway to view:

  • Current provider health status (Closed/Open/Half-open for each provider)
  • Requests in the last 24 hours by provider
  • Error rate by provider
  • Average latency by provider
  • Current circuit breaker thresholds

Ops Center also displays a real-time Gateway health indicator visible to platform administrators across all tenants.