
Architecture

How the QuilrAI LLM Gateway processes every request, from your application to the LLM provider and back.

Your Application

from openai import OpenAI

client = OpenAI(
    base_url='https://guardrails.quilr.ai/openai_compatible/',
    api_key='sk-quilr-xxx',
)
client.chat.completions.create(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
QuilrAI LLM Gateway

Validate
- Identity & Auth: JWT / header validation, domain allowlist, per-user tracking
- Rate Limits: req/min, req/hr, req/day limits, token budgets, key expiration

Scan
- PII / PHI / PCI: contextual detection, exact data matching, block / redact / anonymize
- Adversarial Detection: prompt injection, jailbreak detection, social engineering
- Custom Intents: user-defined categories, example-trained classifier

Transform
- Prompt Store: centralized prompts, template variables, enforce prompt-only mode
- Token Saving: JSON compression, HTML/MD stripping, input-only, same accuracy

Route
- Request Routing: weighted load balancing, automatic failover, multi-provider groups

Cross-cutting: Logging · Cost Tracking · Analytics · Red Team Testing

LLM Providers: OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Custom Endpoints
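The stage order above can be sketched as a simple chain of checks. This is an illustrative model only, not QuilrAI's implementation; the function names and the toy validation rules are invented here:

```python
# Hypothetical sketch of the gateway's stage order; all names and rules
# below are illustrative, not the real QuilrAI implementation.
def validate(request):
    # Identity & auth plus rate limits; reject invalid credentials early.
    if not request.get("api_key", "").startswith("sk-quilr-"):
        raise PermissionError("invalid key")
    return request

def scan(request):
    # Sensitive-data and adversarial checks; here, a toy keyword block.
    if "ssn" in request["prompt"].lower():
        raise ValueError("blocked: sensitive data")
    return request

def transform(request):
    # Prompt-store resolution and token-saving compression would go here.
    request["prompt"] = request["prompt"].strip()
    return request

def route(request):
    # Weighted load balancing / failover; return the chosen provider.
    return {"provider": "openai", "prompt": request["prompt"]}

def gateway(request):
    # Stages run strictly in order: Validate -> Scan -> Transform -> Route.
    for stage in (validate, scan, transform, route):
        request = stage(request)
    return request

result = gateway({"api_key": "sk-quilr-xxx", "prompt": "  Hello!  "})
```

A request that fails any stage never reaches a provider; the remaining stages are skipped.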

Pipeline Stages

Every API request flows through these stages in order. Each stage is independently configurable per API key from the dashboard.

Stage | Description | Details
----- | ----------- | -------
Identity & Auth | Validates request identity via JWT, JWKS, or header. Enforces domain restrictions. | Identity Aware →
Rate Limits | Enforces request rates, token budgets, and key expiration before reaching the provider. | Rate Limits →
Security Guardrails | Detects PII, PHI, PCI, and financial data. Catches prompt injection, jailbreak, and social engineering. | Security Guardrails →
Custom Intents | User-defined detection categories trained with positive and negative examples. | Custom Intents →
Prompt Store | Resolves centralized system prompts by ID with template variable substitution. | Prompt Store →
Token Saving | Compresses input tokens: JSON to TOON, HTML/Markdown to plain text. Responses unchanged. | Token Saving →
Request Routing | Routes to the optimal provider using weighted load balancing with automatic failover. | Request Routing →
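The Prompt Store stage resolves a prompt by ID and substitutes template variables. A minimal sketch of that lookup, assuming an invented prompt ID and variable names (QuilrAI's actual store is configured from the dashboard, not an in-memory dict):

```python
from string import Template

# Hypothetical prompt store: maps a prompt ID to a template with variables.
PROMPTS = {
    "support-agent-v1": Template(
        "You are a support agent for $product. Answer in $language."
    ),
}

def resolve_prompt(prompt_id, variables):
    # Look up the centralized template and substitute the supplied variables.
    return PROMPTS[prompt_id].substitute(variables)

system_prompt = resolve_prompt(
    "support-agent-v1", {"product": "Acme CRM", "language": "English"}
)
```

Centralizing prompts this way means applications reference an ID rather than embedding prompt text, so prompts can be updated without redeploying clients.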

Response Path

Responses from the LLM provider pass back through the security guardrails for output scanning before being returned to your application. The same detection categories and configurable actions (block, redact, anonymize, monitor) apply to both requests and responses.
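The four actions can be illustrated with a toy detector. The regex and example text below are assumptions for demonstration; real detection is contextual, not a single email pattern:

```python
import re

# Toy detector: a single email pattern standing in for PII detection.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w-]+")

def apply_action(text, action):
    # Apply the configured guardrail action to detected email addresses.
    if action == "block":
        return None if EMAIL.search(text) else text
    if action == "redact":
        return EMAIL.sub("[REDACTED]", text)
    if action == "anonymize":
        return EMAIL.sub("user@example.com", text)
    return text  # "monitor": log the detection, pass through unchanged

response = "Contact alice@corp.example for details."
redacted = apply_action(response, "redact")
```

Because the same function runs on both the request and the response path, a policy configured once covers data leaking in either direction.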

Observability

Every request is logged with cost, latency, token counts, and guardrail actions. Use the Logs tab to review request history and the Red Team Testing tool to validate your guardrail configuration against adversarial prompts.
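A per-request log record of this kind might look like the sketch below. The field names and the per-token prices are assumptions for illustration, not QuilrAI's actual log schema or billing rates:

```python
from dataclasses import dataclass, field

# Illustrative per-request log record; field names and prices are
# assumptions, not QuilrAI's actual schema.
@dataclass
class RequestLog:
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    guardrail_actions: list = field(default_factory=list)

    def cost_usd(self, input_price_per_1k=0.0025, output_price_per_1k=0.01):
        # Cost = tokens / 1000 * price-per-1k, summed over input and output.
        return (self.prompt_tokens / 1000 * input_price_per_1k
                + self.completion_tokens / 1000 * output_price_per_1k)

log = RequestLog("gpt-4o", prompt_tokens=1000, completion_tokens=500,
                 latency_ms=820.0, guardrail_actions=["redact:pii"])
```

Recording guardrail actions alongside cost and latency lets the Logs tab answer both "what did this request cost" and "what did the gateway do to it" from one record.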