Architecture
How the QuilrAI LLM Gateway processes every request, from your application to the LLM provider and back.
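
from openai import OpenAI

# Point the standard OpenAI client at the gateway's OpenAI-compatible endpoint.
client = OpenAI(
    base_url='https://guardrails-usa-2.quilr.ai/openai_compatible/',
    api_key='sk-quilr-xxx',  # placeholder key
)

# Requests made through this client pass through the gateway pipeline.
client.chat.completions.create(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)

Pipeline Stages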
Every API request flows through these stages in order. Each stage is independently configurable per API key from the dashboard.
Response Path
Responses from the LLM provider pass back through the security guardrails for output scanning before being returned to your application. The same detection categories and configurable actions (block, redact, anonymize, monitor) apply to both requests and responses. When Guardian Agent coding helpers are enabled, non-streaming responses can also be reviewed for dependency vulnerabilities and outdated exact pins before final delivery.
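To make the four actions concrete, here is a minimal sketch of how a scanner might apply them to a piece of text. It is illustrative only and not QuilrAI's implementation: the email regex stands in for the real detection categories, the names `scan` and `ScanResult` are hypothetical, and the precise difference between redact and anonymize shown here (fixed mask versus stable per-value placeholder) is an assumption.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in detector; a real gateway has its own detection categories.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class ScanResult:
    allowed: bool                      # False means the text must not be delivered
    text: str                          # text to pass on (possibly rewritten)
    findings: list = field(default_factory=list)


def scan(text: str, action: str) -> ScanResult:
    """Apply one configured action to every detected value in `text`."""
    findings = EMAIL.findall(text)
    if action not in {"block", "redact", "anonymize", "monitor"}:
        raise ValueError(f"unknown action: {action}")
    # monitor: record findings but leave the text untouched.
    if not findings or action == "monitor":
        return ScanResult(True, text, findings)
    # block: refuse to pass the text on at all.
    if action == "block":
        return ScanResult(False, "", findings)
    # redact (assumed semantics): replace each value with a fixed mask.
    if action == "redact":
        return ScanResult(True, EMAIL.sub("[REDACTED]", text), findings)
    # anonymize (assumed semantics): same value always maps to the same placeholder.
    mapping: dict = {}

    def placeholder(match: re.Match) -> str:
        return mapping.setdefault(match.group(0), f"<EMAIL_{len(mapping) + 1}>")

    return ScanResult(True, EMAIL.sub(placeholder, text), findings)
```

Because the same function can be called on the outgoing prompt and on the provider's reply, a sketch like this also shows why one set of configured actions can cover both directions.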
The following non-streaming routes all follow the full request → scan → forward → scan → return pipeline: chat completions (including Bedrock models reached through OpenAI-compatible chat via Converse), Anthropic Messages, AWS Bedrock Runtime boto3 converse and supported invoke_model calls, Vertex/Gemini generateContent, and the OpenAI Responses API. For streaming responses (SSE), request-side scanning runs as usual, but response-side scanning is skipped so that chunks pass straight through; request-side prediction results are still logged. AWS Bedrock Runtime converse_stream follows the same request-scan / response-passthrough pattern for AWS EventStream responses. Realtime websocket sessions are currently a raw passthrough: neither request-side nor response-side DLP runs on live Realtime events, though session-level logs are still recorded.
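The per-transport behavior described above can be summarized as a small lookup table. The sketch below is a simplified model, not gateway code: the mode names, `SCAN_POLICY`, and `run_pipeline` are hypothetical, and `forward` and `scan` are caller-supplied stand-ins for the provider call and the guardrail scan.

```python
from typing import Callable, List, Tuple

# (scan request?, scan response?) for each transport, as described in the text.
SCAN_POLICY = {
    "non_streaming": (True, True),             # full request/response scanning
    "sse_stream": (True, False),               # response chunks pass straight through
    "bedrock_converse_stream": (True, False),  # same pattern for AWS EventStream
    "realtime_websocket": (False, False),      # raw passthrough today
}


def run_pipeline(
    mode: str,
    payload: str,
    forward: Callable[[str], str],
    scan: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """Run the stages that apply to `mode`; return the response and a stage trace."""
    scan_request, scan_response = SCAN_POLICY[mode]
    trace: List[str] = []
    if scan_request:
        payload = scan(payload)
        trace.append("scan_request")
    response = forward(payload)
    trace.append("forward")
    if scan_response:
        response = scan(response)
        trace.append("scan_response")
    return response, trace
```

For example, `run_pipeline("sse_stream", prompt, forward, scan)` yields a trace of `["scan_request", "forward"]`, which is the practical consequence to keep in mind: output guardrails only take effect on non-streaming calls.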
Copilot Studio is different from LLM proxy routes: Copilot calls QuilrAI before tool execution, QuilrAI scans the user context and proposed tool input values, and the response is only an allow/block decision.
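As a rough illustration of that pre-execution check, the sketch below scans the user context and every string value in the proposed tool input, then returns only a decision. Everything here is assumed for illustration: the `decide` function, the shape of the returned dictionary, and the 16-digit pattern used as a stand-in detector are hypothetical and do not reflect the actual Copilot Studio integration contract.

```python
import re
from typing import Any, Iterator

# Hypothetical stand-in detector: any standalone 16-digit number.
CARD_LIKE = re.compile(r"\b\d{16}\b")


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string found in a nested structure of dicts, lists, and tuples."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def decide(user_context: str, tool_inputs: dict) -> dict:
    """Return an allow/block decision; the inputs themselves are never rewritten."""
    for text in iter_strings([user_context, tool_inputs]):
        if CARD_LIKE.search(text):
            return {"decision": "block"}
    return {"decision": "allow"}
```

Unlike the proxy routes, nothing is forwarded or modified in this flow; the caller simply proceeds with the tool call or abandons it based on the decision.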
Observability
Every request is logged with cost, latency, token counts, and guardrail actions. Use the Logs tab to review request history, the LLM Gateway Log Export API to export logs programmatically, and the Red Team Testing tool to validate your guardrail configuration against adversarial prompts.