Integration Guide
Connect to the QuilrAI gateway in minutes - same SDK, one-line change.
1. Choose Your Endpoint
Region
For production traffic, use the location-specific endpoint closest to your application. The examples below use the US East endpoint; replace it with your nearest regional endpoint if needed. Use https://guardrails.quilr.ai only when you explicitly want global auto-routing.
API Format
Combine a region base URL with the API format path to get your full endpoint. For example:
https://guardrails-usa-2.quilr.ai/openai_compatible/
The OpenAI-compatible path works with OpenAI SDKs and OpenAI-compatible client wrappers. It can call OpenAI / Azure OpenAI and other upstreams that already expose an OpenAI-compatible API. It can also call AWS Bedrock chat models through Bedrock Converse; Bedrock is currently the supported native provider that QuilrAI translates into OpenAI-compatible chat completions.
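As a minimal sketch of that composition, the helper below joins a region base URL with an API format path. The name `build_endpoint` is hypothetical, and the path list only covers the formats that appear in this guide's examples, so it may not be exhaustive.

```python
# Region base URLs mentioned in this guide.
US_EAST = "https://guardrails-usa-2.quilr.ai"
GLOBAL_AUTO = "https://guardrails.quilr.ai"  # global auto-routing

# API format paths used by the examples in this guide (possibly not exhaustive).
API_FORMATS = frozenset({
    "openai_compatible",
    "openai_responses",
    "anthropic_messages",
    "vertex_ai",
    "bedrock-runtime",
    "rerank",
    "copilot_studio",
})


def build_endpoint(region_base: str, api_format: str) -> str:
    """Join a region base URL and an API format path into a full endpoint."""
    if api_format not in API_FORMATS:
        raise ValueError(f"unknown API format: {api_format!r}")
    return f"{region_base.rstrip('/')}/{api_format}/"


print(build_endpoint(US_EAST, "openai_compatible"))
# https://guardrails-usa-2.quilr.ai/openai_compatible/
```

Keeping the region in one constant makes it easy to switch to a closer regional endpoint later without touching each client.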
2. Code Examples
OpenAI-compatible chat - Python
from openai import OpenAI
# Point the client to QuilrAI's gateway
client = OpenAI(
    base_url='https://guardrails-usa-2.quilr.ai/openai_compatible/',
    api_key='sk-quilr-xxx',  # QuilrAI key replaces your provider key (e.g. 'sk-openai-xxx')
)
# Everything below stays exactly the same
response = client.chat.completions.create(
model='gpt-4o-mini',
messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(response.choices[0].message.content)
# Embeddings work too
embedding = client.embeddings.create(
model='text-embedding-3-small',
input='The quick brown fox'
)
print(embedding.data[0].embedding[:5])
OpenAI-compatible chat - JavaScript
import OpenAI from "openai";
// Point the client to QuilrAI's gateway
const client = new OpenAI({
  baseURL: "https://guardrails-usa-2.quilr.ai/openai_compatible/",
  apiKey: "sk-quilr-xxx", // QuilrAI key replaces your provider key (e.g. "sk-openai-xxx")
});
// Everything below stays exactly the same
const response = await client.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
OpenAI-compatible chat - cURL
# Point the request to QuilrAI's gateway instead of https://api.openai.com/v1/chat/completions,
# and send the QuilrAI key instead of your provider key (e.g. sk-openai-xxx)
curl https://guardrails-usa-2.quilr.ai/openai_compatible/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-quilr-xxx" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello!"}]
}'
AWS Bedrock via OpenAI-compatible chat - Python
Create a QuilrAI key with provider bedrock, select the Bedrock models you want to expose, and use the same OpenAI client configuration. The gateway converts the OpenAI-compatible chat request to Bedrock Converse behind the scenes, so no boto3 client is needed.
from openai import OpenAI
client = OpenAI(
base_url='https://guardrails-usa-2.quilr.ai/openai_compatible/',
api_key='sk-quilr-xxx',
)
response = client.chat.completions.create(
model='amazon.nova-lite-v1:0',
messages=[{'role': 'user', 'content': 'Hello from an OpenAI client.'}],
max_tokens=256,
)
print(response.choices[0].message.content)
Use any Bedrock model ID selected on the key that supports Bedrock Converse, including inference profile IDs. The same base URL and key work with OpenAI-compatible wrappers such as LangChain ChatOpenAI; set the wrapper's model to the Bedrock model ID. See OpenAI to Bedrock Translation for supported request parameters, message formats, tools, streaming, and expected failures.
Embeddings - Python
Embeddings use the OpenAI embeddings shape for every supported provider - OpenAI, Azure OpenAI, and AWS Bedrock (Titan / Cohere Embed). The gateway translates to the underlying provider based on how the key is configured, so client code never changes.
from openai import OpenAI
client = OpenAI(
base_url='https://guardrails-usa-2.quilr.ai/openai_compatible/',
api_key='sk-quilr-xxx',
)
# Same call for OpenAI, Azure OpenAI, or AWS Bedrock keys.
# Just use the model name configured on your key
# (e.g. 'text-embedding-3-small', 'amazon.titan-embed-text-v2:0',
# 'cohere.embed-english-v3').
embedding = client.embeddings.create(
model='amazon.titan-embed-text-v2:0',
input='The quick brown fox',
)
print(embedding.data[0].embedding[:5])
Embeddings - cURL
curl https://guardrails-usa-2.quilr.ai/openai_compatible/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-quilr-xxx" \
-d '{
"model": "amazon.titan-embed-text-v2:0",
"input": "The quick brown fox"
}'
No boto3 / invoke_model call on the client side - AWS credentials live on the key in the QuilrAI dashboard, and the gateway performs the Bedrock invoke_model call for you. Titan uses {inputText} and Cohere uses {texts, input_type} upstream; the OpenAI shape is what you send and receive.
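To make the shape difference concrete, here is an illustrative sketch of how an OpenAI-style embeddings input could map to the two upstream bodies named above. This is not the gateway's implementation: the function name and the model-prefix matching are hypothetical, and the `input_type` value is an assumption, since the guide only names the field.

```python
def to_upstream_embedding_body(model: str, text: str) -> dict:
    """Map an OpenAI-style embeddings input to an upstream Bedrock body (illustrative)."""
    if model.startswith("amazon.titan-embed"):
        # Titan takes a single string under inputText.
        return {"inputText": text}
    if model.startswith("cohere.embed"):
        # Cohere takes a list of texts plus an input_type.
        # "search_document" is an assumed value chosen for illustration.
        return {"texts": [text], "input_type": "search_document"}
    raise ValueError(f"no mapping sketched for model {model!r}")


print(to_upstream_embedding_body("amazon.titan-embed-text-v2:0", "The quick brown fox"))
print(to_upstream_embedding_body("cohere.embed-english-v3", "The quick brown fox"))
```

Client code never builds these bodies itself; it keeps sending `{"model": ..., "input": ...}` in the OpenAI shape.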
Rerank - Python
Rerank uses the Cohere-compatible shape for every supported provider - Cohere, AWS Bedrock (Cohere Rerank 3.5 / Amazon Rerank), Jina, Voyage, and self-hosted (ColBERT / TEI / Infinity). Point the Cohere SDK at the gateway, or just POST JSON.
import cohere
co = cohere.ClientV2(
    base_url='https://guardrails-usa-2.quilr.ai/rerank',
    api_key='sk-quilr-xxx',  # QuilrAI key replaces your Cohere key (e.g. 'co-xxx')
)
# Same call for any configured rerank provider.
result = co.rerank(
model='rerank-english-v3.0',
query='What is the capital of France?',
documents=[
'Paris is the capital of France.',
'Berlin is the capital of Germany.',
'The Eiffel Tower is in Paris.',
],
top_n=2,
)
for r in result.results:
    print(r.index, r.relevance_score)
Rerank - cURL
curl https://guardrails-usa-2.quilr.ai/rerank/v2/rerank \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-quilr-xxx" \
-d '{
"model": "rerank-english-v3.0",
"query": "What is the capital of France?",
"documents": [
"Paris is the capital of France.",
"Berlin is the capital of Germany.",
"The Eiffel Tower is in Paris."
],
"top_n": 2
}'
The gateway mirrors Cohere's upstream paths, so /rerank/rerank, /rerank/v1/rerank, and /rerank/v2/rerank all work - point your SDK at base_url='https://guardrails-usa-2.quilr.ai/rerank' and it'll append whichever version it uses.
Same pattern as Bedrock embeddings - AWS credentials live on the key, the gateway performs the Bedrock invoke_model call, and your client speaks the Cohere rerank shape. Request-side DLP scans the query and documents fields; the response (scores + indices) is passed through.
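For clients that "just POST JSON", the sketch below builds the same rerank request with only the Python standard library and shows how the three mirrored paths are formed. The helper name `build_rerank_request` and its `version` parameter are hypothetical; the request is constructed but not sent.

```python
import json
import urllib.request


def build_rerank_request(base_url, api_key, model, query, documents, top_n, version="v2"):
    """Build (but do not send) a Cohere-shaped rerank POST to the gateway."""
    # version=None targets /rerank/rerank; "v1" or "v2" target the versioned paths.
    path = "rerank" if version is None else f"{version}/rerank"
    url = f"{base_url.rstrip('/')}/{path}"
    body = {"model": model, "query": query, "documents": documents, "top_n": top_n}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_rerank_request(
    "https://guardrails-usa-2.quilr.ai/rerank",
    "sk-quilr-xxx",
    "rerank-english-v3.0",
    "What is the capital of France?",
    ["Paris is the capital of France.", "Berlin is the capital of Germany."],
    top_n=1,
)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the response body carries the scores and indices described above.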
Anthropic - Python
import anthropic
# Point the client to QuilrAI's gateway
client = anthropic.Anthropic(
    # Replaces the default base URL
    base_url='https://guardrails-usa-2.quilr.ai/anthropic_messages/',
    api_key='sk-quilr-xxx',  # QuilrAI key replaces your Anthropic key (e.g. 'sk-ant-xxx')
)
# Everything below stays exactly the same
message = client.messages.create(
model='claude-sonnet-4-20250514',
max_tokens=1024,
messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(message.content[0].text)
Anthropic - JavaScript
import Anthropic from "@anthropic-ai/sdk";
// Point the client to QuilrAI's gateway
const client = new Anthropic({
  // Replaces the default base URL
  baseURL: "https://guardrails-usa-2.quilr.ai/anthropic_messages/",
  apiKey: "sk-quilr-xxx", // QuilrAI key replaces your Anthropic key (e.g. "sk-ant-xxx")
});
// Everything below stays exactly the same
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
messages: [{ role: "user", content: "Hello!" }],
});
console.log(message.content[0].text);
Anthropic - cURL
# Point the request to QuilrAI's gateway instead of https://api.anthropic.com/v1/messages,
# and send the QuilrAI key instead of your Anthropic key (e.g. sk-ant-xxx)
curl https://guardrails-usa-2.quilr.ai/anthropic_messages/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: sk-quilr-xxx" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello!"}]
}'
Vertex AI - Google GenAI SDK
from google import genai
from google.genai.types import HttpOptions
from google.auth import credentials as auth_credentials


class APIKeyCredentials(auth_credentials.Credentials):
    """Pass the QuilrAI API key as a Bearer token."""

    def __init__(self, api_key):
        super().__init__()
        self.api_key = api_key
        self.token = api_key

    def refresh(self, request):
        self.token = self.api_key

    @property
    def valid(self):
        return True


# Replaces the service-account credentials previously loaded from 'service.json'
# (google.oauth2.service_account with the cloud-platform scope)
credentials = APIKeyCredentials('sk-quilr-xxx')

client = genai.Client(
    vertexai=True,
    project='your-gcp-project',
    location='us-central1',
    credentials=credentials,
    # Replaces the default Vertex AI endpoint
    http_options=HttpOptions(base_url='https://guardrails-usa-2.quilr.ai/vertex_ai'),
)
# Everything below stays exactly the same
response = client.models.generate_content(
model='gemini-2.5-flash',
contents='Hello!'
)
print(response.text)
Vertex AI - LangChain
from google.oauth2 import credentials as ga_credentials
from langchain_google_genai import ChatGoogleGenerativeAI


class _NoopCredentials(ga_credentials.Credentials):
    """Inject the QuilrAI API key as a Bearer token."""

    def __init__(self, api_key):
        super().__init__(token=api_key)

    def refresh(self, request):
        pass

    @property
    def valid(self):
        return True


# Replaces the service-account credentials previously loaded from 'service.json'
# (google.oauth2.service_account with the cloud-platform scope)
credentials = _NoopCredentials('sk-quilr-xxx')

llm = ChatGoogleGenerativeAI(
    model='gemini-2.5-flash',
    credentials=credentials,
    base_url='https://guardrails-usa-2.quilr.ai/vertex_ai',
    project='your-gcp-project',
    location='us-central1',
    vertexai=True,
)
# Everything below stays exactly the same
response = llm.invoke('Hello!')
print(response.content)
Replace sk-quilr-xxx with the API key you created in the dashboard. The model parameter uses the same model names as your provider. For Vertex AI, the project and location should match the values configured when creating the key.
AWS Bedrock Runtime - boto3
Use this mode when your app already calls Bedrock Runtime through boto3. Create a QuilrAI key with provider bedrock, then point the Bedrock Runtime client at QuilrAI.
import boto3
from botocore.config import Config
QUILR_KEY = "sk-quilr-xxx"
bedrock = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
    endpoint_url="https://guardrails-usa-2.quilr.ai/bedrock-runtime",
    # The QuilrAI key replaces both AWS credentials (e.g. "AKIA..." / "aws-secret")
    aws_access_key_id=QUILR_KEY,
    aws_secret_access_key=QUILR_KEY,
    config=Config(read_timeout=300),
)
response = bedrock.converse(
modelId="amazon.nova-lite-v1:0",
messages=[
{
"role": "user",
"content": [{"text": "Hello!"}],
}
],
inferenceConfig={"maxTokens": 256},
)
print(response["output"]["message"]["content"][0]["text"])
converse, converse_stream, and invoke_model are supported. invoke_model_with_response_stream is not enabled yet and returns ValidationException. See AWS Bedrock - boto3 Runtime for operation coverage, guardrail behavior, and troubleshooting.
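One way to surface the unsupported operation early is a small client-side guard, sketched below. The wrapper `call_runtime` is hypothetical; it only encodes the operation list stated above and forwards supported calls to whatever Bedrock Runtime client you pass in.

```python
SUPPORTED_OPERATIONS = frozenset({"converse", "converse_stream", "invoke_model"})
NOT_YET_ENABLED = frozenset({"invoke_model_with_response_stream"})


def call_runtime(client, operation: str, **kwargs):
    """Call a Bedrock Runtime operation through the gateway, failing fast if unsupported."""
    if operation in NOT_YET_ENABLED:
        # The gateway would answer with ValidationException; raise locally instead.
        raise NotImplementedError(f"{operation} is not enabled on the gateway yet")
    if operation not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unknown Bedrock Runtime operation: {operation!r}")
    return getattr(client, operation)(**kwargs)
```

With the `bedrock` client from the example above, `call_runtime(bedrock, "converse", modelId=..., messages=...)` behaves like `bedrock.converse(...)`.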
The Responses and Realtime endpoints are only served for keys whose primary provider is openai_responses / openai_responses_azure / openai_realtime / openai_realtime_azure, or that have one of those added as an additional provider. A plain "OpenAI" or "Azure OpenAI" chat-completions key cannot hit /openai_responses/ or /openai_realtime/ by just swapping the URL - add the Responses or Realtime provider to the key first. See Provider Support for the full matrix.
OpenAI Responses - Python
from openai import OpenAI
# Point the client to QuilrAI's gateway
client = OpenAI(
    base_url='https://guardrails-usa-2.quilr.ai/openai_responses/v1',
    api_key='sk-quilr-xxx',  # QuilrAI key replaces your provider key (e.g. 'sk-openai-xxx')
)
# Everything below stays exactly the same
response = client.responses.create(
model='gpt-5',
input=[{'role': 'user', 'content': 'Hello!'}],
instructions='You are a helpful assistant.'
)
print(response.output_text)
OpenAI Responses - JavaScript
import OpenAI from "openai";
// Point the client to QuilrAI's gateway
const client = new OpenAI({
  baseURL: "https://guardrails-usa-2.quilr.ai/openai_responses/v1",
  apiKey: "sk-quilr-xxx", // QuilrAI key replaces your provider key (e.g. "sk-openai-xxx")
});
// Everything below stays exactly the same
const response = await client.responses.create({
model: "gpt-5",
input: [{ role: "user", content: "Hello!" }],
instructions: "You are a helpful assistant.",
});
console.log(response.output_text);
OpenAI Responses - cURL
# Point the request to QuilrAI's gateway instead of https://api.openai.com/v1/responses,
# and send the QuilrAI key instead of your provider key (e.g. sk-openai-xxx)
curl https://guardrails-usa-2.quilr.ai/openai_responses/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-quilr-xxx" \
-d '{
"model": "gpt-5",
"input": [{"role": "user", "content": "Hello!"}]
}'
For Azure OpenAI Responses, the deployment name goes in model and Quilr resolves it against the azure_endpoint configured on the key. The Azure-style deployment alias /openai_responses/openai/deployments/{deployment}/responses is also supported.
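As a small illustration of the alias path, the helper below fills in the `{deployment}` segment. The function name and the deployment name `my-gpt5-deployment` are hypothetical placeholders.

```python
from urllib.parse import quote


def azure_responses_alias(region_base: str, deployment: str) -> str:
    """Build the Azure-style deployment alias URL for the Responses endpoint."""
    # Percent-encode the deployment name so it stays a single path segment.
    segment = quote(deployment, safe="")
    return f"{region_base.rstrip('/')}/openai_responses/openai/deployments/{segment}/responses"


print(azure_responses_alias("https://guardrails-usa-2.quilr.ai", "my-gpt5-deployment"))
# https://guardrails-usa-2.quilr.ai/openai_responses/openai/deployments/my-gpt5-deployment/responses
```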
OpenAI Realtime - Python
import asyncio
from openai import AsyncOpenAI


async def main():
    client = AsyncOpenAI(
        base_url='https://guardrails-usa-2.quilr.ai/openai/v1',
        api_key='sk-quilr-xxx',  # QuilrAI key replaces your provider key (e.g. 'sk-openai-xxx')
    )
    # Everything below stays exactly the same
    async with client.realtime.connect(model='gpt-realtime') as conn:
        await conn.session.update(session={'modalities': ['text']})
        await conn.conversation.item.create(item={
            'type': 'message',
            'role': 'user',
            'content': [{'type': 'input_text', 'text': 'Hello!'}],
        })
        await conn.response.create()
        async for event in conn:
            if event.type == 'response.output_text.delta':
                print(event.delta, end='', flush=True)
            elif event.type == 'response.done':
                break


asyncio.run(main())
OpenAI Realtime - JavaScript
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
const rt = new OpenAIRealtimeWebSocket({
  baseURL: "wss://guardrails-usa-2.quilr.ai/openai/v1",
  apiKey: "sk-quilr-xxx", // QuilrAI key replaces your provider key (e.g. "sk-openai-xxx")
  model: "gpt-realtime",
});
rt.on("response.output_text.delta", (e) => process.stdout.write(e.delta));
rt.send({
type: "conversation.item.create",
item: {
type: "message",
role: "user",
content: [{ type: "input_text", text: "Hello!" }],
},
});
rt.send({ type: "response.create" });
Realtime sessions are a raw websocket passthrough. Voice I/O (PCM16 input_audio_buffer.append / response.output_audio.delta) works end-to-end. Live-event DLP is not yet applied to Realtime sessions - see Provider Support.
Microsoft Copilot Studio
Create a QuilrAI key with provider copilot_studio, then use the full endpoint base in Power Platform admin center:
https://guardrails-usa-2.quilr.ai/copilot_studio/sk-quilr-xxx
Power Platform appends /validate during setup and /analyze-tool-execution at runtime. QuilrAI scans Copilot user context and proposed tool inputs, then returns Copilot's expected allow/block response.
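The sketch below shows the URLs that result from that base, assuming Power Platform appends the two suffixes directly to the endpoint base as described. The helper name is hypothetical.

```python
def copilot_studio_urls(region_base: str, quilr_key: str) -> dict:
    """Return the endpoint base and the two URLs Power Platform derives from it."""
    base = f"{region_base.rstrip('/')}/copilot_studio/{quilr_key}"
    return {
        "base": base,  # the value to enter in Power Platform admin center
        "validate": f"{base}/validate",  # called during setup
        "analyze_tool_execution": f"{base}/analyze-tool-execution",  # called at runtime
    }


for name, url in copilot_studio_urls("https://guardrails-usa-2.quilr.ai", "sk-quilr-xxx").items():
    print(name, url)
```

Because the QuilrAI key is part of the URL path, it is sensible to treat the whole endpoint base as a secret.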
See Copilot Studio for Power Platform configuration steps.
3. Optional Headers
4. Using Routing Groups
If you've configured a Routing Group, pass the group name as the model parameter. The gateway automatically load-balances and fails over across providers in that group.
response = client.chat.completions.create(
model='Group1', # your routing group name
messages=[{'role': 'user', 'content': 'Hello!'}]
)
See Request Routing for full details on setting up groups.
5. Selecting a Provider on Multi-Provider Keys
A single QuilrAI key can have one primary provider plus any number of additional providers. When more than one compatible provider is configured, you can pick which provider handles a specific request. If you omit a selector, QuilrAI uses the first compatible provider on the key.
Match by either the provider type (e.g. bedrock, openai_responses_azure, anthropic_messages_bedrock) or the label you set on the additional provider in the dashboard.
# Responses: pick a specific additional provider
response = client.responses.create(
model='gpt-5',
input=[{'role': 'user', 'content': 'Hello!'}],
extra_body={'provider_label': 'azure-westus'},
)
# Realtime: select via query string (headers also work)
async with client.realtime.connect(
    model='gpt-realtime',
    extra_query={'provider_label': 'azure-westus'},
) as conn:
    ...
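For clients that do not use the OpenAI SDK, the sketch below shows where the selector ends up on the wire: in the OpenAI Python SDK, `extra_body` merges fields into the JSON request body and `extra_query` adds query-string parameters. The helper names are hypothetical, and the Realtime URL is an assumption about what the SDK opens for the base URL used earlier.

```python
import json
from urllib.parse import urlencode


def responses_body(model: str, user_text: str, provider_label=None) -> str:
    """JSON body for a Responses call, with the selector as a top-level field."""
    body = {"model": model, "input": [{"role": "user", "content": user_text}]}
    if provider_label is not None:
        body["provider_label"] = provider_label
    return json.dumps(body)


def with_provider_query(url: str, provider_label: str) -> str:
    """Append provider_label to a URL's query string."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{urlencode({'provider_label': provider_label})}"


print(responses_body("gpt-5", "Hello!", provider_label="azure-westus"))
# Assumed Realtime URL shape, for illustration only.
print(with_provider_query(
    "wss://guardrails-usa-2.quilr.ai/openai/v1/realtime?model=gpt-realtime",
    "azure-westus",
))
```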