
Integration Guide

Connect to the QuilrAI gateway in minutes - same SDK, one-line change.

1. Choose Your Endpoint

Region

| Region | Base URL |
| --- | --- |
| Nearest (auto) | https://guardrails.quilr.ai |
| USA (US Central West) | https://guardrails-usa-1.quilr.ai |
| USA (US East) | https://guardrails-usa-2.quilr.ai |
| India | https://guardrails-india-1.quilr.ai |

For production traffic, use the location-specific endpoint closest to your application. The examples below use the US East endpoint; replace it with your nearest regional endpoint if needed. Use https://guardrails.quilr.ai only when you explicitly want global auto-routing.

API Format

| Format | Path | Auth Header |
| --- | --- | --- |
| OpenAI-compatible | /openai_compatible/ | Authorization: Bearer sk-quilr-xxx |
| Anthropic | /anthropic_messages/ | x-api-key: sk-quilr-xxx |
| AWS Bedrock Runtime (boto3) | /bedrock-runtime/ | AWS SigV4 using sk-quilr-xxx |
| Vertex AI | /vertex_ai/ | Authorization: Bearer sk-quilr-xxx |
| OpenAI Responses | /openai_responses/ | Authorization: Bearer sk-quilr-xxx |
| OpenAI Realtime (wss) | /openai_realtime/ | Authorization: Bearer sk-quilr-xxx |
| Copilot Studio | /copilot_studio/{sk-quilr-xxx} | QuilrAI key in endpoint path |

Combine a region base URL with the API format path to get your full endpoint. For example:

https://guardrails-usa-2.quilr.ai/openai_compatible/
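
The combination rule above can be sketched as a small lookup helper. This is an illustration only, not part of any QuilrAI SDK: the function name `build_endpoint` and the short region keys (`auto`, `usa-1`, `usa-2`, `india-1`) are hypothetical labels chosen for this example, while the URLs and paths are copied from the Region and API Format tables. Copilot Studio is left out because its path also carries the key.

```python
# Hypothetical helper: the dictionary keys and function name are illustrative,
# only the URLs and paths come from the tables in this guide.
REGION_BASE_URLS = {
    "auto": "https://guardrails.quilr.ai",
    "usa-1": "https://guardrails-usa-1.quilr.ai",
    "usa-2": "https://guardrails-usa-2.quilr.ai",
    "india-1": "https://guardrails-india-1.quilr.ai",
}

API_FORMAT_PATHS = {
    "openai_compatible": "/openai_compatible/",
    "anthropic_messages": "/anthropic_messages/",
    "bedrock_runtime": "/bedrock-runtime/",
    "vertex_ai": "/vertex_ai/",
    "openai_responses": "/openai_responses/",
    "openai_realtime": "/openai_realtime/",
}


def build_endpoint(region: str, api_format: str) -> str:
    """Join a region base URL with an API format path."""
    base = REGION_BASE_URLS[region].rstrip("/")
    return base + API_FORMAT_PATHS[api_format]


print(build_endpoint("usa-2", "openai_compatible"))
# prints: https://guardrails-usa-2.quilr.ai/openai_compatible/
```

Unknown region or format names raise `KeyError`, which surfaces a typo early instead of producing a malformed URL.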

The OpenAI-compatible path works with OpenAI SDKs and OpenAI-compatible client wrappers. It can call OpenAI / Azure OpenAI and other upstreams that already expose an OpenAI-compatible API. It can also call AWS Bedrock chat models through Bedrock Converse; Bedrock is currently the supported native provider that QuilrAI translates into OpenAI-compatible chat completions.

2. Code Examples

OpenAI-compatible chat - Python

from openai import OpenAI

# Point the client to QuilrAI's gateway
client = OpenAI(
    base_url='https://guardrails-usa-2.quilr.ai/openai_compatible/',
    api_key='sk-quilr-xxx',  # QuilrAI key, used instead of the provider key (sk-openai-xxx)
)

# Everything below stays exactly the same
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(response.choices[0].message.content)

# Embeddings work too
embedding = client.embeddings.create(
    model='text-embedding-3-small',
    input='The quick brown fox'
)
print(embedding.data[0].embedding[:5])

OpenAI-compatible chat - JavaScript

import OpenAI from "openai";

// Point the client to QuilrAI's gateway
const client = new OpenAI({
  baseURL: "https://guardrails-usa-2.quilr.ai/openai_compatible/",
  apiKey: "sk-quilr-xxx", // QuilrAI key, used instead of the provider key (sk-openai-xxx)
});

// Everything below stays exactly the same
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

OpenAI-compatible chat - cURL

# Point the request to QuilrAI's gateway
# (replaces https://api.openai.com/v1/chat/completions and the sk-openai-xxx key)
curl https://guardrails-usa-2.quilr.ai/openai_compatible/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-quilr-xxx" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

AWS Bedrock via OpenAI-compatible chat - Python

Create a QuilrAI key with provider bedrock, select the Bedrock models you want to expose, and use the same OpenAI client configuration. The gateway converts the OpenAI-compatible chat request to Bedrock Converse behind the scenes, so no boto3 client is needed.

from openai import OpenAI

client = OpenAI(
    base_url='https://guardrails-usa-2.quilr.ai/openai_compatible/',
    api_key='sk-quilr-xxx',
)

response = client.chat.completions.create(
    model='amazon.nova-lite-v1:0',
    messages=[{'role': 'user', 'content': 'Hello from an OpenAI client.'}],
    max_tokens=256,
)
print(response.choices[0].message.content)

Use any Bedrock model ID selected on the key that supports Bedrock Converse, including inference profile IDs. The same base URL and key work with OpenAI-compatible wrappers such as LangChain ChatOpenAI; set the wrapper's model to the Bedrock model ID. See OpenAI to Bedrock Translation for supported request parameters, message formats, tools, streaming, and expected failures.

Embeddings - Python

Embeddings use the OpenAI embeddings shape for every supported provider - OpenAI, Azure OpenAI, and AWS Bedrock (Titan / Cohere Embed). The gateway translates to the underlying provider based on how the key is configured, so client code never changes.

from openai import OpenAI

client = OpenAI(
    base_url='https://guardrails-usa-2.quilr.ai/openai_compatible/',
    api_key='sk-quilr-xxx',
)

# Same call for OpenAI, Azure OpenAI, or AWS Bedrock keys.
# Just use the model name configured on your key
# (e.g. 'text-embedding-3-small', 'amazon.titan-embed-text-v2:0',
# 'cohere.embed-english-v3').
embedding = client.embeddings.create(
    model='amazon.titan-embed-text-v2:0',
    input='The quick brown fox',
)
print(embedding.data[0].embedding[:5])

Embeddings - cURL

curl https://guardrails-usa-2.quilr.ai/openai_compatible/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-quilr-xxx" \
  -d '{
    "model": "amazon.titan-embed-text-v2:0",
    "input": "The quick brown fox"
  }'

AWS Bedrock embeddings

No boto3 / invoke_model call on the client side - AWS credentials live on the key in the QuilrAI dashboard, and the gateway performs the Bedrock invoke_model call for you. Titan uses {inputText} and Cohere uses {texts, input_type} upstream; the OpenAI shape is what you send and receive.
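
To make the upstream shapes concrete, the sketch below maps one OpenAI-style input string to the body each Bedrock model family expects. It only illustrates the field names mentioned above and is not QuilrAI's implementation: the function name, the model-ID prefix checks, and the `input_type` value `search_document` are assumptions made for this example.

```python
def to_bedrock_embedding_body(model: str, text: str) -> dict:
    """Illustrative mapping from an OpenAI-style input to a Bedrock body.

    Not the gateway's actual code; prefixes and input_type are assumptions.
    """
    if model.startswith("amazon.titan-embed"):
        # Titan embedding models take a single string under inputText.
        return {"inputText": text}
    if model.startswith("cohere.embed"):
        # Cohere Embed takes a list of texts plus an input_type.
        return {"texts": [text], "input_type": "search_document"}
    raise ValueError(f"unsupported embedding model: {model}")


print(to_bedrock_embedding_body("amazon.titan-embed-text-v2:0", "The quick brown fox"))
print(to_bedrock_embedding_body("cohere.embed-english-v3", "The quick brown fox"))
```

Because the gateway handles this translation, client code keeps sending the OpenAI embeddings shape regardless of which branch applies upstream.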

Rerank - Python

Rerank uses the Cohere-compatible shape for every supported provider - Cohere, AWS Bedrock (Cohere Rerank 3.5 / Amazon Rerank), Jina, Voyage, and self-hosted (ColBERT / TEI / Infinity). Point the Cohere SDK at the gateway, or just POST JSON.

import cohere

co = cohere.ClientV2(
    base_url='https://guardrails-usa-2.quilr.ai/rerank',
    api_key='sk-quilr-xxx',  # QuilrAI key, used instead of the Cohere key (co-xxx)
)

# Same call for any configured rerank provider.
result = co.rerank(
    model='rerank-english-v3.0',
    query='What is the capital of France?',
    documents=[
        'Paris is the capital of France.',
        'Berlin is the capital of Germany.',
        'The Eiffel Tower is in Paris.',
    ],
    top_n=2,
)
for r in result.results:
    print(r.index, r.relevance_score)

Rerank - cURL

curl https://guardrails-usa-2.quilr.ai/rerank/v2/rerank \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-quilr-xxx" \
  -d '{
    "model": "rerank-english-v3.0",
    "query": "What is the capital of France?",
    "documents": [
      "Paris is the capital of France.",
      "Berlin is the capital of Germany.",
      "The Eiffel Tower is in Paris."
    ],
    "top_n": 2
  }'

The gateway mirrors Cohere's upstream paths, so /rerank/rerank, /rerank/v1/rerank, and /rerank/v2/rerank all work - point your SDK at base_url='https://guardrails-usa-2.quilr.ai/rerank' and it'll append whichever version it uses.
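
For the "just POST JSON" route without the Cohere SDK, a standard-library request can be built as sketched below. The helper name `build_rerank_request` and its parameters are hypothetical; the URL pattern, headers, and body fields mirror the cURL example above. The request is constructed but not sent, since sending needs network access and a valid key.

```python
import json
import urllib.request


def build_rerank_request(base_url, api_key, query, documents, top_n,
                         model="rerank-english-v3.0", version="v2"):
    """Build (but do not send) a Cohere-shaped rerank POST request."""
    url = f"{base_url.rstrip('/')}/{version}/rerank"
    payload = {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_rerank_request(
    "https://guardrails-usa-2.quilr.ai/rerank",
    "sk-quilr-xxx",
    "What is the capital of France?",
    ["Paris is the capital of France.", "Berlin is the capital of Germany."],
    top_n=2,
)
print(req.full_url)

# To send it (requires network access and a real key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Passing `version="v1"` targets `/rerank/v1/rerank` instead, which the gateway also accepts.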

AWS Bedrock rerank

Same pattern as Bedrock embeddings - AWS credentials live on the key, the gateway performs the Bedrock invoke_model call, and your client speaks the Cohere rerank shape. Request-side DLP scans the query and documents fields; the response (scores + indices) is passed through.

Anthropic - Python

import anthropic

# Point the client to QuilrAI's gateway
client = anthropic.Anthropic(
    base_url='https://guardrails-usa-2.quilr.ai/anthropic_messages/',
    api_key='sk-quilr-xxx',  # QuilrAI key, used instead of the Anthropic key (sk-ant-xxx)
)

# Everything below stays exactly the same
message = client.messages.create(
    model='claude-sonnet-4-20250514',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(message.content[0].text)

Anthropic - JavaScript

import Anthropic from "@anthropic-ai/sdk";

// Point the client to QuilrAI's gateway
const client = new Anthropic({
  baseURL: "https://guardrails-usa-2.quilr.ai/anthropic_messages/",
  apiKey: "sk-quilr-xxx", // QuilrAI key, used instead of the Anthropic key (sk-ant-xxx)
});

// Everything below stays exactly the same
const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(message.content[0].text);

Anthropic - cURL

# Point the request to QuilrAI's gateway
# (replaces https://api.anthropic.com/v1/messages and the sk-ant-xxx key)
curl https://guardrails-usa-2.quilr.ai/anthropic_messages/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: sk-quilr-xxx" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Vertex AI - Google GenAI SDK

from google import genai
from google.genai.types import HttpOptions
from google.auth import credentials as auth_credentials


class APIKeyCredentials(auth_credentials.Credentials):
    """Pass the QuilrAI API key as a Bearer token."""

    def __init__(self, api_key):
        super().__init__()
        self.api_key = api_key
        self.token = api_key

    def refresh(self, request):
        # Nothing to refresh: the QuilrAI key is static.
        self.token = self.api_key

    @property
    def valid(self):
        return True


# Replaces the service-account credentials (service.json) used for direct Vertex AI access
credentials = APIKeyCredentials('sk-quilr-xxx')

client = genai.Client(
    vertexai=True,
    project='your-gcp-project',
    location='us-central1',
    credentials=credentials,
    # Overrides the default Vertex AI endpoint
    http_options=HttpOptions(base_url='https://guardrails-usa-2.quilr.ai/vertex_ai'),
)

# Everything below stays exactly the same
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='Hello!'
)
print(response.text)

Vertex AI - LangChain

from google.oauth2 import credentials as ga_credentials
from langchain_google_genai import ChatGoogleGenerativeAI


class _NoopCredentials(ga_credentials.Credentials):
    """Inject the QuilrAI API key as a Bearer token."""

    def __init__(self, api_key):
        super().__init__(token=api_key)

    def refresh(self, request):
        # Nothing to refresh: the QuilrAI key is static.
        pass

    @property
    def valid(self):
        return True


# Replaces the service-account credentials (service.json) used for direct Vertex AI access
credentials = _NoopCredentials('sk-quilr-xxx')

llm = ChatGoogleGenerativeAI(
    model='gemini-2.5-flash',
    credentials=credentials,
    base_url='https://guardrails-usa-2.quilr.ai/vertex_ai',
    project='your-gcp-project',
    location='us-central1',
    vertexai=True,
)

# Everything below stays exactly the same
response = llm.invoke('Hello!')
print(response.content)

Replace sk-quilr-xxx with the API key you created in the dashboard. The model parameter uses the same model names as your provider. For Vertex AI, the project and location should match the values configured when creating the key.

AWS Bedrock Runtime - boto3

Use this mode when your app already calls Bedrock Runtime through boto3. Create a QuilrAI key with provider bedrock, then point the Bedrock Runtime client at QuilrAI.

import boto3
from botocore.config import Config

QUILR_KEY = "sk-quilr-xxx"

bedrock = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
    endpoint_url="https://guardrails-usa-2.quilr.ai/bedrock-runtime",
    # The QuilrAI key replaces both AWS credential fields (AKIA... / aws-secret)
    aws_access_key_id=QUILR_KEY,
    aws_secret_access_key=QUILR_KEY,
    config=Config(read_timeout=300),
)

response = bedrock.converse(
    modelId="amazon.nova-lite-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "Hello!"}],
        }
    ],
    inferenceConfig={"maxTokens": 256},
)

print(response["output"]["message"]["content"][0]["text"])

converse, converse_stream, and invoke_model are supported. invoke_model_with_response_stream is not enabled yet and returns ValidationException. See AWS Bedrock - boto3 Runtime for operation coverage, guardrail behavior, and troubleshooting.

Provider configuration required

The Responses and Realtime endpoints are only served for keys whose primary provider is openai_responses / openai_responses_azure / openai_realtime / openai_realtime_azure, or that have one of those added as an additional provider. A plain "OpenAI" or "Azure OpenAI" chat-completions key cannot hit /openai_responses/ or /openai_realtime/ by just swapping the URL - add the Responses or Realtime provider to the key first. See Provider Support for the full matrix.

OpenAI Responses - Python

from openai import OpenAI

# Point the client to QuilrAI's gateway
client = OpenAI(
    base_url='https://guardrails-usa-2.quilr.ai/openai_responses/v1',
    api_key='sk-quilr-xxx',  # QuilrAI key, used instead of the provider key (sk-openai-xxx)
)

# Everything below stays exactly the same
response = client.responses.create(
    model='gpt-5',
    input=[{'role': 'user', 'content': 'Hello!'}],
    instructions='You are a helpful assistant.'
)
print(response.output_text)

OpenAI Responses - JavaScript

import OpenAI from "openai";

// Point the client to QuilrAI's gateway
const client = new OpenAI({
  baseURL: "https://guardrails-usa-2.quilr.ai/openai_responses/v1",
  apiKey: "sk-quilr-xxx", // QuilrAI key, used instead of the provider key (sk-openai-xxx)
});

// Everything below stays exactly the same
const response = await client.responses.create({
  model: "gpt-5",
  input: [{ role: "user", content: "Hello!" }],
  instructions: "You are a helpful assistant.",
});
console.log(response.output_text);

OpenAI Responses - cURL

# Point the request to QuilrAI's gateway
# (replaces https://api.openai.com/v1/responses and the sk-openai-xxx key)
curl https://guardrails-usa-2.quilr.ai/openai_responses/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-quilr-xxx" \
  -d '{
    "model": "gpt-5",
    "input": [{"role": "user", "content": "Hello!"}]
  }'

For Azure OpenAI Responses, put the deployment name in model; QuilrAI resolves it against the azure_endpoint configured on the key. The Azure-style deployment alias /openai_responses/openai/deployments/{deployment}/responses is also supported.

OpenAI Realtime - Python

import asyncio
from openai import AsyncOpenAI


async def main():
    client = AsyncOpenAI(
        base_url='https://guardrails-usa-2.quilr.ai/openai/v1',
        api_key='sk-quilr-xxx',  # QuilrAI key, used instead of the provider key (sk-openai-xxx)
    )

    # Everything below stays exactly the same
    async with client.realtime.connect(model='gpt-realtime') as conn:
        await conn.session.update(session={'modalities': ['text']})
        await conn.conversation.item.create(item={
            'type': 'message',
            'role': 'user',
            'content': [{'type': 'input_text', 'text': 'Hello!'}],
        })
        await conn.response.create()
        async for event in conn:
            if event.type == 'response.output_text.delta':
                print(event.delta, end='', flush=True)
            elif event.type == 'response.done':
                break


asyncio.run(main())

OpenAI Realtime - JavaScript

import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";

const rt = new OpenAIRealtimeWebSocket({
  baseURL: "wss://guardrails-usa-2.quilr.ai/openai/v1",
  apiKey: "sk-quilr-xxx", // QuilrAI key, used instead of the provider key (sk-openai-xxx)
  model: "gpt-realtime",
});

rt.on("response.output_text.delta", (e) => process.stdout.write(e.delta));
rt.send({
  type: "conversation.item.create",
  item: {
    type: "message",
    role: "user",
    content: [{ type: "input_text", text: "Hello!" }],
  },
});
rt.send({ type: "response.create" });
Realtime sessions are a raw websocket passthrough. Voice I/O (PCM16 input_audio_buffer.append / response.output_audio.delta) works end-to-end. Live-event DLP is not yet applied to Realtime sessions - see Provider Support.

Microsoft Copilot Studio

Create a QuilrAI key with provider copilot_studio, then use the full endpoint base in Power Platform admin center:

https://guardrails-usa-2.quilr.ai/copilot_studio/sk-quilr-xxx

Power Platform appends /validate during setup and /analyze-tool-execution at runtime. QuilrAI scans Copilot user context and proposed tool inputs, then returns Copilot's expected allow/block response.
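
The URLs that Power Platform ends up calling can be derived from the endpoint base, as in this short sketch. The helper name `copilot_studio_urls` and the dictionary keys are hypothetical; only the path segments `/copilot_studio/`, `/validate`, and `/analyze-tool-execution` come from the text above.

```python
def copilot_studio_urls(region_base_url: str, quilr_key: str) -> dict:
    """Return the endpoint base plus the two URLs Power Platform derives from it."""
    base = f"{region_base_url.rstrip('/')}/copilot_studio/{quilr_key}"
    return {
        "base": base,                                   # paste this into Power Platform
        "validate": f"{base}/validate",                 # called during setup
        "analyze_tool_execution": f"{base}/analyze-tool-execution",  # called at runtime
    }


urls = copilot_studio_urls("https://guardrails-usa-2.quilr.ai", "sk-quilr-xxx")
for name, url in urls.items():
    print(name, url)
```

Since the key is part of the URL, treat the full endpoint base as a secret wherever it is stored or logged.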

See Copilot Studio for Power Platform configuration steps.

3. Optional Headers

| Header | Purpose |
| --- | --- |
| X-User-Email | Identifies the end user behind the request. See Identity Aware. |
| X-Conversation-Id | Groups related requests into a single conversation in logs and analytics. See Conversation Grouping. |
| X-Provider-Name / X-Provider-Label | Selects a specific provider on multi-provider keys (see section 5 below). |
| X-Prompt-Variables | Supplies {{variable}} values for stored prompts. See Prompt Store. |
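
One way to attach these headers is to build them once and pass them with each request. The helper below is a minimal sketch: the function name `quilr_headers` and its parameters are hypothetical, and X-Prompt-Variables is omitted because its value format is not described in this guide.

```python
from typing import Dict, Optional


def quilr_headers(user_email: Optional[str] = None,
                  conversation_id: Optional[str] = None,
                  provider_label: Optional[str] = None) -> Dict[str, str]:
    """Build the optional QuilrAI headers, skipping any that are not set."""
    headers: Dict[str, str] = {}
    if user_email:
        headers["X-User-Email"] = user_email
    if conversation_id:
        headers["X-Conversation-Id"] = conversation_id
    if provider_label:
        headers["X-Provider-Label"] = provider_label
    return headers


headers = quilr_headers(user_email="alice@example.com", conversation_id="conv-123")
print(headers)
```

With the OpenAI Python SDK, the resulting dictionary can be passed per request as `extra_headers=headers` on `client.chat.completions.create(...)`, or once for all requests as `default_headers=headers` when constructing the client.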

4. Using Routing Groups

If you've configured a Routing Group, pass the group name as the model parameter. The gateway automatically load-balances and fails over across providers in that group.

response = client.chat.completions.create(
    model='Group1',  # your routing group name
    messages=[{'role': 'user', 'content': 'Hello!'}]
)

See Request Routing for full details on setting up groups.

5. Selecting a Provider on Multi-Provider Keys

A single QuilrAI key can have one primary provider plus any number of additional providers. When more than one compatible provider is configured, you can pick which provider handles a specific request. If you omit a selector, QuilrAI uses the first compatible provider on the key.

| Endpoint | Body field | Header | Query param |
| --- | --- | --- | --- |
| Chat Completions / Anthropic Messages / Vertex / Embeddings / Rerank | provider or provider_label | X-Provider-Name / X-Provider-Label | - |
| Responses | provider or provider_label | X-Provider-Name / X-Provider-Label | - |
| Realtime (websocket) | - | X-Provider-Name / X-Provider-Label | provider or provider_label |

Match by either the provider type (e.g. bedrock, openai_responses_azure, anthropic_messages_bedrock) or the label you set on the additional provider in the dashboard.

# Responses: pick a specific additional provider
response = client.responses.create(
    model='gpt-5',
    input=[{'role': 'user', 'content': 'Hello!'}],
    extra_body={'provider_label': 'azure-westus'},
)

# Realtime: select via query string (headers also work)
async with client.realtime.connect(
    model='gpt-realtime',
    extra_query={'provider_label': 'azure-westus'},
) as conn:
    ...
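
The placement rule from the table (body field for HTTP endpoints, query parameter for Realtime) can be captured in a small helper that returns the keyword arguments to spread into an SDK call. This is a sketch under assumptions: the function name and the endpoint labels `"realtime"`, `"responses"`, and `"chat"` are hypothetical names used only in this example.

```python
def provider_selector_kwargs(endpoint: str, provider_label: str) -> dict:
    """Return SDK keyword arguments that carry the provider selector.

    Realtime has no request body, so the selector goes in the query string;
    every other endpoint accepts it as a body field.
    """
    selector = {"provider_label": provider_label}
    if endpoint == "realtime":
        return {"extra_query": selector}
    return {"extra_body": selector}


print(provider_selector_kwargs("responses", "azure-westus"))
print(provider_selector_kwargs("realtime", "azure-westus"))
```

For example, `client.responses.create(model='gpt-5', input=..., **provider_selector_kwargs('responses', 'azure-westus'))` sends the same request as the Responses snippet above.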