Unified Completions
Use OpenAI Chat Completions clients with provider-native chat models. QuilrAI accepts an OpenAI-style /chat/completions request, translates it to the selected provider, and returns an OpenAI-shaped chat completion response.
Provider notes verified:
- AWS Bedrock translation: May 13, 2026
- Vertex AI translation: June 9, 2026
- Anthropic Messages translation: June 9, 2026
Scope
This page covers translated providers on:
/openai_compatible/v1/chat/completions
This page does not cover:
- OpenAI, Azure OpenAI, Anthropic OpenAI-compatible, DeepSeek, Gemini public API, or custom providers that already expose an OpenAI-compatible upstream API
- Native Vertex AI /vertex_ai/ routes
- Native Anthropic Messages /anthropic_messages/ routes
- AWS Bedrock Runtime boto3 routes such as /bedrock-runtime/model/{model_id}/converse
- Bedrock embeddings or rerank
- OpenAI Responses API
Use Unified Completions when your application already speaks OpenAI Chat Completions and you want to call selected Bedrock, Vertex AI Gemini, or Anthropic Messages models without switching SDKs.
The translated OpenAI-compatible path is text-only today. Use native Vertex AI, Anthropic Messages, or Bedrock Runtime routes when you need provider-native image, audio, video, document, or other multimodal request shapes.
Request Flow
- Create an LLM Gateway key with provider bedrock, vertex_ai, anthropic_messages, anthropic_messages_bedrock, or anthropic_messages_azure.
- Select the models that the key is allowed to call.
- Point your OpenAI SDK or OpenAI-compatible wrapper at the closest regional endpoint, such as https://guardrails-usa-2.quilr.ai/openai_compatible/.
- Send the provider model name in the OpenAI SDK model parameter.
- QuilrAI translates the OpenAI-style request to the provider-native chat API and translates the provider response back to OpenAI Chat Completions.
from openai import OpenAI

# Point the standard OpenAI client at the QuilrAI gateway.
client = OpenAI(
    base_url="https://guardrails-usa-2.quilr.ai/openai_compatible/",
    api_key="sk-quilr-xxx",
)

# The model value is the provider model name (here, a Bedrock model ID).
response = client.chat.completions.create(
    model="amazon.nova-lite-v1:0",
    messages=[{"role": "user", "content": "Summarize this in one sentence."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
For Vertex AI Gemini, use the same OpenAI client configuration and pass a selected Gemini model:
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Write a concise release note."}],
    max_tokens=256,
)
For Anthropic Messages, use the same OpenAI client configuration and pass a selected Claude model:
response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a concise release note."}],
    max_tokens=256,
)
The normal gateway behavior still applies: authentication, provider and model routing, prompt-store substitution, request-side DLP, response-side DLP for non-streaming responses, logging, rate limits, token estimates, Guardian checks, and performance metrics.
Common Contract
These translators use allowlists. Unknown OpenAI parameters are rejected instead of silently dropped.
max_tokens and max_completion_tokens can both be present only when they have the same value. Token limits must be positive integers. stop must be a string or a list of strings.
Anthropic Messages requires max_tokens; if both max_tokens and max_completion_tokens are omitted, QuilrAI sends max_tokens: 1024.
Common rejected OpenAI parameters include logit_bias, logprobs, top_logprobs, reasoning_effort, modalities, audio, store, service_tier, prediction, and provider-specific extra_body.
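The rules above can be checked on the client before a request is sent. The following is a minimal sketch that mirrors the documented contract; it is not QuilrAI's implementation, and the helper names (resolve_token_limit, check_stop, rejected_params) are hypothetical. The set below only contains the parameters this page lists as commonly rejected, so the gateway's actual allowlist may reject more.

```python
# Parameters this page lists as commonly rejected (assumption: not exhaustive).
COMMONLY_REJECTED = {
    "logit_bias", "logprobs", "top_logprobs", "reasoning_effort", "modalities",
    "audio", "store", "service_tier", "prediction", "extra_body",
}


def resolve_token_limit(params, anthropic=False):
    """Return the effective token limit, raising ValueError on invalid input."""
    a = params.get("max_tokens")
    b = params.get("max_completion_tokens")
    for name, value in (("max_tokens", a), ("max_completion_tokens", b)):
        if value is None:
            continue
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer")
    if a is not None and b is not None and a != b:
        raise ValueError("max_tokens and max_completion_tokens must match")
    limit = a if a is not None else b
    if limit is None and anthropic:
        limit = 1024  # documented default when both fields are omitted
    return limit


def check_stop(stop):
    """Accept None, a string, or a list of strings; reject anything else."""
    if stop is None or isinstance(stop, str):
        return
    if isinstance(stop, list) and all(isinstance(s, str) for s in stop):
        return
    raise ValueError("stop must be a string or a list of strings")


def rejected_params(params):
    """Return the commonly rejected parameter names present in a request."""
    return sorted(COMMONLY_REJECTED & set(params))
```

Running these checks locally only gives earlier feedback; the gateway remains the source of truth for what is accepted.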
Message Support
Content is text-only on translated paths:
- String content is supported.
- Content arrays are supported only when every part is text-like.
- Images, audio, files, Bedrock document blocks, Bedrock image blocks, Vertex inline media, Vertex file data, Anthropic image blocks, and Anthropic document blocks are not supported on this OpenAI-compatible path.
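A client can screen message content before sending it to a translated path. This sketch assumes that "text-like" means an OpenAI content part with type "text"; the gateway may accept other text-like part types, and is_text_only is a hypothetical helper name.

```python
def is_text_only(content):
    """Return True when content is a string or a list of text parts only."""
    if isinstance(content, str):
        return True
    if isinstance(content, list):
        # Assumption: a text-like part is a dict whose type is "text".
        return all(
            isinstance(part, dict) and part.get("type") == "text"
            for part in content
        )
    return False
```

For example, a content array that mixes a text part with an image_url part fails this check and should be sent through a provider-native route instead.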
Tools and Function Calling
OpenAI tools are supported when each tool is type: "function".
Legacy OpenAI functions and function_call are also supported.
Modern assistant.tool_calls entries should include id. Bedrock and Anthropic Messages reject tool calls without IDs because the later tool result would have no stable identifier. Vertex maps modern assistant.tool_calls and legacy assistant.function_call to functionCall parts, parsing JSON argument strings when possible.
Tool Choice
If a request requires a tool but no tools are present, QuilrAI rejects the request before the upstream call.
Parallel Tool Results
OpenAI-compatible clients often send one role: "tool" message per tool call:
[
{"role": "assistant", "tool_calls": [{"id": "call_a"}, {"id": "call_b"}]},
{"role": "tool", "tool_call_id": "call_a", "content": "result A"},
{"role": "tool", "tool_call_id": "call_b", "content": "result B"}
]
Bedrock, Vertex AI, and Anthropic Messages expect matching tool results for a model turn to stay together in the next user-side content entry. QuilrAI groups consecutive OpenAI role: "tool" or role: "function" messages into one provider-native user message containing multiple tool-result blocks.
This grouping matters for parallel tool calls. Sending each tool result as a separate provider turn can produce upstream validation errors about missing tool results or function responses.
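The grouping described above can be illustrated with a short sketch. The output shape here (a user message carrying a tool_results list) is a hypothetical, provider-neutral stand-in; each provider's real wire format differs, and group_tool_results is not a QuilrAI function.

```python
def group_tool_results(messages):
    """Merge consecutive tool/function messages into one user-side entry."""
    grouped = []
    for msg in messages:
        if msg.get("role") in ("tool", "function"):
            block = {
                # Legacy role "function" messages carry a name instead of an ID.
                "id": msg.get("tool_call_id") or msg.get("name"),
                "content": msg.get("content"),
            }
            if grouped and "tool_results" in grouped[-1]:
                # The previous entry is already a tool-result group: extend it.
                grouped[-1]["tool_results"].append(block)
            else:
                grouped.append({"role": "user", "tool_results": [block]})
        else:
            grouped.append(msg)
    return grouped
```

Because only adjacent tool messages are merged, tool results separated by another message end up in different groups, which is why non-consecutive parallel results cannot be combined into one provider turn.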
Structured Output
For Bedrock json_schema, QuilrAI maps:
json_schema.strict is validated as a boolean, but it is not separately mapped. Bedrock structured output is schema-constrained through outputConfig.textFormat.
For Vertex AI json_schema, QuilrAI sanitizes the OpenAI schema into the subset accepted by Vertex Gemini. name, description, and strict are type-checked but are not emitted as separate Vertex fields.
Vertex schema normalization:
- Local $ref values are inlined.
- Cyclic or unresolvable local refs are rejected.
- Nullable single-type unions collapse to type plus nullable: true.
- Unsupported JSON Schema metadata keys are dropped.
- oneOf and anyOf are accepted only for nullable single-type unions.
- Other union arrays are rejected.
Dropped Vertex schema metadata keys include $defs, $id, $schema, additionalProperties, default, definitions, patternProperties, title, allOf, and unsupported oneOf or anyOf. Schema property names are preserved; for example, a property named default remains valid.
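To make the normalization rules concrete, here is a simplified sketch of a sanitizer that inlines local refs, collapses nullable single-type unions, and drops the listed metadata keys. It is an illustration under assumptions, not the gateway's code: the function names are hypothetical, JSON Pointer escaping is ignored, and only properties and items are walked recursively.

```python
DROPPED_KEYS = {
    "$defs", "$id", "$schema", "additionalProperties", "default",
    "definitions", "patternProperties", "title", "allOf",
}


def resolve_ref(root, ref):
    """Follow a local ref such as '#/$defs/Name' from the schema root."""
    if not ref.startswith("#/"):
        raise ValueError(f"unresolvable $ref: {ref}")
    node = root
    for part in ref[2:].split("/"):
        if not isinstance(node, dict) or part not in node:
            raise ValueError(f"unresolvable $ref: {ref}")
        node = node[part]
    return node


def collapse_nullable(options, root, seen):
    """Turn [X, {"type": "null"}] into X plus nullable: True."""
    non_null = [o for o in options if o != {"type": "null"}]
    if len(options) != 2 or len(non_null) != 1:
        raise ValueError("only nullable single-type unions are supported")
    collapsed = sanitize(non_null[0], root, seen)
    collapsed["nullable"] = True
    return collapsed


def sanitize(schema, root=None, seen=()):
    """Return a sanitized copy of schema; the input is not modified."""
    root = schema if root is None else root
    if not isinstance(schema, dict):
        return schema
    if "$ref" in schema:
        ref = schema["$ref"]
        if ref in seen:  # the same ref is already being expanded on this path
            raise ValueError(f"cyclic $ref: {ref}")
        return sanitize(resolve_ref(root, ref), root, seen + (ref,))
    out = {}
    for key, value in schema.items():
        if key in DROPPED_KEYS:
            continue
        if key in ("oneOf", "anyOf"):
            out.update(collapse_nullable(value, root, seen))
        elif key == "properties":
            # Property names are data, so a property called "default" is kept.
            out[key] = {n: sanitize(s, root, seen) for n, s in value.items()}
        elif key == "items":
            out[key] = sanitize(value, root, seen)
        elif key == "type" and isinstance(value, list):
            non_null = [t for t in value if t != "null"]
            if len(value) != 2 or len(non_null) != 1:
                raise ValueError("only nullable single-type unions are supported")
            out["type"], out["nullable"] = non_null[0], True
        else:
            out[key] = value
    return out
```

The sketch rejects unsupported unions rather than dropping them; check the gateway's actual behavior before relying on either outcome.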
Anthropic Messages does not map OpenAI JSON mode or structured output on this translated path today. Only {"type": "text"} is accepted.
Streaming Responses
Streaming returns OpenAI-compatible server-sent events.
When stream_options.include_usage is true, Bedrock emits usage chunks from metadata.usage; Vertex emits one final usage chunk using the latest usageMetadata observed in the stream; Anthropic Messages emits one final usage chunk from tracked input and output token counts.
Streaming response-side DLP is not applied on this path. QuilrAI performs request-side scanning, forwards chunks, and accumulates text and tool-call data for logging.
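On the client side, a stream with usage reporting can be consumed roughly as follows. This sketch assumes chunk objects shaped like the OpenAI Python SDK's streaming chunks (choices[].delta.content and an optional usage attribute); the fake chunks built with SimpleNamespace exist only so the example runs without a network call, and collect_stream is a hypothetical helper.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate streamed text and keep the most recent usage object seen."""
    parts, usage = [], None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage  # later usage chunks replace earlier ones
        for choice in getattr(chunk, "choices", None) or []:
            content = getattr(choice.delta, "content", None)
            if content:
                parts.append(content)
    return "".join(parts), usage


# With the real SDK, the iterable would come from something like:
#   client.chat.completions.create(..., stream=True,
#                                  stream_options={"include_usage": True})
def _fake_chunk(text=None, usage=None):
    choices = []
    if text is not None:
        choices = [SimpleNamespace(delta=SimpleNamespace(content=text))]
    return SimpleNamespace(choices=choices, usage=usage)


demo_usage = SimpleNamespace(prompt_tokens=5, completion_tokens=2, total_tokens=7)
text, usage = collect_stream(
    [_fake_chunk("Hel"), _fake_chunk("lo"), _fake_chunk(usage=demo_usage)]
)
```

Keeping the latest usage value works for a single final usage chunk as well as for providers that emit usage more than once.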
Non-Streaming Responses
Non-streaming provider responses are converted back to OpenAI chat completions:
- Provider text parts are joined into choices[].message.content.
- Provider tool-use or function-call parts become OpenAI choices[].message.tool_calls.
- Tool-call-only responses return message.content: null.
- Provider usage maps to OpenAI prompt_tokens, completion_tokens, and total_tokens.
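Because tool-call-only responses carry a null content field, client code should not assume message.content is a string. A minimal sketch of reading a translated message, using the standard OpenAI tool_calls shape as a plain dict (read_message is a hypothetical helper):

```python
import json


def read_message(message):
    """Return (text, tool_calls) from an OpenAI-shaped assistant message dict."""
    text = message.get("content") or ""  # content is None for tool-call-only replies
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append({
            "id": call["id"],
            "name": fn["name"],
            # OpenAI encodes arguments as a JSON string.
            "arguments": json.loads(fn.get("arguments") or "{}"),
        })
    return text, calls
```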
Bedrock finish reason mapping:
Unknown Bedrock stop reasons pass through unchanged.
Anthropic Messages finish reason mapping:
Vertex AI finish reason mapping:
Unmapped Vertex reasons become stop. Any Vertex response containing function calls returns finish_reason: "tool_calls".
Vertex usage details include:
Anthropic Messages usage maps usage.input_tokens to prompt_tokens, usage.output_tokens to completion_tokens, and their sum to total_tokens.
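The Anthropic usage mapping amounts to a field rename plus a sum. As a sketch (the function name is hypothetical):

```python
def anthropic_usage_to_openai(usage):
    """Map Anthropic Messages usage fields to OpenAI usage fields."""
    prompt = usage["input_tokens"]
    completion = usage["output_tokens"]
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }
```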
Provider Setup
Anthropic Messages
Create an LLM Gateway key with provider anthropic_messages, anthropic_messages_bedrock, or anthropic_messages_azure.
Direct Anthropic and Azure Anthropic default anthropic_version to 2023-06-01. Anthropic Messages on Bedrock uses the same Bedrock credential resolver as the native Anthropic Messages Bedrock endpoint.
AWS Bedrock
Create an LLM Gateway key with provider bedrock.
Select one or more Bedrock chat models that support Converse. Send model as the Bedrock model ID or inference profile ID.
AWS Bedrock default region: us-east-1. For assume-role setup, see AWS Bedrock - Assume Role Setup.
Vertex AI
Create an LLM Gateway key with provider vertex_ai.
Model listing for Vertex AI is best effort. Service-account and ADC auth fetch Gemini models from Vertex Model Garden and fall back to a curated Gemini list if fetching fails.
Error Handling
QuilrAI returns OpenAI-shaped error responses for adapter validation failures and preserves upstream provider error messages where possible.
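Since errors are OpenAI-shaped, a client that handles raw HTTP responses can read them with one parser. This sketch assumes the usual OpenAI error envelope, {"error": {"message", "type", "param", "code"}}; parse_openai_error is a hypothetical helper, and a body that does not match the envelope is returned as the message unchanged.

```python
import json

FIELDS = ("message", "type", "param", "code")


def parse_openai_error(body):
    """Extract the error fields from an OpenAI-shaped JSON error body."""
    fallback = {"message": body, "type": None, "param": None, "code": None}
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return fallback
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return fallback
    return {field: error.get(field) for field in FIELDS}
```

When using the OpenAI Python SDK instead of raw HTTP, the SDK raises its own exception types for non-2xx responses, so this parser is only needed for custom clients.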
Common Bedrock adapter error codes:
Common Anthropic Messages adapter error codes:
Common Vertex AI adapter error codes:
Upstream Vertex HTTP errors are classified into OpenAI-style error types:
Guardrail Behavior
Request-side DLP scans user text before the upstream call. Non-streaming responses are scanned before they are returned to the client.
Streaming responses are different: request-side DLP still runs, but response-side DLP is skipped so chunks can pass through as they arrive.
Tool messages are carried through without changing tool IDs, function response names, or result ordering. Changing a tool_call_id, dropping a role: "tool" message, or reordering tool results can break provider tool-result validation.
Expected Good Scenarios
These scenarios are covered by the translators:
- Plain text chat
- System, developer, user, and assistant text messages
- Non-streaming text responses
- Streaming text responses
- Provider tool calls translated back to OpenAI tool_calls
- Tool-call deltas in streaming responses
- Consecutive OpenAI tool result messages grouped into one provider-native tool-result user message
- Legacy OpenAI functions, function_call, and role: "function"
- Anthropic top-level system convenience field
- Vertex response_format: json_object
- Vertex response_format: json_schema with local refs and nullable single-type unions
- Bedrock response_format: json_schema on models that support outputConfig
- Vertex reasoning-token and cached-token usage details
Expected Failures
These requests fail intentionally:
- OpenAI image, audio, or file content
- Mixed multimodal content arrays
- Multiple choices with n > 1
- Log probabilities
- Token bias
- Audio input or output modes
- Reasoning-effort controls
- Provider-specific extra_body
- Modern Bedrock assistant.tool_calls entries without id
- Tool result messages missing tool_call_id
- A user message immediately after assistant tool calls without matching tool results
- Parallel tool results that are not consecutive in the OpenAI message history and therefore cannot be grouped into one provider-native user turn
- Anthropic response_format values other than {"type": "text"}
- Bedrock response_format: json_object
- Bedrock response_format: json_schema on models that do not support outputConfig
- Vertex JSON Schema union arrays other than nullable single-type unions
- Vertex cyclic or unresolvable local JSON Schema refs