
Unified Completions

Use OpenAI Chat Completions clients with provider-native chat models. QuilrAI accepts an OpenAI-style /chat/completions request, translates it to the selected provider, and returns an OpenAI-shaped chat completion response.

Provider notes verified:

  • AWS Bedrock translation: May 13, 2026
  • Vertex AI translation: June 9, 2026
  • Anthropic Messages translation: June 9, 2026

Scope

This page covers translated providers on:

/openai_compatible/v1/chat/completions
| Provider key | Upstream call | Model value | Streaming | Content support |
| --- | --- | --- | --- | --- |
| bedrock | Bedrock Converse / ConverseStream | Selected Bedrock model ID or inference profile ID | Yes | Text only |
| vertex_ai | Vertex AI Gemini generateContent / streamGenerateContent | Selected Gemini model name | Yes | Text only |
| anthropic_messages | Anthropic Messages messages.create / streaming | Selected Claude model name | Yes | Text only |
| anthropic_messages_bedrock | Anthropic Messages on Bedrock | Selected Bedrock Claude model ID | Yes | Text only |
| anthropic_messages_azure | Azure Anthropic Messages | Selected Claude model name | Yes | Text only |

This page does not cover:

  • OpenAI, Azure OpenAI, Anthropic OpenAI-compatible, DeepSeek, Gemini public API, or custom providers that already expose an OpenAI-compatible upstream API
  • Native Vertex AI /vertex_ai/ routes
  • Native Anthropic Messages /anthropic_messages/ routes
  • AWS Bedrock Runtime boto3 routes such as /bedrock-runtime/model/{model_id}/converse
  • Bedrock embeddings or rerank
  • OpenAI Responses API

Use Unified Completions when your application already speaks OpenAI Chat Completions and you want to call selected Bedrock, Vertex AI Gemini, or Anthropic Messages models without switching SDKs.

Native multimodal routes

The translated OpenAI-compatible path is text-only today. Use native Vertex AI, Anthropic Messages, or Bedrock Runtime routes when you need provider-native image, audio, video, document, or other multimodal request shapes.

Request Flow

  1. Create an LLM Gateway key with provider bedrock, vertex_ai, anthropic_messages, anthropic_messages_bedrock, or anthropic_messages_azure.
  2. Select the models that the key is allowed to call.
  3. Point your OpenAI SDK or OpenAI-compatible wrapper at the closest regional endpoint, such as https://guardrails-usa-2.quilr.ai/openai_compatible/.
  4. Send the provider model name in the OpenAI SDK model parameter.
  5. QuilrAI translates the OpenAI-style request to the provider-native chat API and translates the provider response back to OpenAI Chat Completions.
from openai import OpenAI

client = OpenAI(
    base_url="https://guardrails-usa-2.quilr.ai/openai_compatible/",
    api_key="sk-quilr-xxx",
)

response = client.chat.completions.create(
    model="amazon.nova-lite-v1:0",
    messages=[{"role": "user", "content": "Summarize this in one sentence."}],
    max_tokens=256,
)

print(response.choices[0].message.content)

For Vertex AI Gemini, use the same OpenAI client configuration and pass a selected Gemini model:

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Write a concise release note."}],
    max_tokens=256,
)

For Anthropic Messages, use the same OpenAI client configuration and pass a selected Claude model:

response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a concise release note."}],
    max_tokens=256,
)

The normal gateway behavior still applies: authentication, provider and model routing, prompt-store substitution, request-side DLP, response-side DLP for non-streaming responses, logging, rate limits, token estimates, Guardian checks, and performance metrics.

Common Contract

These translators use allowlists. Unknown OpenAI parameters are rejected instead of silently dropped.

| OpenAI parameter | Bedrock support | Vertex AI support | Anthropic Messages support |
| --- | --- | --- | --- |
| messages | Supported | Supported | Supported |
| model | Selected Bedrock model ID or inference profile ID | Selected Gemini model name | Selected Claude model name or Bedrock Claude model ID |
| stream | Bedrock ConverseStream | Vertex streamGenerateContent | Anthropic Messages streaming |
| max_tokens | inferenceConfig.maxTokens | generationConfig.maxOutputTokens | max_tokens |
| max_completion_tokens | inferenceConfig.maxTokens | generationConfig.maxOutputTokens | max_tokens |
| temperature | inferenceConfig.temperature | generationConfig.temperature | temperature |
| top_p | inferenceConfig.topP | generationConfig.topP | top_p |
| stop | inferenceConfig.stopSequences | generationConfig.stopSequences | stop_sequences |
| tools | Function tools | Function declarations | Anthropic tools |
| functions | Legacy function tools | Legacy function declarations | Legacy function tools |
| tool_choice | auto, none, required, and function-specific choices | auto, none, required, and function-specific choices | auto, none, required, and function-specific choices |
| function_call | Supported | Supported | Supported |
| response_format | text, json_schema | text, json_object, json_schema | text only |
| stream_options | include_usage | include_usage | include_usage |
| parallel_tool_calls | Accepted as a boolean; false is not enforced | Accepted as a boolean; false is not enforced | Accepted as a boolean; false is not enforced |
| n | Must be absent or 1 | Must be absent or 1 | Must be absent or 1 |
| metadata | Accepted, not sent upstream | Accepted, not sent upstream | Sent as Anthropic metadata |
| user | Accepted, not sent upstream | Accepted, not sent upstream | Sent as metadata.user_id when not already set |
| system | Rejected | Rejected | Accepted as a top-level convenience field and merged with system/developer messages |
| frequency_penalty | Rejected | generationConfig.frequencyPenalty | Rejected |
| presence_penalty | Rejected | generationConfig.presencePenalty | Rejected |
| seed | Rejected | generationConfig.seed | Rejected |

max_tokens and max_completion_tokens can both be present only when they have the same value. Token limits must be positive integers. stop must be a string or a list of strings.

Anthropic Messages requires max_tokens; if both max_tokens and max_completion_tokens are omitted, QuilrAI sends max_tokens: 1024.
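The token-limit rules above can be sketched as a small client-side pre-check. This is a minimal illustration, not QuilrAI's implementation: the function name `resolve_max_tokens` and the `anthropic_default` argument are hypothetical, and the gateway's actual error messages are not shown on this page.

```python
from typing import Optional


def resolve_max_tokens(params: dict, anthropic_default: Optional[int] = None) -> Optional[int]:
    """Return the single token limit implied by an OpenAI-style request.

    Hypothetical helper that mirrors the documented rules; not QuilrAI's code.
    """
    limits = [
        params[key]
        for key in ("max_tokens", "max_completion_tokens")
        if key in params
    ]
    for value in limits:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError("token limits must be positive integers")
    if len(limits) == 2 and limits[0] != limits[1]:
        raise ValueError("max_tokens and max_completion_tokens must have the same value")
    if limits:
        return limits[0]
    # Anthropic Messages requires max_tokens; the page states that 1024 is
    # sent when the client omits both fields. Other providers get None here.
    return anthropic_default
```

Calling `resolve_max_tokens(request, anthropic_default=1024)` before sending an Anthropic-bound request would surface a mismatch locally instead of as a gateway validation error.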

Common rejected OpenAI parameters include logit_bias, logprobs, top_logprobs, reasoning_effort, modalities, audio, store, service_tier, prediction, and provider-specific extra_body.
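Because the translators reject unknown parameters rather than dropping them, it can help to check a request locally first. The sketch below builds an allowlist from the Common Contract table; the names `COMMON_PARAMS`, `PROVIDER_EXTRAS`, and `rejected_params` are hypothetical, and the sets reflect only what this page lists.

```python
# Parameters the Common Contract table shows as accepted by all three translators.
COMMON_PARAMS = {
    "messages", "model", "stream", "max_tokens", "max_completion_tokens",
    "temperature", "top_p", "stop", "tools", "functions", "tool_choice",
    "function_call", "response_format", "stream_options",
    "parallel_tool_calls", "n", "metadata", "user",
}

# Parameters the table shows as accepted by only one translator.
PROVIDER_EXTRAS = {
    "bedrock": set(),
    "vertex_ai": {"frequency_penalty", "presence_penalty", "seed"},
    "anthropic_messages": {"system"},
}


def rejected_params(request: dict, provider: str) -> list:
    """List request keys that the given translator would reject (sorted)."""
    allowed = COMMON_PARAMS | PROVIDER_EXTRAS[provider]
    return sorted(key for key in request if key not in allowed)
```

For example, a request carrying `seed` and `logprobs` would be flagged for both keys on `bedrock`, but only for `logprobs` on `vertex_ai`.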

Message Support

| OpenAI role | Bedrock translation | Vertex AI translation | Anthropic Messages translation |
| --- | --- | --- | --- |
| system | Top-level Bedrock system block | Top-level systemInstruction.parts | Top-level Anthropic system |
| developer | Top-level Bedrock system block | Top-level systemInstruction.parts | Top-level Anthropic system |
| user | Bedrock message with role user | Vertex content with role user | Anthropic message with role user |
| assistant | Bedrock message with role assistant | Vertex content with role model | Anthropic message with role assistant |
| tool | Bedrock toolResult block when tools are active | Vertex functionResponse part | Anthropic tool_result block |
| function | Bedrock toolResult block for legacy function history | Vertex functionResponse part | Anthropic tool_result block |

Content is text-only on translated paths:

  • String content is supported.
  • Content arrays are supported only when every part is text-like.
  • Images, audio, files, Bedrock document blocks, Bedrock image blocks, Vertex inline media, Vertex file data, Anthropic image blocks, and Anthropic document blocks are not supported on this OpenAI-compatible path.
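A client can apply the same text-only rule before sending a message. The sketch below assumes "text-like" means an OpenAI content part with `type: "text"` and a string `text` field; the page does not define the term more precisely, and `is_text_only` is a hypothetical helper name.

```python
def is_text_only(content) -> bool:
    """Return True when message content is a string or an array of text parts.

    Assumption: a "text-like" part is {"type": "text", "text": "<string>"}.
    """
    if isinstance(content, str):
        return True
    if isinstance(content, list):
        return all(
            isinstance(part, dict)
            and part.get("type") == "text"
            and isinstance(part.get("text"), str)
            for part in content
        )
    # Anything else (for example None or a bare dict) is not treated as text.
    return False
```

A mixed array such as one text part plus one `image_url` part returns `False`, which matches the "Mixed multimodal content arrays" case that this path rejects.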

Tools and Function Calling

OpenAI tools are supported when each tool is type: "function".

| OpenAI field | Bedrock field | Vertex AI field | Anthropic Messages field |
| --- | --- | --- | --- |
| tools[].function.name | toolSpec.name | functionDeclarations[].name | tools[].name |
| tools[].function.description | toolSpec.description | functionDeclarations[].description | tools[].description |
| tools[].function.parameters | toolSpec.inputSchema.json | functionDeclarations[].parameters | tools[].input_schema |
| tools[].function.strict | toolSpec.strict | Validated through schema handling; not emitted as a Vertex field | Not translated |

Legacy OpenAI functions and function_call are also supported.

Modern assistant.tool_calls entries should include id. Bedrock and Anthropic Messages reject tool calls without IDs because the later tool result would have no stable identifier. Vertex maps modern assistant.tool_calls and legacy assistant.function_call to functionCall parts, parsing JSON argument strings when possible.
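The field mapping in the table above can be illustrated with a short translation function. This is a sketch under stated assumptions: `translate_tool` is a hypothetical name, the output shapes follow only the field paths listed on this page, and Vertex schema sanitization is not modeled.

```python
def translate_tool(tool: dict, provider: str) -> dict:
    """Translate one OpenAI function tool into a provider-native tool entry."""
    if tool.get("type") != "function":
        raise ValueError('only tools with type: "function" are supported')
    fn = tool["function"]

    # Fields shared by all three targets; description is optional in OpenAI.
    base = {"name": fn["name"]}
    if "description" in fn:
        base["description"] = fn["description"]
    parameters = fn.get("parameters", {"type": "object", "properties": {}})

    if provider == "bedrock":
        spec = {**base, "inputSchema": {"json": parameters}}
        if "strict" in fn:
            spec["strict"] = fn["strict"]  # the table maps strict to toolSpec.strict
        return {"toolSpec": spec}
    if provider == "vertex_ai":
        # One entry of functionDeclarations[]; strict is not emitted.
        return {**base, "parameters": parameters}
    if provider == "anthropic_messages":
        # One entry of tools[]; strict is not translated.
        return {**base, "input_schema": parameters}
    raise ValueError(f"unknown provider: {provider}")
```

The empty-object default for a missing `parameters` field is an assumption made for the sketch, not documented gateway behavior.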

Tool Choice

| OpenAI tool_choice | Bedrock behavior | Vertex AI behavior | Anthropic Messages behavior |
| --- | --- | --- | --- |
| absent | No explicit Bedrock toolChoice | No Vertex toolConfig | No Anthropic tool_choice |
| auto | No explicit Bedrock toolChoice | No Vertex toolConfig | {"type": "auto"} |
| none | No Bedrock toolConfig; tool history is serialized as text | functionCallingConfig.mode = "NONE" | No tools are sent |
| required | Bedrock toolChoice.any | functionCallingConfig.mode = "ANY" | {"type": "any"} |
| function-specific choice | Bedrock toolChoice.tool.name | functionCallingConfig.mode = "ANY" with allowedFunctionNames | {"type": "tool", "name": ...} |

If a request requires a tool but no tools are present, QuilrAI rejects the request before the upstream call.
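The tool-choice table can be read as a lookup from the OpenAI value to a provider-native value. The sketch below is illustrative only: `translate_tool_choice` is a hypothetical name, a return value of `None` means "nothing is sent for this field", and the side effects of `none` (dropping tools, serializing tool history as text) are not modeled.

```python
from typing import Optional, Union

KNOWN_KINDS = {"absent", "auto", "none", "required", "function"}


def translate_tool_choice(choice: Union[str, dict, None], provider: str) -> Optional[dict]:
    """Map an OpenAI tool_choice to a provider-native value, or None if omitted."""
    if isinstance(choice, dict):
        # Function-specific choice: {"type": "function", "function": {"name": ...}}
        kind, name = "function", choice["function"]["name"]
    else:
        kind, name = choice or "absent", None
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unsupported tool_choice: {choice!r}")

    if provider == "bedrock":  # value for toolConfig.toolChoice
        return {
            "required": {"any": {}},
            "function": {"tool": {"name": name}},
        }.get(kind)
    if provider == "vertex_ai":  # value for toolConfig
        modes = {
            "none": {"mode": "NONE"},
            "required": {"mode": "ANY"},
            "function": {"mode": "ANY", "allowedFunctionNames": [name]},
        }
        return {"functionCallingConfig": modes[kind]} if kind in modes else None
    if provider == "anthropic_messages":  # value for tool_choice
        return {
            "auto": {"type": "auto"},
            "required": {"type": "any"},
            "function": {"type": "tool", "name": name},
        }.get(kind)
    raise ValueError(f"unknown provider: {provider}")
```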

Parallel Tool Results

OpenAI-compatible clients often send one role: "tool" message per tool call:

[
  {"role": "assistant", "tool_calls": [{"id": "call_a"}, {"id": "call_b"}]},
  {"role": "tool", "tool_call_id": "call_a", "content": "result A"},
  {"role": "tool", "tool_call_id": "call_b", "content": "result B"}
]

Bedrock, Vertex AI, and Anthropic Messages expect matching tool results for a model turn to stay together in the next user-side content entry. QuilrAI groups consecutive OpenAI role: "tool" or role: "function" messages into one provider-native user message containing multiple tool-result blocks.

This grouping matters for parallel tool calls. Sending each tool result as a separate provider turn can produce upstream validation errors about missing tool results or function responses.
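The grouping step can be sketched as follows, using Anthropic-style `tool_result` blocks as the target shape. This is a simplified illustration rather than QuilrAI's code: `group_tool_results` is a hypothetical name, only `role: "tool"` is handled (legacy `role: "function"` is omitted for brevity), and non-tool messages are passed through untranslated.

```python
def group_tool_results(messages: list) -> list:
    """Merge runs of consecutive role "tool" messages into one user message."""
    grouped = []
    pending = []  # tool_result blocks collected from the current run

    def flush():
        # Emit one user-side message holding every result in the run.
        if pending:
            grouped.append({"role": "user", "content": list(pending)})
            pending.clear()

    for message in messages:
        if message["role"] == "tool":
            pending.append({
                "type": "tool_result",
                "tool_use_id": message["tool_call_id"],  # ID is carried unchanged
                "content": message["content"],
            })
        else:
            flush()
            grouped.append(message)
    flush()
    return grouped
```

Applied to the three-message history above, this yields two entries: the assistant turn, then a single user turn containing the results for `call_a` and `call_b` in their original order. If another message sat between the two tool results, they would land in separate turns, which is the non-consecutive case listed under expected failures.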

Structured Output

| response_format | Bedrock support | Vertex AI support | Anthropic Messages support |
| --- | --- | --- | --- |
| {"type": "text"} | Supported | Supported | Supported |
| {"type": "json_object"} | Rejected | Maps to generationConfig.responseMimeType = "application/json" | Rejected |
| {"type": "json_schema", "json_schema": {...}} | Supported on Bedrock models that accept outputConfig | Maps to generationConfig.responseMimeType = "application/json" plus generationConfig.responseSchema | Rejected |

For Bedrock json_schema, QuilrAI maps:

| OpenAI field | Bedrock field |
| --- | --- |
| json_schema.name | outputConfig.textFormat.jsonSchema.name |
| json_schema.description | outputConfig.textFormat.jsonSchema.description |
| json_schema.schema | outputConfig.textFormat.jsonSchema.schema |

json_schema.strict is validated as a boolean, but it is not separately mapped. Bedrock structured output is schema-constrained through outputConfig.textFormat.

For Vertex AI json_schema, QuilrAI sanitizes the OpenAI schema into the subset accepted by Vertex Gemini. name, description, and strict are type-checked but are not emitted as separate Vertex fields.

Vertex schema normalization:

  • Local $ref values are inlined.
  • Cyclic or unresolvable local refs are rejected.
  • Nullable single-type unions collapse to type plus nullable: true.
  • Unsupported JSON Schema metadata keys are dropped.
  • oneOf and anyOf are accepted only for nullable single-type unions.
  • Other union arrays are rejected.

Dropped Vertex schema metadata keys include $defs, $id, $schema, additionalProperties, default, definitions, patternProperties, title, allOf, and unsupported oneOf or anyOf. Schema property names are preserved; for example, a property named default remains valid.
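The normalization rules above can be approximated with a short recursive function. This is a rough sketch, not QuilrAI's sanitizer: `sanitize_for_vertex` and `DROPPED_KEYS` are hypothetical names, JSON Pointer escapes in `$ref` are not handled, and unsupported unions are rejected (following the bullet list) rather than dropped.

```python
DROPPED_KEYS = {
    "$defs", "$id", "$schema", "additionalProperties", "default",
    "definitions", "patternProperties", "title", "allOf",
}


def sanitize_for_vertex(node, root=None, seen=()):
    """Reduce a JSON Schema node to the subset described on this page."""
    if root is None:
        root = node
    if isinstance(node, list):
        return [sanitize_for_vertex(item, root, seen) for item in node]
    if not isinstance(node, dict):
        return node
    if "$ref" in node:
        ref = node["$ref"]
        # Only local refs are inlined; `seen` tracks refs on the current path.
        if not ref.startswith("#/") or ref in seen:
            raise ValueError(f"cyclic or unresolvable ref: {ref}")
        target = root
        for part in ref[2:].split("/"):
            if not isinstance(target, dict) or part not in target:
                raise ValueError(f"cyclic or unresolvable ref: {ref}")
            target = target[part]
        return sanitize_for_vertex(target, root, seen + (ref,))
    for union_key in ("anyOf", "oneOf"):
        if union_key in node:
            options = node[union_key]
            non_null = [opt for opt in options if opt != {"type": "null"}]
            if len(options) != 2 or len(non_null) != 1:
                raise ValueError("only nullable single-type unions are supported")
            rest = {k: v for k, v in node.items() if k != union_key}
            merged = sanitize_for_vertex(rest, root, seen)
            merged.update(sanitize_for_vertex(non_null[0], root, seen))
            merged["nullable"] = True
            return merged
    out = {}
    for key, value in node.items():
        if key in DROPPED_KEYS:
            continue
        if key == "type" and isinstance(value, list):
            non_null = [t for t in value if t != "null"]
            if len(value) != 2 or len(non_null) != 1:
                raise ValueError("only nullable single-type unions are supported")
            out["type"], out["nullable"] = non_null[0], True
        elif key == "properties":
            # Property names are data, not keywords, so "default" survives here.
            out[key] = {name: sanitize_for_vertex(sub, root, seen) for name, sub in value.items()}
        else:
            out[key] = sanitize_for_vertex(value, root, seen)
    return out
```

Tracking `seen` per resolution path, rather than globally, lets the same definition be referenced from several sibling properties while still catching a definition that refers back to itself.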

Anthropic Messages does not map OpenAI JSON mode or structured output on this translated path today. Only {"type": "text"} is accepted.

Streaming Responses

Streaming returns OpenAI-compatible server-sent events.

| Provider | Upstream stream | OpenAI stream behavior |
| --- | --- | --- |
| Bedrock | ConverseStream | Bedrock message, text, tool-use, stop, and usage events are converted to OpenAI deltas |
| Vertex AI | streamGenerateContent with alt=sse | Vertex candidate text, function calls, finish reasons, and usage metadata are converted to OpenAI deltas |
| Anthropic Messages | Anthropic Messages stream | Anthropic message, text, tool-use, stop, and usage events are converted to OpenAI deltas |

When stream_options.include_usage is true, Bedrock emits usage chunks from metadata.usage; Vertex emits one final usage chunk using the latest usageMetadata observed in the stream; Anthropic Messages emits one final usage chunk from tracked input and output token counts.

Streaming response-side DLP is not applied on this path. QuilrAI performs request-side scanning, forwards chunks, and accumulates text and tool-call data for logging.
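On the client side, the translated stream can be reassembled the same way as any OpenAI Chat Completions stream. The sketch below works on chunks that are already plain dicts in the standard OpenAI chunk shape; `accumulate_stream` is a hypothetical helper, and if you use the OpenAI Python SDK you would likely need to convert each chunk object to a dict first (for example with `model_dump()`).

```python
def accumulate_stream(chunks) -> dict:
    """Collect text, tool calls, and usage from OpenAI-style stream chunks."""
    text_parts = []
    tool_calls = {}  # keyed by the delta's tool-call index
    usage = None
    for chunk in chunks:
        # With stream_options.include_usage, usage arrives on a late chunk.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
            for call in delta.get("tool_calls") or []:
                slot = tool_calls.setdefault(
                    call["index"], {"id": None, "name": "", "arguments": ""}
                )
                if call.get("id"):
                    slot["id"] = call["id"]
                fn = call.get("function", {})
                # Names and JSON arguments may arrive in fragments.
                slot["name"] += fn.get("name") or ""
                slot["arguments"] += fn.get("arguments") or ""
    return {
        "content": "".join(text_parts),
        "tool_calls": [tool_calls[i] for i in sorted(tool_calls)],
        "usage": usage,
    }
```

Keeping the last non-empty `usage` value means the helper copes with either a single final usage chunk or several usage chunks during the stream.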

Non-Streaming Responses

Non-streaming provider responses are converted back to OpenAI chat completions:

  • Provider text parts are joined into choices[].message.content.
  • Provider tool-use or function-call parts become OpenAI choices[].message.tool_calls.
  • Tool-call-only responses return message.content: null.
  • Provider usage maps to OpenAI prompt_tokens, completion_tokens, and total_tokens.

Bedrock finish reason mapping:

| Bedrock stop reason | OpenAI finish reason |
| --- | --- |
| end_turn | stop |
| stop_sequence | stop |
| max_tokens | length |
| tool_use | tool_calls |
| content_filtered | content_filter |
| guardrail_intervened | content_filter |

Unknown Bedrock stop reasons pass through unchanged.

Anthropic Messages finish reason mapping:

| Anthropic stop reason | OpenAI finish reason |
| --- | --- |
| end_turn | stop |
| stop_sequence | stop |
| max_tokens | length |
| tool_use | tool_calls |
| pause_turn | stop |
| refusal | content_filter |

Vertex AI finish reason mapping:

| Vertex finish reason | OpenAI finish reason |
| --- | --- |
| STOP | stop |
| MAX_TOKENS | length |
| SAFETY | content_filter |
| RECITATION | content_filter |
| BLOCKLIST | content_filter |
| PROHIBITED_CONTENT | content_filter |
| SPII | content_filter |
| IMAGE_SAFETY | content_filter |
| FINISH_REASON_UNSPECIFIED | null |

Unmapped Vertex reasons become stop. Any Vertex response containing function calls returns finish_reason: "tool_calls".
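The Bedrock and Vertex mappings differ in how they treat unknown values, which the following sketch makes explicit. The dictionaries are copied from the tables above; the function names are hypothetical and the code is an illustration, not the gateway's implementation.

```python
BEDROCK_FINISH = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
    "content_filtered": "content_filter",
    "guardrail_intervened": "content_filter",
}

VERTEX_FINISH = {
    "STOP": "stop",
    "MAX_TOKENS": "length",
    "SAFETY": "content_filter",
    "RECITATION": "content_filter",
    "BLOCKLIST": "content_filter",
    "PROHIBITED_CONTENT": "content_filter",
    "SPII": "content_filter",
    "IMAGE_SAFETY": "content_filter",
    "FINISH_REASON_UNSPECIFIED": None,
}


def bedrock_finish_reason(stop_reason: str) -> str:
    # Unknown Bedrock stop reasons pass through unchanged.
    return BEDROCK_FINISH.get(stop_reason, stop_reason)


def vertex_finish_reason(finish_reason: str, has_function_calls: bool = False):
    # Any response containing function calls reports tool_calls.
    if has_function_calls:
        return "tool_calls"
    # Membership test (not .get) because one mapped value is None.
    if finish_reason in VERTEX_FINISH:
        return VERTEX_FINISH[finish_reason]
    # Unmapped Vertex reasons become stop.
    return "stop"
```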

Vertex usage details include:

| Vertex usage field | OpenAI usage field |
| --- | --- |
| promptTokenCount | prompt_tokens |
| candidatesTokenCount plus thoughtsTokenCount | completion_tokens |
| totalTokenCount | total_tokens |
| thoughtsTokenCount | completion_tokens_details.reasoning_tokens |
| cachedContentTokenCount | prompt_tokens_details.cached_tokens |

Anthropic Messages usage maps usage.input_tokens to prompt_tokens, usage.output_tokens to completion_tokens, and their sum to total_tokens.
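As a worked example of the usage mappings, the sketch below converts provider usage payloads into the OpenAI usage shape. The function names are hypothetical, and treating a missing counter as 0 is an assumption made for the sketch; the page does not say how absent fields are handled.

```python
def vertex_usage_to_openai(meta: dict) -> dict:
    """Convert Vertex usageMetadata into an OpenAI-style usage object."""
    thoughts = meta.get("thoughtsTokenCount", 0)  # assumption: missing means 0
    return {
        "prompt_tokens": meta.get("promptTokenCount", 0),
        # Reasoning tokens count toward completion tokens.
        "completion_tokens": meta.get("candidatesTokenCount", 0) + thoughts,
        "total_tokens": meta.get("totalTokenCount", 0),
        "completion_tokens_details": {"reasoning_tokens": thoughts},
        "prompt_tokens_details": {
            "cached_tokens": meta.get("cachedContentTokenCount", 0)
        },
    }


def anthropic_usage_to_openai(usage: dict) -> dict:
    """Convert Anthropic Messages usage into an OpenAI-style usage object."""
    prompt = usage.get("input_tokens", 0)
    completion = usage.get("output_tokens", 0)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,  # the sum, per the text above
    }
```

For instance, Vertex metadata with 30 candidate tokens and 12 thought tokens would report `completion_tokens` of 42 and `reasoning_tokens` of 12.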

Provider Setup

Anthropic Messages

Create an LLM Gateway key with provider anthropic_messages, anthropic_messages_bedrock, or anthropic_messages_azure.

| Provider | Auth mode | Required fields | Optional fields |
| --- | --- | --- | --- |
| Anthropic Messages | API Key | api_key | anthropic_version |
| Anthropic Messages on Bedrock | Static AWS keys | aws_access_key, aws_secret_key | aws_region, aws_session_token |
| Anthropic Messages on Bedrock | Assume role | aws_role_arn, aws_external_id | aws_region, aws_role_session_name, aws_session_duration_seconds |
| Azure Anthropic Messages | API Key | api_key, base_url | anthropic_version |

Direct Anthropic and Azure Anthropic default anthropic_version to 2023-06-01. Anthropic Messages on Bedrock uses the same Bedrock credential resolver as the native Anthropic Messages Bedrock endpoint.

AWS Bedrock

Create an LLM Gateway key with provider bedrock.

| Auth mode | Required fields | Optional fields |
| --- | --- | --- |
| Static AWS keys | aws_access_key, aws_secret_key | aws_region, aws_session_token |
| Assume role | aws_role_arn, aws_external_id | aws_region, aws_role_session_name, aws_session_duration_seconds |

Select one or more Bedrock chat models that support Converse. Send model as the Bedrock model ID or inference profile ID.

AWS Bedrock default region: us-east-1. For assume-role setup, see AWS Bedrock - Assume Role Setup.

Vertex AI

Create an LLM Gateway key with provider vertex_ai.

| Auth mode | Required fields | Optional fields | Notes |
| --- | --- | --- | --- |
| Express | api_key | - | Uses x-goog-api-key; no project ID required |
| API Key | api_key, gcp_project_id | gcp_region | Default region: us-central1 |
| Service Account | service_account_json | gcp_project_id, gcp_region | Project ID can be derived from the JSON |
| ADC | gcp_project_id | gcp_region | Uses Application Default Credentials |

Model listing for Vertex AI is best effort. Service-account and ADC auth fetch Gemini models from Vertex Model Garden and fall back to a curated Gemini list if fetching fails.

Error Handling

QuilrAI returns OpenAI-shaped error responses for adapter validation failures and preserves upstream provider error messages where possible.

Common Bedrock adapter error codes:

| Error code | Meaning |
| --- | --- |
| unsupported_bedrock_openai_parameter | The request included a parameter that is not translated for Bedrock. |
| invalid_bedrock_openai_parameter | A supported parameter had an invalid value or type. |
| unsupported_bedrock_openai_content | The request included unsupported content such as image, audio, or file parts. |
| invalid_bedrock_openai_messages | Message order or tool-result history was invalid. |
| unsupported_bedrock_openai_role | The request included an unsupported role. |
| invalid_bedrock_openai_tools | Tool definitions or tool-call history were malformed. |
| unsupported_bedrock_openai_tools | The request used a tool shape that cannot be translated. |
| bedrock_credentials_error | Bedrock credentials could not be loaded or used. |
| bedrock_converse_error | Bedrock Converse returned an error. |
| bedrock_converse_stream_error | Bedrock ConverseStream returned an error. |

Common Anthropic Messages adapter error codes:

| Error code | Meaning |
| --- | --- |
| unsupported_anthropic_openai_parameter | The request included a parameter that is not translated for Anthropic Messages. |
| invalid_anthropic_openai_parameter | A supported parameter had an invalid value or type. |
| unsupported_anthropic_openai_content | The request included unsupported content such as image, audio, file, image block, or document block parts. |
| invalid_anthropic_openai_messages | Message order or tool-result history was invalid. |
| unsupported_anthropic_openai_role | The request included an unsupported role. |
| invalid_anthropic_openai_tools | Tool definitions or tool-call history were malformed. |
| unsupported_anthropic_openai_tools | The request used a tool shape that cannot be translated. |
| missing_provider_key | The Anthropic provider API key is missing. |
| missing_azure_anthropic_base_url | The Azure Anthropic provider is missing base_url. |
| anthropic_messages_bedrock_credentials_error | Bedrock credentials could not be loaded or used for Anthropic Messages on Bedrock. |
| anthropic_messages_bedrock_package_missing | The Anthropic Bedrock package required for the upstream call is missing. |
| anthropic_messages_error | Anthropic Messages returned an error. |
| anthropic_messages_stream_error | Anthropic Messages streaming returned an error. |

Common Vertex AI adapter error codes:

| Error code | Meaning |
| --- | --- |
| unsupported_vertex_openai_parameter | The request included a parameter that is not translated for Vertex AI. |
| invalid_vertex_openai_parameter | A supported parameter had an invalid value or type. |
| unsupported_vertex_openai_content | The request included unsupported content such as image, audio, file, or inline media parts. |
| invalid_vertex_openai_messages | Message order or tool-result history was invalid. |
| unsupported_vertex_openai_role | The request included an unsupported role. |
| invalid_vertex_openai_tools | Tool definitions or tool-call history were malformed. |
| unsupported_vertex_openai_tools | The request used a tool shape that cannot be translated. |
| vertex_credentials_error | Vertex credentials could not be loaded or used. |
| vertex_generate_content_error | Vertex generateContent returned an error. |
| vertex_generate_content_timeout | Vertex generateContent timed out. |
| vertex_generate_content_parse_error | A Vertex generateContent response could not be parsed. |
| vertex_stream_generate_content_error | Vertex streamGenerateContent returned an error. |
| vertex_stream_timeout | The Vertex stream timed out. |
| vertex_stream_parse_error | A Vertex stream event could not be parsed. |

Upstream Vertex HTTP errors are classified into OpenAI-style error types:

| Upstream status | OpenAI-style error type |
| --- | --- |
| 401 | authentication_error |
| 429 | rate_limit_error |
| 4xx | invalid_request_error |
| 5xx | upstream_error |
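The status classification can be read as an ordered check, sketched below. `classify_vertex_status` is a hypothetical name, and the sketch assumes the specific 401 and 429 rows take precedence over the general 4xx row, which the table implies but does not state.

```python
def classify_vertex_status(status: int) -> str:
    """Map an upstream Vertex HTTP status to an OpenAI-style error type."""
    # Specific statuses are checked before the general 4xx range (assumption).
    if status == 401:
        return "authentication_error"
    if status == 429:
        return "rate_limit_error"
    if 400 <= status < 500:
        return "invalid_request_error"
    if 500 <= status < 600:
        return "upstream_error"
    raise ValueError(f"not an error status: {status}")
```

A client that retries on failure might treat `rate_limit_error` and `upstream_error` as retryable and surface the other two types to the caller immediately.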

Guardrail Behavior

Request-side DLP scans user text before the upstream call. Non-streaming responses are scanned before they are returned to the client.

Streaming responses are different: request-side DLP still runs, but response-side DLP is skipped so chunks can pass through as they arrive.

Tool messages are carried through without changing tool IDs, function response names, or result ordering. Changing a tool_call_id, dropping a role: "tool" message, or reordering tool results can break provider tool-result validation.

Expected Good Scenarios

These scenarios are covered by the translators:

  • Plain text chat
  • System, developer, user, and assistant text messages
  • Non-streaming text responses
  • Streaming text responses
  • Provider tool calls translated back to OpenAI tool_calls
  • Tool-call deltas in streaming responses
  • Consecutive OpenAI tool result messages grouped into one provider-native tool-result user message
  • Legacy OpenAI functions, function_call, and role: "function"
  • Anthropic top-level system convenience field
  • Vertex response_format: json_object
  • Vertex response_format: json_schema with local refs and nullable single-type unions
  • Bedrock response_format: json_schema on models that support outputConfig
  • Vertex reasoning-token and cached-token usage details

Expected Failures

These failures are intentional:

  • OpenAI image, audio, or file content
  • Mixed multimodal content arrays
  • Multiple choices with n > 1
  • Log probabilities
  • Token bias
  • Audio input or output modes
  • Reasoning-effort controls
  • Provider-specific extra_body
  • Modern assistant.tool_calls entries without id on Bedrock or Anthropic Messages
  • Tool result messages missing tool_call_id
  • A user message immediately after assistant tool calls without matching tool results
  • Parallel tool results that are not consecutive in the OpenAI message history and therefore cannot be grouped into one provider-native user turn
  • Anthropic response_format values other than {"type": "text"}
  • Bedrock response_format: json_object
  • Bedrock response_format: json_schema on models that do not support outputConfig
  • Vertex JSON Schema union arrays other than nullable single-type unions
  • Vertex cyclic or unresolvable local JSON Schema refs