SDK Mode

Scan content directly from your application code - no LLM proxy required.

Overview

SDK mode exposes a standalone content-checking endpoint (POST /sdk/v1/check) that you can call at any point in your pipeline. Instead of routing LLM traffic through the Quilr gateway, you call this endpoint yourself to scan messages or text for sensitive data and adversarial inputs.

Want to test a key before wiring it into your app? Open the LLM Gateway Playground and switch to the Guardrail check surface.

Common uses:

Check user input before forwarding to an LLM
Scan LLM responses before returning them to users
Scan file uploads, form fields, or other non-LLM content
Integrate with a self-hosted LiteLLM proxy

Authentication

SDK mode requires a dedicated SDK key - regular LLM proxy keys are rejected with 403.

When creating an API key in the dashboard, set the provider to quilr_sdk. Then use it as a Bearer token:

Authorization: Bearer sk-quilr-xxx

An Api-Key: sk-quilr-xxx header is also accepted in place of Authorization.

You can optionally include an X-User-Email header for identity-aware enforcement if that is configured on your key.
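A minimal sketch of assembling these headers in Python. build_headers is a hypothetical helper, not part of any Quilr package, and the key and email values are placeholders.

```python
from typing import Optional


def build_headers(
    sdk_key: str,
    user_email: Optional[str] = None,
    use_api_key_header: bool = False,
) -> dict[str, str]:
    """Build auth headers for POST /sdk/v1/check (hypothetical helper).

    sdk_key must be a quilr_sdk key; regular LLM proxy keys get HTTP 403.
    """
    if use_api_key_header:
        # Alternative header form accepted by the endpoint
        headers = {"Api-Key": sdk_key}
    else:
        headers = {"Authorization": f"Bearer {sdk_key}"}
    if user_email:
        # Only has an effect if identity-aware enforcement is configured on the key
        headers["X-User-Email"] = user_email
    return headers
```

For example, build_headers("sk-quilr-xxx", user_email="alice@example.com") returns a Bearer header plus X-User-Email, ready to pass to httpx, requests, or fetch.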

Request Format

POST /sdk/v1/check

Two input formats are supported:

Messages (conversation)

Use this to check a full conversation. The type field is optional.

{
  "messages": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ],
  "type": "request",
  "metadata": {}
}

Text (raw string)

Use this to check a single piece of text. The type field is optional.

{
  "text": "some text to check",
  "type": "request"
}

type is either "request" (content on its way to the LLM, such as user input) or "response" (content coming back from the LLM).
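The sketch below builds a request body in either format. build_check_body is a hypothetical helper; it assumes each call sends exactly one of messages or text, and it only attaches metadata to message checks because that is the only format where the example above shows it.

```python
from typing import Any, Optional

ALLOWED_TYPES = ("request", "response")


def build_check_body(
    messages: Optional[list[dict[str, str]]] = None,
    text: Optional[str] = None,
    type_: Optional[str] = None,
    metadata: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a JSON body for POST /sdk/v1/check (hypothetical helper)."""
    # Assumption: one input format per call, never both or neither
    if (messages is None) == (text is None):
        raise ValueError("pass exactly one of messages or text")
    if messages is not None:
        body: dict[str, Any] = {"messages": messages}
        if metadata is not None:
            body["metadata"] = metadata  # shown only in the messages example
    else:
        if metadata is not None:
            raise ValueError("this sketch only sends metadata with messages")
        body = {"text": text}
    if type_ is not None:  # type is optional in both formats
        if type_ not in ALLOWED_TYPES:
            raise ValueError(f"type must be one of {ALLOWED_TYPES}")
        body["type"] = type_
    return body
```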

Response

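A completed check always returns HTTP 200, including when the content is blocked, so branch on the response body rather than on the HTTP status code. Authentication failures, such as using a regular LLM proxy key, are still rejected with 403. The response shape depends on which input format you used. Use action for application control flow, and use predictions to inspect what the guardrail found.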

Messages response

{
  "status": "safe | redacted | blocked",
  "action": "allow | redact | block",
  "messages": [...],
  "blocked_text": "...",
  "predictions": [
    {
      "id": "...",
      "name": "...",
      "type": "redact",
      "sensitive_entities": ["123-45-6789"],
      "entity_texts_with_subcategories": {
        "123-45-6789": "SOCIAL SECURITY NUMBER"
      }
    }
  ],
  "categories_detected": ["pii", "email", "ssn"],
  "placeholder_masking": {
    "text": "My SSN is quilr-placeholder-01a54629efb95228.",
    "messages": [
      { "role": "user", "content": "My SSN is quilr-placeholder-01a54629efb95228." }
    ],
    "placeholders": [
      {
        "placeholder": "quilr-placeholder-01a54629efb95228",
        "value": "123-45-6789",
        "sub_category": "SOCIAL SECURITY NUMBER",
        "category_id": "data_risk_category_pii",
        "action": "redact",
        "message_index": 0
      }
    ]
  },
  "error": {...}
}

messages - the (possibly redacted) messages array; null if blocked
blocked_text - only present when status is blocked
predictions - rule-level details, including exact sensitive_entities and entity_texts_with_subcategories
error - only present when status is blocked
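A minimal sketch of turning a messages response into the messages you forward, branching on action as recommended above. messages_to_forward and ContentBlocked are hypothetical names; the sketch treats any unrecognized action as an error rather than silently allowing it.

```python
from typing import Any


class ContentBlocked(Exception):
    """Hypothetical exception raised when the guardrail blocks content."""


def messages_to_forward(
    result: dict[str, Any], original: list[dict[str, str]]
) -> list[dict[str, str]]:
    """Pick the messages to send onward based on result["action"]."""
    action = result["action"]
    if action == "block":
        # messages is null when blocked; blocked_text is present instead
        raise ContentBlocked(result.get("blocked_text") or "blocked")
    if action == "redact":
        return result["messages"]  # redacted copy from the guardrail
    if action == "allow":
        return original  # nothing to change
    raise ValueError(f"unexpected action: {action!r}")
```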

Text response

{
  "status": "safe | redacted | blocked",
  "action": "allow | redact | block",
  "original_text": "...",
  "processed_text": "...",
  "predictions": [
    {
      "id": "...",
      "name": "...",
      "type": "redact",
      "sensitive_entities": ["555-867-5309"],
      "entity_texts_with_subcategories": {
        "555-867-5309": "PHONE NUMBER"
      }
    }
  ],
  "categories_detected": ["pii", "phone"],
  "placeholder_masking": {
    "text": "Call me at quilr-placeholder-59c0b4a6fc3c3b2c.",
    "messages": null,
    "placeholders": [
      {
        "placeholder": "quilr-placeholder-59c0b4a6fc3c3b2c",
        "value": "555-867-5309",
        "sub_category": "PHONE NUMBER",
        "category_id": "data_risk_category_pii",
        "action": "redact"
      }
    ]
  },
  "error": {...}
}

processed_text - the redacted text; null if blocked
predictions - rule-level details, including exact sensitive_entities and entity_texts_with_subcategories
error - only present when status is blocked
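Both response formats carry predictions, so the same code can summarize what was found. The sketch below (findings_by_subcategory is a hypothetical helper) groups the matched entity texts by subcategory using entity_texts_with_subcategories. These are the raw sensitive values, so avoid writing the output to logs.

```python
from collections import defaultdict
from typing import Any


def findings_by_subcategory(result: dict[str, Any]) -> dict[str, list[str]]:
    """Group detected entity texts by subcategory, e.g. "PHONE NUMBER"."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for prediction in result.get("predictions") or []:
        mapping = prediction.get("entity_texts_with_subcategories") or {}
        for entity, subcategory in mapping.items():
            # Keep each entity once per subcategory, in first-seen order
            if entity not in grouped[subcategory]:
                grouped[subcategory].append(entity)
    return dict(grouped)
```

Applied to the text response example above, this yields {"PHONE NUMBER": ["555-867-5309"]}.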

Placeholder masking

Every response also includes placeholder_masking, an additional view of the same content in which sensitive values are replaced with hash-based placeholders such as quilr-placeholder-358100c210df061d, quilr-placeholder-4c658021550ddeb2, or quilr-placeholder-f52fbd32b2b3b86f.

Use this when your application needs visually distinct, reversible placeholders instead of same-length X redaction. The placeholder token uses the format quilr-placeholder-<hash> and stays stable for the exact matched source value; placeholders[] maps each token back to the original value and detection metadata. Message checks include placeholder_masking.messages; raw text checks set placeholder_masking.messages to null.
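Because placeholders[] maps each token back to its original value, reversing the masking is a string substitution. The sketch below is one way to do it, for example to put original values back into an LLM reply that echoes the tokens. restore_text and restore_messages are hypothetical helpers, and the sketch assumes message content is a plain string.

```python
from typing import Any


def restore_text(masked: str, placeholders: list[dict[str, Any]]) -> str:
    """Replace each quilr-placeholder-<hash> token with its original value."""
    for entry in placeholders:
        masked = masked.replace(entry["placeholder"], entry["value"])
    return masked


def restore_messages(
    masked_messages: list[dict[str, Any]], placeholders: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Restore every message. Tokens are stable per value, so every entry
    is applied to every message instead of filtering by message_index."""
    return [
        {**message, "content": restore_text(message["content"], placeholders)}
        for message in masked_messages
    ]
```

With the text response example above, restore_text(masking["text"], masking["placeholders"]) turns "Call me at quilr-placeholder-59c0b4a6fc3c3b2c." back into "Call me at 555-867-5309.".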

Code Examples

Python - `httpx` (async)

A typical pattern: check the user message before sending it to your LLM, then check the LLM response before returning it to the user.

import httpx

QUILR_BASE = "https://guardrails-usa-2.quilr.ai"
QUILR_SDK_KEY = "sk-quilr-xxx"

async def check_messages(messages: list[dict]) -> dict:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{QUILR_BASE}/sdk/v1/check",
            headers={"Authorization": f"Bearer {QUILR_SDK_KEY}"},
            json={"messages": messages, "type": "request"},
            timeout=5,
        )
        resp.raise_for_status()
        return resp.json()

async def check_text(text: str, type_: str = "response") -> dict:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{QUILR_BASE}/sdk/v1/check",
            headers={"Authorization": f"Bearer {QUILR_SDK_KEY}"},
            json={"text": text, "type": type_},
            timeout=5,
        )
        resp.raise_for_status()
        return resp.json()

# --- Usage ---

import asyncio
from openai import AsyncOpenAI

llm = AsyncOpenAI(api_key="sk-openai-xxx")

async def safe_chat(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    # 1. Check input before it reaches the LLM (branch on action, not status)
    result = await check_messages(messages)
    if result["action"] == "block":
        raise ValueError(f"Input blocked: {result['categories_detected']}")
    if result["action"] == "redact":
        messages = result["messages"]  # use redacted version

    # 2. Call LLM
    response = await llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    reply = response.choices[0].message.content or ""  # content can be None

    # 3. Check output before returning it to the user
    result = await check_text(reply, type_="response")
    if result["action"] == "block":
        raise ValueError(f"Response blocked: {result['categories_detected']}")
    if result["action"] == "redact":
        reply = result["processed_text"]

    return reply

print(asyncio.run(safe_chat("What is my SSN?")))

Python - `requests` (sync)

import requests

QUILR_BASE = "https://guardrails-usa-2.quilr.ai"
QUILR_SDK_KEY = "sk-quilr-xxx"

def check_text(text: str, type_: str = "response") -> dict:
    resp = requests.post(
        f"{QUILR_BASE}/sdk/v1/check",
        headers={"Authorization": f"Bearer {QUILR_SDK_KEY}"},
        json={"text": text, "type": type_},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()

# Check a piece of text before storing or displaying it
result = check_text("My credit card is 4111 1111 1111 1111", type_="request")

match result["action"]:
    case "allow":
        print("No issues found")
    case "redact":
        print("Cleaned text:", result["processed_text"])
    case "block":
        print("Blocked. Detected:", result["categories_detected"])

JavaScript / TypeScript - `fetch`

const QUILR_BASE = "https://guardrails-usa-2.quilr.ai";
const QUILR_SDK_KEY = "sk-quilr-xxx";

async function checkMessages(messages: Array<{ role: string; content: string }>) {
  const res = await fetch(`${QUILR_BASE}/sdk/v1/check`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${QUILR_SDK_KEY}`,
    },
    body: JSON.stringify({ messages, type: "request" }),
    signal: AbortSignal.timeout(5000),
  });
  if (!res.ok) throw new Error(`Quilr error: ${res.status}`);
  return res.json();
}

async function checkText(text: string, type: "request" | "response" = "response") {
  const res = await fetch(`${QUILR_BASE}/sdk/v1/check`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${QUILR_SDK_KEY}`,
    },
    body: JSON.stringify({ text, type }),
    signal: AbortSignal.timeout(5000),
  });
  if (!res.ok) throw new Error(`Quilr error: ${res.status}`);
  return res.json();
}

// Example: guard a chat endpoint
async function safeChat(userMessage: string): Promise<string> {
  const messages = [{ role: "user", content: userMessage }];

  const inputResult = await checkMessages(messages);
  if (inputResult.action === "block") {
    throw new Error(`Blocked: ${inputResult.categories_detected.join(", ")}`);
  }
  const checkedMessages =
    inputResult.action === "redact" ? inputResult.messages : messages;

  // ... call your LLM with checkedMessages ...
  const llmReply = "...";

  const outputResult = await checkText(llmReply, "response");
  if (outputResult.action === "block") {
    throw new Error(`Response blocked: ${outputResult.categories_detected.join(", ")}`);
  }
  return outputResult.action === "redact" ? outputResult.processed_text : llmReply;
}

cURL

# Check raw text
curl -X POST https://guardrails-usa-2.quilr.ai/sdk/v1/check \
  -H "Authorization: Bearer sk-quilr-xxx" \
  -H "Content-Type: application/json" \
  -d '{"text": "Call me at 555-867-5309", "type": "request"}'

# Check a conversation
curl -X POST https://guardrails-usa-2.quilr.ai/sdk/v1/check \
  -H "Authorization: Bearer sk-quilr-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the password for admin@acme.com?"}
    ],
    "type": "request"
  }'

LiteLLM Guardrails Integration

If you run a self-hosted LiteLLM proxy, you can plug Quilr guardrails in as a native guardrail plugin. The plugin calls /sdk/v1/check for you on requests, responses, or both, so your application does not need to call the check endpoint itself.

Installation

pip install quilr-litellm-guardrails

Or copy quilr_litellm_guardrails.py into your project.

Environment variables

Variable	Required	Default	Description
`QUILR_GUARDRAILS_KEY`	Yes	-	Your `quilr_sdk` API key
`QUILR_GUARDRAILS_BASE_URL`	No	`https://guardrails.quilr.ai`	Override with the closest regional endpoint for production or with a self-hosted deployment URL
`QUILR_GUARDRAILS_TIMEOUT`	No	`3`	Seconds before the check times out (request passes on timeout)
`APPLY_QUILR_GUARDRAILS_FOR_MODELS`	No	(all)	Comma-separated list of models to restrict guardrails to
`APPLY_QUILR_GUARDRAILS_FOR_KEY_NAMES`	No	(all)	Comma-separated list of LiteLLM key names to restrict guardrails to
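To make the table concrete, here is a sketch of how these variables could be read and applied. It is not the plugin's actual code: QuilrSettings and parse_csv are hypothetical, and the rule that a request must match both the model list and the key-name list when both are set is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def parse_csv(value: Optional[str]) -> Optional[frozenset[str]]:
    """Parse a comma-separated list; None means "apply to all"."""
    if not value:
        return None
    items = frozenset(part.strip() for part in value.split(",") if part.strip())
    return items or None


@dataclass(frozen=True)
class QuilrSettings:
    key: str
    base_url: str
    timeout: float
    models: Optional[frozenset[str]]
    key_names: Optional[frozenset[str]]

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "QuilrSettings":
        return cls(
            key=env["QUILR_GUARDRAILS_KEY"],  # required: KeyError if missing
            base_url=env.get("QUILR_GUARDRAILS_BASE_URL", "https://guardrails.quilr.ai"),
            timeout=float(env.get("QUILR_GUARDRAILS_TIMEOUT", "3")),
            models=parse_csv(env.get("APPLY_QUILR_GUARDRAILS_FOR_MODELS")),
            key_names=parse_csv(env.get("APPLY_QUILR_GUARDRAILS_FOR_KEY_NAMES")),
        )

    def applies_to(self, model: str, key_name: Optional[str]) -> bool:
        # Assumption: both filters must match when both are set
        model_ok = self.models is None or model in self.models
        key_ok = self.key_names is None or key_name in self.key_names
        return model_ok and key_ok
```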

LiteLLM `config.yaml`

guardrails:
  # Check input before the LLM call (adds latency equal to check time)
  - guardrail_name: "quilr-input"
    litellm_params:
      guardrail: quilr_litellm_guardrails.QuilrGuardrail
      mode: "pre_call"

  # Check input in parallel with the LLM call (adds latency only if the check outlasts the LLM call)
  - guardrail_name: "quilr-input-async"
    litellm_params:
      guardrail: quilr_litellm_guardrails.QuilrGuardrail
      mode: "during_call"

  # Check output before returning it to the caller
  - guardrail_name: "quilr-output"
    litellm_params:
      guardrail: quilr_litellm_guardrails.QuilrGuardrail
      mode: "post_call"

You can configure all three modes or only the ones you need. during_call is the recommended input mode when latency matters: the guardrail check runs concurrently with the LLM call, so it adds to total response time only when the check takes longer than the LLM call itself.
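The timing behaviour of during_call can be modelled with asyncio.gather. This is a simplified illustration, not the LiteLLM or Quilr plugin code; during_call and GuardrailBlocked here are hypothetical names, and the check and LLM call are passed in as plain coroutine functions.

```python
import asyncio
from typing import Any, Awaitable, Callable


class GuardrailBlocked(Exception):
    """Hypothetical error raised when the input check returns action "block"."""


async def during_call(
    check_input: Callable[[], Awaitable[dict[str, Any]]],
    call_llm: Callable[[], Awaitable[str]],
) -> str:
    """Run the input check and the LLM call concurrently."""
    # Both coroutines start together, so wall time is roughly
    # max(check time, LLM time) rather than their sum.
    result, reply = await asyncio.gather(check_input(), call_llm())
    if result["action"] == "block":
        # The LLM has already answered; its reply is discarded, not returned.
        raise GuardrailBlocked(", ".join(result.get("categories_detected", [])))
    return reply
```

pre_call, by contrast, corresponds to awaiting the check first and only then calling the LLM, which is why its latency is the sum of the two.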

Enabling guardrails per request

Pass the guardrail names in the request body:

curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-litellm-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
    "guardrails": ["quilr-input-async", "quilr-output"]
  }'
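The same per-request call can be sent from Python. This sketch uses only the standard library; build_chat_request is a hypothetical helper, and the proxy URL and key mirror the placeholder values in the cURL example.

```python
import json
import urllib.request

LITELLM_BASE = "http://localhost:4000"
LITELLM_KEY = "sk-litellm-xxx"


def build_chat_request(
    model: str, messages: list[dict[str, str]], guardrails: list[str]
) -> urllib.request.Request:
    """Build a chat completion request that enables the named guardrails."""
    body = {"model": model, "messages": messages, "guardrails": guardrails}
    return urllib.request.Request(
        f"{LITELLM_BASE}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {LITELLM_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To send it:
# req = build_chat_request("gpt-4o-mini", [{"role": "user", "content": "Hello"}],
#                          ["quilr-input-async", "quilr-output"])
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp))
```

If you use the OpenAI Python SDK against the proxy instead, the guardrails field can be passed through its extra_body argument.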

Behavior summary

Mode	What happens on `blocked`	What happens on `redacted`
`pre_call`	Request rejected before LLM is called	Messages replaced with redacted version before LLM call
`during_call`	LLM response discarded, error returned	Messages updated, but the LLM call is already in flight with the original content
`post_call`	Response rejected, error returned to caller	Response content replaced with redacted version

On timeout or any unexpected error from the Quilr API, the request passes through unchanged.
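If you call /sdk/v1/check directly and want the same fail-open behaviour in your own code, you can wrap the check. check_fail_open is a hypothetical helper, and the synthetic result it returns on failure (including the fail_open marker) is an assumption shaped like a safe text response, not something the API returns.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("quilr-guardrail")


def check_fail_open(
    check: Callable[[str], dict[str, Any]], text: str
) -> dict[str, Any]:
    """Run a check; on timeout or any error, let the content through."""
    try:
        return check(text)
    except Exception as exc:  # e.g. timeouts, HTTP errors, invalid JSON
        log.warning("guardrail check failed, passing content through: %s", exc)
        # Hypothetical synthetic "allow" result, marked so callers can tell
        return {
            "status": "safe",
            "action": "allow",
            "original_text": text,
            "processed_text": text,
            "predictions": [],
            "fail_open": True,
        }
```

Failing open favours availability. For flows where unscanned content must never pass, re-raise the exception instead (fail closed).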
