Security Guardrails

Detect and act on sensitive data and adversarial inputs.

Overview

Security guardrails inspect requests and responses passing through the gateway. Each detection category can be independently enabled and assigned an action.

Data Risk Detection

Contextual detection identifies sensitive data categories and applies the configured action.

Supported Categories

  • PII - Personally Identifiable Information
  • PHI - Protected Health Information
  • PCI - Payment Card Industry data
  • Financial data (PFI) - Financial records and account information

Adversarial Risk Detection

Adversarial risk detection catches attack patterns in requests:

  • Prompt injection - Attempts to override system instructions
  • Jailbreak - Attempts to bypass safety controls
  • Social engineering - Manipulation attempts targeting the AI model

Endpoint Coverage

Guardrails run on:

  • Chat completions, including Bedrock models reached through OpenAI-compatible chat via Converse
  • Embeddings (input)
  • TTS (input)
  • STT (output)
  • Anthropic Messages
  • AWS Bedrock Runtime boto3 calls
  • Vertex/Gemini generateContent
  • The OpenAI Responses API
  • Copilot Studio external threat detection

For streaming responses (SSE), request-side scanning runs as normal, but response-side scanning is skipped so that chunks stream through unmodified.

AWS Bedrock Runtime converse_stream runs request-side DLP and then passes the AWS EventStream response through unchanged. OpenAI Realtime websocket sessions are passthrough today - DLP does not yet run on live Realtime events in either direction. Session-level logs (handshake status, byte counters, usage summary) are still recorded. Use SDK Mode if you need to scan Realtime transcripts out-of-band.
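The streaming and passthrough rules above can be summarized in a small lookup. This is only an illustrative sketch: the `Transport` labels and the `scanned_sides` function are hypothetical names, not part of the gateway, and the buffered case ignores endpoint-specific limits such as embeddings being input-only.

```python
from enum import Enum


class Transport(Enum):
    # Hypothetical labels for the transports described in this section.
    BUFFERED = "buffered"  # ordinary non-streaming request/response
    SSE_STREAM = "sse_stream"  # streaming responses over SSE
    BEDROCK_CONVERSE_STREAM = "bedrock_converse_stream"
    REALTIME_WEBSOCKET = "realtime_websocket"


def scanned_sides(transport: Transport) -> frozenset:
    """Return which sides of an exchange are scanned for a given transport."""
    if transport is Transport.REALTIME_WEBSOCKET:
        # Passthrough today: no scanning in either direction.
        return frozenset()
    if transport in (Transport.SSE_STREAM, Transport.BEDROCK_CONVERSE_STREAM):
        # Request-side scanning runs; the streamed response passes through.
        return frozenset({"request"})
    # Assumption: a buffered exchange can be scanned in both directions.
    return frozenset({"request", "response"})


print(sorted(scanned_sides(Transport.SSE_STREAM)))          # ['request']
print(sorted(scanned_sides(Transport.REALTIME_WEBSOCKET)))  # []
```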

Copilot Studio runs request-side checks on user context and proposed tool input values before tool execution. Redaction-style outcomes become blocks because Copilot Studio cannot accept rewritten tool input.

Configurable Actions

Each detection category supports per-category actions:

| Action | Behavior |
| --- | --- |
| Block | Reject the request entirely |
| Redact | Remove the sensitive data and allow the request |
| Anonymize | Replace sensitive data with anonymized placeholders |
| Monitor | Allow the request and log the detection for review |

Adversarial categories support only Block and Monitor. If Redact or Anonymize is configured for one, it falls back to Block because there is no entity to redact.
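The fallback rules can be expressed as a single resolution step. The sketch below is a hedged illustration, not the gateway's implementation: `effective_action` and its parameters are hypothetical, and it assumes that both Redact and Anonymize count as the "redaction-style" outcomes that Copilot Studio turns into blocks.

```python
VALID_ACTIONS = {"block", "redact", "anonymize", "monitor"}

# Assumption: "redaction-style" covers both of these actions.
REDACTION_STYLE = {"redact", "anonymize"}


def effective_action(configured: str, *, adversarial: bool = False,
                     copilot_studio: bool = False) -> str:
    """Resolve the action that is actually applied for a detection."""
    action = configured.lower()
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {configured!r}")
    # Adversarial categories have no entity to redact, and Copilot Studio
    # cannot accept rewritten tool input, so both fall back to block.
    if action in REDACTION_STYLE and (adversarial or copilot_studio):
        return "block"
    return action


print(effective_action("Redact"))                       # redact
print(effective_action("Redact", adversarial=True))     # block
print(effective_action("Anonymize", copilot_studio=True))  # block
print(effective_action("Monitor", adversarial=True))    # monitor
```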

Per-Category Risk Level

Each data-risk category has a configurable Risk Level that controls how wide a net the category casts. Raise it to catch more sub-categories; lower it to limit detections to only the most unambiguous values.

| Risk Level | What fires |
| --- | --- |
| Low | Only the most unambiguous sub-categories - clearly sensitive values like unique identifiers or structured credentials. |
| High | Everything Low catches, plus weaker, contextual sub-categories like names, emails, and amounts. |

Which sub-categories fire at which Risk Level

| Category | Fires at Risk = Low | Also fires at Risk = High |
| --- | --- | --- |
| PII | SSN, Passport, Driver's License, National ID | Name, Email, Phone, Date of Birth, Home Address, Employee ID |
| PHI | Medical Appointment, Medical Record Number, Prescription | Medical Facility, Medical Condition, Medical Treatment |
| PFI | Bank Account, Bank Identification Code, PAN Card, Tax Information | Financial Amount, Invoice, Payment Processor, Transaction ID, Customer ID |
| PCI | Credit/Debit Card | - |
| Auth & Secrets | Access Token, API Key, AWS Credentials, Password | Username, Username or Alias |
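To make the tiering concrete, the table can be read as two sets per category: one that fires at either Risk Level and one that fires only at High. The following is a minimal sketch under that reading. The sub-category names come from the table, but the dictionaries and the `active_subcategories` function are hypothetical and do not reflect the gateway's configuration format.

```python
# Sub-categories copied from the table above; the structure is illustrative.
FIRES_AT_LOW = {
    "PII": {"SSN", "Passport", "Driver's License", "National ID"},
    "PHI": {"Medical Appointment", "Medical Record Number", "Prescription"},
    "PFI": {"Bank Account", "Bank Identification Code", "PAN Card",
            "Tax Information"},
    "PCI": {"Credit/Debit Card"},
    "Auth & Secrets": {"Access Token", "API Key", "AWS Credentials",
                       "Password"},
}
ALSO_FIRES_AT_HIGH = {
    "PII": {"Name", "Email", "Phone", "Date of Birth", "Home Address",
            "Employee ID"},
    "PHI": {"Medical Facility", "Medical Condition", "Medical Treatment"},
    "PFI": {"Financial Amount", "Invoice", "Payment Processor",
            "Transaction ID", "Customer ID"},
    "PCI": set(),
    "Auth & Secrets": {"Username", "Username or Alias"},
}


def active_subcategories(category: str, risk_level: str) -> set:
    """Return the sub-categories that fire for a category at a Risk Level."""
    level = risk_level.lower()
    if level not in ("low", "high"):
        raise ValueError(f"unknown risk level: {risk_level!r}")
    active = set(FIRES_AT_LOW[category])
    if level == "high":
        # High catches everything Low does, plus the contextual tier.
        active |= ALSO_FIRES_AT_HIGH[category]
    return active


print("Name" in active_subcategories("PII", "Low"))      # False
print("Name" in active_subcategories("PII", "High"))     # True
print("PAN Card" in active_subcategories("PFI", "Low"))  # True
```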

Examples (PII)

How the same input is evaluated at different Risk Levels:

| Request body | Risk = Low | Risk = High |
| --- | --- | --- |
| My name is Praneeth Bedapudi and I live in Bengaluru | Allowed | Detected (NAME, HOME ADDRESS) |
| Reach me at praneeth@example.com or +91-98765-43210 | Allowed | Detected (EMAIL ADDRESS, PHONE NUMBER) |
| My passport number is M1234567 | Detected (PASSPORT NUMBER) | Detected (PASSPORT NUMBER) |
| SSN 123-45-6789 | Detected (SOCIAL SECURITY NUMBER) | Detected (SOCIAL SECURITY NUMBER) |

And a mixed example across categories, both at Risk = Low:

| Request body | PII | PFI |
| --- | --- | --- |
| Praneeth Bedapudi, PAN ABCDE1234F | Allowed (Name only fires at Risk = High) | Detected (PAN CARD) |

Sub-category Risk Level

Within a category, each sub-category can be pinned to its own Risk Level to override the category-level setting.

Raise the Risk Level to catch more. Lower it to cut noise on categories you only want alerts on when confidence is very high.
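One plausible reading of the override is that a pinned sub-category is evaluated at its own Risk Level, while every other sub-category uses the category-level setting. The sketch below illustrates that reading with a small PII excerpt. The names `fires` and `pins` are hypothetical, and the exact override semantics are an assumption rather than documented behavior.

```python
from typing import Dict, Optional

# Small PII excerpt for illustration only.
PII_FIRES_AT_LOW = {"SSN", "Passport"}
PII_ALSO_FIRES_AT_HIGH = {"Name", "Email"}


def fires(subcategory: str, category_level: str,
          pins: Optional[Dict[str, str]] = None) -> bool:
    """Decide whether a PII sub-category fires, honoring per-sub-category pins."""
    # Assumption: a pin replaces the category-level Risk Level for that
    # sub-category only.
    level = (pins or {}).get(subcategory, category_level).lower()
    if subcategory in PII_FIRES_AT_LOW:
        return True  # unambiguous values fire at either level
    if subcategory in PII_ALSO_FIRES_AT_HIGH:
        return level == "high"
    return False


# Category at Low, but Email pinned to High: Email fires, Name does not.
print(fires("Email", "Low", pins={"Email": "High"}))  # True
print(fires("Name", "Low", pins={"Email": "High"}))   # False
# Category at High, but Name pinned to Low: Name is silenced.
print(fires("Name", "High", pins={"Name": "Low"}))    # False
```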

Action Scope

Each category can be scoped to run on the request side, the response side, or both. Scope controls which direction a detection runs in - it does not change the configured action.

| Scope | Runs on | Use when |
| --- | --- | --- |
| Request | User input only | You only want to gate what users send (e.g. PII leaving your app) |
| Response | Model output only | You only want to gate what the model returns (e.g. leaked secrets, unsafe generation) |
| Both (default) | Both directions | Full bidirectional coverage |

Adversarial categories are scoped automatically: Response Risks runs on assistant output, and all other adversarial categories run on user input.
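The scope rules amount to a direction check that is independent of the configured action. As a rough sketch, with hypothetical function names and string labels:

```python
def runs_on(scope: str, direction: str) -> bool:
    """Return True if a category with this scope runs in this direction."""
    scope, direction = scope.lower(), direction.lower()
    if scope not in ("request", "response", "both"):
        raise ValueError(f"unknown scope: {scope!r}")
    if direction not in ("request", "response"):
        raise ValueError(f"unknown direction: {direction!r}")
    return scope == "both" or scope == direction


def adversarial_scope(category: str) -> str:
    """Automatic scope for adversarial categories, as described above."""
    # Response Risks runs on assistant output; all others on user input.
    return "response" if category == "Response Risks" else "request"


print(runs_on("Both", "response"))            # True
print(runs_on("Request", "response"))         # False
print(adversarial_scope("Prompt injection"))  # request
print(adversarial_scope("Response Risks"))    # response
```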