Security Guardrails

Detect and act on sensitive data in MCP tool call inputs and outputs.

Overview

Security guardrails inspect data flowing through MCP tool calls - both the inputs your agent sends and the outputs the tool returns. Each category can be independently enabled with separate request and response actions.

Data Risk Detection

Identifies sensitive data categories and applies the configured action.

Supported Categories

PII - Personally Identifiable Information
PHI - Protected Health Information
PFI - Personal Financial Information
PCI - Payment Card Industry data
Insurance - Insurance-related sensitive data
Auth & Secrets - Authentication credentials and secrets

Adversarial Risk Detection

Catches adversarial attack patterns in tool call data:

Prompt injection - Attempts to override agent instructions
Jailbreak - Attempts to bypass safety controls
Context corruption - Attempts to pollute agent context
Semantic adversarial - Semantically crafted adversarial inputs
Social engineering - Manipulation attempts targeting the AI agent

Configurable Actions

Each detection category supports per-direction actions:

Action	Behavior
Block	Reject the tool call entirely
Redact	Remove the sensitive data and allow the call
Anonymize	Replace sensitive data with anonymized placeholders
Monitor	Allow the call and log the detection for review

Actions are configured independently for:

Request (input) - Data the agent sends to the MCP tool
Response (output) - Data the MCP tool returns to the agent

Overview​

Data Risk Detection​

Supported Categories​

Adversarial Risk Detection​

Configurable Actions​

Overview

Data Risk Detection

Supported Categories

Adversarial Risk Detection

Configurable Actions