Security Guardrails
Detect and act on sensitive data in MCP tool call inputs and outputs.
Overview
Security guardrails inspect data flowing through MCP tool calls: both the inputs your agent sends and the outputs the tool returns. Each category can be enabled independently, with separate actions for requests and responses.
Data Risk Detection
Identifies sensitive data categories and applies the configured action.
Supported Categories
- PII - Personally Identifiable Information
- PHI - Protected Health Information
- PFI - Personal Financial Information
- PCI - Payment Card Industry data
- Insurance - Insurance-related sensitive data
- Auth & Secrets - Authentication credentials and secrets
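As a rough illustration of what category detection involves, the sketch below tags a text payload with three of the categories listed above. It is not the product's detector: the regular expressions, the Luhn check for card numbers, and the names `detect_categories` and `luhn_valid` are all assumptions made for this example, and a production detector would likely be far more thorough.

```python
import re

# Hypothetical patterns for illustration only; not the product's detectors.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
SECRET_RE = re.compile(
    r"\b(?:api[_-]?key|token|password)\s*[:=]\s*\S+", re.IGNORECASE
)


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def detect_categories(text: str) -> set[str]:
    """Return the (assumed) category labels whose patterns match `text`."""
    found = set()
    if EMAIL_RE.search(text):
        found.add("PII")
    # Only count digit runs that also pass the Luhn check as card numbers.
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(text)):
        found.add("PCI")
    if SECRET_RE.search(text):
        found.add("Auth & Secrets")
    return found
```

Pairing the card pattern with a checksum is one way to cut false positives on arbitrary long digit strings; whether the actual guardrails do this is not stated here.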
Adversarial Risk Detection
Catches adversarial attack patterns in tool call data:
- Prompt injection - Attempts to override agent instructions
- Jailbreak - Attempts to bypass safety controls
- Context corruption - Attempts to pollute agent context
- Semantic adversarial - Semantically crafted adversarial inputs
- Social engineering - Manipulation attempts targeting the AI agent
Configurable Actions
Each detection category supports per-direction actions, configured independently for:
- Request (input) - Data the agent sends to the MCP tool
- Response (output) - Data the MCP tool returns to the agent
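One way to picture per-direction configuration is a small policy table keyed by category. The sketch below is only a model of the idea: the action names (`ALLOW`, `REDACT`, `BLOCK`) are hypothetical, since this section does not list the supported actions, and `CategoryPolicy`, `policies`, and `resolve_action` are illustrative names rather than part of any real configuration schema.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical action names; the section does not list the real ones.
    ALLOW = "allow"
    REDACT = "redact"
    BLOCK = "block"


class Direction(Enum):
    REQUEST = "request"    # data the agent sends to the MCP tool
    RESPONSE = "response"  # data the MCP tool returns to the agent


@dataclass(frozen=True)
class CategoryPolicy:
    enabled: bool = False
    request_action: Action = Action.ALLOW
    response_action: Action = Action.ALLOW

    def action_for(self, direction: Direction) -> Action:
        # A disabled category never triggers an action.
        if not self.enabled:
            return Action.ALLOW
        if direction is Direction.REQUEST:
            return self.request_action
        return self.response_action


# Example policy table: each category is enabled and configured on its own.
policies = {
    "PII": CategoryPolicy(True, Action.REDACT, Action.BLOCK),
    "Prompt injection": CategoryPolicy(True, Action.ALLOW, Action.BLOCK),
    "PHI": CategoryPolicy(False, Action.BLOCK, Action.BLOCK),
}


def resolve_action(category: str, direction: Direction) -> Action:
    """Look up the action for a detected category in a given direction."""
    policy = policies.get(category, CategoryPolicy())
    return policy.action_for(direction)
```

With this table, a PII match in a request resolves to `Action.REDACT` while the same match in a response resolves to `Action.BLOCK`, which mirrors the independent request and response settings described above.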