Skip to main content

Security Guardrails

Detect and act on sensitive data in MCP tool call inputs and outputs.

Overview

Security guardrails inspect data flowing through MCP tool calls - both the inputs your agent sends and the outputs the tool returns. Each category can be independently enabled with separate request and response actions.

Data Risk Detection

Identifies sensitive data categories and applies the configured action.

Supported Categories

  • PII - Personally Identifiable Information
  • PHI - Protected Health Information
  • PFI - Personal Financial Information
  • PCI - Payment Card Industry data
  • Insurance - Insurance-related sensitive data
  • Auth & Secrets - Authentication credentials and secrets

Exact Data Matching (EDM)

Pattern matching for specific data formats:

  • SSN
  • Aadhaar
  • PAN
  • Passport numbers
  • Bank account numbers
  • Credit card numbers
  • Email addresses
  • Phone numbers
  • 20+ more patterns

Adversarial Risk Detection

Catches adversarial attack patterns in tool call data:

  • Prompt injection - Attempts to override agent instructions
  • Jailbreak - Attempts to bypass safety controls
  • Context corruption - Attempts to pollute agent context
  • Semantic adversarial - Semantically crafted adversarial inputs
  • Social engineering - Manipulation attempts targeting the AI agent

Configurable Actions

Each detection category supports per-direction actions:

ActionBehavior
BlockReject the tool call entirely
RedactRemove the sensitive data and allow the call
AnonymizeReplace sensitive data with anonymized placeholders
MonitorAllow the call and log the detection for review

Actions are configured independently for:

  • Request (input) - Data the agent sends to the MCP tool
  • Response (output) - Data the MCP tool returns to the agent