Skip to main content

Guardian Agent

Guide model behavior with gateway-side policy checks for dependency safety and task adherence.

Overview

Guardian Agent runs inside the LLM Gateway request and response flow. It is not a separate autonomous agent or a new model endpoint. When enabled on an API key, the gateway can add policy instructions before a request reaches the upstream model, retry unsafe dependency output once with corrective guidance, append an advisory when needed, or block off-task requests before they reach the provider.

Use Guardian Agent when you want coding assistants to avoid risky dependency recommendations, or when an agent should stay aligned to the purpose defined by its system prompt.

How It Works

Request Arrives
User message + system prompt
Guardian config loaded
Evaluate Request
Dependency intent?
Task still on purpose?
Apply Policy
Inject guidance
Nudge or block
Call Provider
Forward if allowed
Log Guardian findings
QuilrAI

For dependency output, Guardian Agent adds a response-side review before the final answer is returned:

Model Draft
pip install django==3.2.0
Package specs extracted
Check Dependencies
OSV vulnerabilities
Latest registry versions
Retry Once
Corrective instruction added
Safer dependency output
Return Response
Advisory appended if needed
Findings visible in logs
QuilrAI

Feature Groups

Guardian Agent currently has two feature groups.

Coding Helpers

Coding helpers focus on dependency-related prompts and generated dependency output.

On the request side, Guardian Agent detects dependency intent in user messages, such as requirements.txt, pip install, pyproject.toml, dependency lists, and package version questions. When matched, it injects an upstream system instruction telling the model to avoid vulnerable versions and prefer current stable patched versions, depending on configuration.

On the response side, Guardian Agent scans dependency-like output, including:

  • pip install commands
  • requirements.txt and pyproject.toml
  • package.json
  • Cargo.toml
  • Gemfile
  • go.mod
  • .csproj PackageReference entries
  • pom.xml
  • composer.json

Package extraction is best-effort across PyPI, npm, crates.io, RubyGems, NuGet, Go, Maven, and Packagist. Exact pinned versions can be checked against OSV for known vulnerabilities. Bare package installs resolve the latest registry version first, then check that version. Range specs are skipped in this release.

Latest-version suggestions are supported for exact pins on PyPI, npm, crates.io, RubyGems, NuGet, and Go. Maven and Packagist latest-version checks are skipped in v1.

Task Adherence

Task adherence compares the latest user message against the request system prompt. The system prompt is treated as the agent's purpose. If there is no system prompt, the check is skipped and the request is allowed.

When the latest user message is classified as unrelated to the system prompt, Guardian Agent records a guardian_task_adherence finding and applies the configured action:

ActionBehavior
nudgeAdds an upstream instruction telling the model to redirect back to the configured purpose.
blockBlocks the request before it reaches the upstream model.

Task adherence is request-side only today. Response-side task adherence is not implemented.

Streaming and Retry Behavior

Request-side Guardian Agent checks run before upstream calls for both streaming and non-streaming requests.

For non-streaming responses, dependency findings trigger one retry with Guardian dependency instructions. If the retry still contains dependency advisories, the gateway appends a Guardian note to the final response. Vulnerability advisories suppress latest-version advisories for the same response.

For streaming requests with dependency checks enabled, the gateway first sends a hidden non-streaming upstream request to inspect a full draft response. If no dependency findings are found, the gateway streams that draft back to the client as provider-shaped SSE. If Guardian Agent finds vulnerabilities or update advisories, the gateway adds corrective instructions and sends a second streaming upstream request, then streams the second response to the client.

Other response-side Guardian Agent checks are skipped for normal streaming passthrough.

Endpoint Coverage

Guardian Agent is implemented on these LLM Gateway surfaces:

SurfaceRequest-side checksResponse-side dependency checks
OpenAI-compatible chat completionsYesYes
OpenAI Responses APIYesYes
Anthropic MessagesYesYes
Vertex AI GeminiYesYes

OpenAI-compatible chat includes Bedrock chat models reached through the gateway's Bedrock Converse translation.

Configuration

Guardian Agent is configured per LLM Gateway API key under guardian_agent:

{
"guardian_agent": {
"enabled": true,
"coding_helpers": {
"enabled": true,
"dependency_security_check": true,
"latest_version_suggestions": true
},
"task_adherence": {
"enabled": true,
"sensitivity": "low",
"action": "nudge"
}
}
}
FieldDescription
guardian_agent.enabledEnables or disables Guardian Agent for the key.
coding_helpers.enabledEnables dependency-related request instructions and response review.
coding_helpers.dependency_security_checkChecks exact dependency pins and resolved bare installs for known OSV vulnerabilities.
coding_helpers.latest_version_suggestionsSuggests newer versions for exact pins where registry latest-version checks are supported.
task_adherence.enabledEnables relevance checks between the latest user message and the system prompt.
task_adherence.sensitivityMust be low, medium, or high.
task_adherence.actionMust be nudge or block. If omitted, the action defaults to nudge.

task_adherence.agent_purpose may still appear in older configurations, but the live task-adherence check uses the request system prompt.

On API key create or update, pass guardian_agent as a top-level config field. On key create, it can also be nested inside quilr_api_key_settings. Setting guardian_agent to null on update removes the Guardian Agent configuration from that key.

Logging

Guardian Agent findings are logged with the same prediction shape used by guardrails:

  • type: classify
  • match_type: guardian
  • id: guardian_task_adherence, guardian_coding_dependency_security_review, or guardian_coding_latest_version_review

Guardian categories are also written to metadata.extra_data.guardian_agent.request and metadata.extra_data.guardian_agent.response in exported logs. Nudge and monitor findings appear under actions_and_categories.request.monitored or actions_and_categories.response.monitored. Blocked task-adherence findings appear under actions_and_categories.request.blocked.

If Guardian Agent finds something and nothing was blocked or anonymized, the request outcome becomes monitor_detected. If task adherence is configured with block and the latest user message is classified as unrelated, the request outcome becomes blocked and the upstream model is not called.

Current Limits

  • Dependency extraction is best-effort and can miss unusual manifest shapes.
  • Range specs are not checked for OSV vulnerabilities or latest-version suggestions.
  • Vulnerable dependency output is not hard-blocked today. The current behavior is monitor plus retry, with an appended advisory as fallback.
  • Task adherence checks only the latest user message.
  • Task adherence requires a system prompt. No system prompt means no task-adherence check.
  • Dependency and task-adherence network checks fail open on transient errors.
  • Response-side task adherence is not implemented.
  • Streaming response-side checks are only implemented for dependency security and latest-version coding helpers.