Context Poisoning Guard

Why this page exists. A secure context layer cannot stop at hashing context. It also has to inspect that context for instruction-like payloads before it is returned to an agent through MCP.

The product bet

SecurityRecipes is positioned as the secure context layer for agentic AI. The strongest enterprise version of that idea is not a prompt library. It is a controlled context supply chain:

  • registered source roots,
  • owners and trust tiers,
  • retrieval decisions,
  • source hashes,
  • poisoning controls,
  • and deterministic inspection before context reaches an agent.

The Context Poisoning Guard adds that inspection layer. It scans every registered context root from the Secure Context Registry and produces an evidence pack that says whether each source passes, contains only documented adversarial examples, should be held for review, or should be blocked until fixed.
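The scan starts from each registered root and records a hash per source file. A minimal sketch of that hashing step, assuming SHA-256 and a flat walk of the root (the real generator's file selection and layout may differ):

```python
import hashlib
from pathlib import Path

def hash_context_root(root: str) -> dict[str, str]:
    """Walk a registered context root and hash every file in it.

    Sketch only: returns {relative_path: sha256_hex} so a later run
    can detect any source that changed since the pack was generated.
    """
    hashes = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            hashes[str(path.relative_to(root))] = digest
    return hashes
```

Sorting the walk keeps the output deterministic, which matters for the `--check` validation step below.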

What was added

  • Source profile: data/assurance/context-poisoning-guard-profile.json
  • Generator: scripts/generate_context_poisoning_guard_pack.py
  • Evidence pack: data/evidence/context-poisoning-guard-pack.json
  • MCP tool: recipes_context_poisoning_guard_pack

Regenerate and validate the pack:

python3 scripts/generate_context_poisoning_guard_pack.py
python3 scripts/generate_context_poisoning_guard_pack.py --check
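The `--check` mode can be read as a drift test: regenerate the pack in memory and compare it to the committed JSON. A sketch of that contract, with a hypothetical `check_pack` helper (the real script's internals may differ):

```python
import json
import sys
from pathlib import Path

def check_pack(generated: dict, pack_path: Path) -> int:
    """Compare a freshly generated pack against the committed one.

    Sketch of the --check contract: return 0 when the committed
    evidence pack matches regeneration, 1 when it has drifted.
    """
    on_disk = json.loads(pack_path.read_text())
    if on_disk != generated:
        print("evidence pack is stale; rerun the generator", file=sys.stderr)
        return 1
    return 0
```

Running this in CI keeps `data/evidence/context-poisoning-guard-pack.json` from silently drifting out of sync with its sources.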

What it scans

| Rule | Severity | Why it matters |
| --- | --- | --- |
| Direct instruction override | Critical | Detects text that asks an agent to ignore or override higher-priority instructions. |
| Secret exfiltration request | Critical | Detects transfer language near secrets, tokens, credentials, private keys, or environment dumps. |
| Approval bypass request | High | Detects requests to skip, bypass, remove, or disable review, approval, policy, CI, or guardrails. |
| Hidden HTML instruction | High | Detects hidden HTML/comment patterns that may evade human review but remain visible to models. |
| External callback instruction | High | Detects send/post/upload/callback language near external URLs. |
| Encoded payload | Medium | Detects long base64-like strings that may hide instructions or data. |
| Zero-width control | Medium | Detects zero-width and bidirectional controls that can hide or reorder text. |
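Two of these rules can be sketched as regexes. The patterns below are illustrative assumptions, not the shipped rule definitions, which are certainly more refined:

```python
import re

# Zero-width and bidirectional control characters that can hide or
# reorder text (sketch; not the shipped character set).
ZERO_WIDTH = re.compile(r"[\u200b-\u200f\u2066-\u2069\ufeff]")

# "ignore/disregard/override ... previous/prior/system ... instructions"
# within a short window (sketch of the direct-override rule).
OVERRIDE = re.compile(
    r"\b(ignore|disregard|override)\b.{0,40}"
    r"\b(previous|prior|system)\b.{0,20}\binstructions?\b",
    re.IGNORECASE | re.DOTALL,
)

def scan_line(line: str) -> list[str]:
    """Return the rule IDs whose pattern matches this line."""
    hits = []
    if OVERRIDE.search(line):
        hits.append("direct-instruction-override")
    if ZERO_WIDTH.search(line):
        hits.append("zero-width-control")
    return hits
```

Line-level matching is what lets each finding carry a path and line number in the evidence pack.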

The guard is intentionally conservative. It does not pretend regexes can solve prompt injection. It creates evidence and routing:

  • pass when no markers are detected.
  • allow_with_adversarial_examples when markers appear only in documented red-team, threat-model, or defensive examples.
  • hold_for_context_review when normal guidance contains high-risk markers.
  • block_until_removed when critical actionable findings appear outside approved examples.
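The four-way routing above reduces to a short decision function. A sketch, where the field names (`severity`, `in_documented_example`) are assumptions about the pack's finding records:

```python
def route(findings: list[dict]) -> str:
    """Map a source's findings to one of the four pack decisions.

    Sketch of the routing rules: documented adversarial examples are
    not actionable; critical actionable findings block the source.
    """
    if not findings:
        return "pass"
    actionable = [f for f in findings if not f.get("in_documented_example")]
    if not actionable:
        return "allow_with_adversarial_examples"
    if any(f["severity"] == "critical" for f in actionable):
        return "block_until_removed"
    return "hold_for_context_review"
```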

Why this is enterprise-grade

This feature makes AI adoption easier for buyers because it turns a hard question into a simple artifact:

Can this context be returned to an agent?

An MCP server, AI platform intake workflow, or procurement reviewer can ask the guard pack for source-level decisions and findings instead of reading every page manually. The answer carries source ID, path, line, rule ID, severity, disposition, and source hash.

The generated pack supports:

  • prompt-library publication review,
  • MCP server intake,
  • quarterly secure-context recertification,
  • red-team replay planning,
  • acquisition diligence,
  • and future hosted context monitoring.

MCP examples

Get the portfolio-level summary:

{}

Get all sources held for context review:

{
  "decision": "hold_for_context_review"
}

Get actionable critical findings for one source:

{
  "source_id": "prompt-library-recipes",
  "severity": "critical",
  "actionable_only": true
}

Get all direct instruction override matches:

{
  "rule_id": "direct-instruction-override"
}
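The same filters can be applied directly to the committed pack JSON without an MCP client. A sketch, assuming the pack exposes a top-level `findings` list with `source_id`, `severity`, `rule_id`, and `actionable` fields (field names are assumptions here):

```python
def query_findings(pack: dict, **filters) -> list[dict]:
    """Filter pack findings the way the MCP tool arguments do.

    Supported filters (sketch): source_id, severity, rule_id,
    actionable_only.
    """
    results = pack.get("findings", [])
    if filters.get("source_id"):
        results = [f for f in results if f["source_id"] == filters["source_id"]]
    if filters.get("severity"):
        results = [f for f in results if f["severity"] == filters["severity"]]
    if filters.get("rule_id"):
        results = [f for f in results if f["rule_id"] == filters["rule_id"]]
    if filters.get("actionable_only"):
        results = [f for f in results if f.get("actionable")]
    return results
```

For example, loading `data/evidence/context-poisoning-guard-pack.json` with `json.loads` and calling `query_findings(pack, severity="critical", actionable_only=True)` would mirror the third MCP example above.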

Industry alignment

The guard follows current agentic AI and MCP security guidance:

See also