Agentic Catastrophic Risk Annex

What this is. The annex is the high-impact autonomy layer above the normal readiness scorecard. It answers the board and buyer question: “Which agent actions must be held, denied, or killed before a rare but severe failure becomes irreversible?”

SecurityRecipes already has workflow policy, MCP authorization, non-human identity, context trust, handoff boundaries, egress policy, run receipts, readiness scoring, and red-team drills. The Agentic Catastrophic Risk Annex joins those controls into one generated packet for severe-risk decisions.

This is intentionally practical. It does not claim to solve long-horizon AI safety. It focuses on production-testable agentic controls that enterprise buyers can inspect:

  • Which action classes are high impact?
  • Which scenarios require human approval and risk acceptance?
  • Which missing evidence creates a hold?
  • Which runtime signals kill the session?
  • Which source packs prove the decision?
  • Which MCP tools expose the evidence to an agent host or review portal?

Generated artifact

  • Source model: data/assurance/agentic-catastrophic-risk-annex.json
  • Generator: scripts/generate_agentic_catastrophic_risk_annex.py
  • Evidence pack: data/evidence/agentic-catastrophic-risk-annex.json
  • Runtime evaluator: scripts/evaluate_agentic_catastrophic_risk_decision.py
  • MCP tools: recipes_agentic_catastrophic_risk_annex, recipes_evaluate_agentic_catastrophic_risk_decision

Regenerate and validate the pack:

python3 scripts/generate_agentic_catastrophic_risk_annex.py
python3 scripts/generate_agentic_catastrophic_risk_annex.py --check

Evaluate a held high-impact deployment decision:

python3 scripts/evaluate_agentic_catastrophic_risk_decision.py \
  --workflow-id base-image-remediation \
  --action-class production_deployment \
  --run-id run-123 \
  --identity-id sr-agent::base-image-remediation::codex \
  --policy-pack-hash policy-hash \
  --authorization-decision allow_authorized_mcp_request \
  --flag affects_prod=true \
  --expect-decision hold_for_catastrophic_risk_review

Evaluate an approved high-impact action:

python3 scripts/evaluate_agentic_catastrophic_risk_decision.py \
  --workflow-id base-image-remediation \
  --action-class production_deployment \
  --run-id run-123 \
  --identity-id sr-agent::base-image-remediation::codex \
  --policy-pack-hash policy-hash \
  --authorization-decision allow_authorized_mcp_request \
  --risk-acceptance-id risk-accept-123 \
  --receipt-id receipt-123 \
  --approval-id approval-123 \
  --flag affects_prod=true \
  --expect-decision allow_reviewed_high_impact_action
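
The two invocations above differ only in the review evidence they carry. A minimal sketch of that completeness rule, assuming the field names mirror the CLI flags (this is an illustration, not the actual logic in scripts/evaluate_agentic_catastrophic_risk_decision.py):

```python
# Review evidence a high-impact action must supply to move from a hold
# to an approved decision. Field names mirror the CLI flags above; the
# rule itself is an assumption about the evaluator, not its source.
REVIEW_EVIDENCE = ("risk_acceptance_id", "receipt_id", "approval_id")

def missing_review_evidence(request: dict) -> list[str]:
    """Return the review-evidence fields absent from a request."""
    return [field for field in REVIEW_EVIDENCE if not request.get(field)]

def review_decision(request: dict) -> str:
    """Hold whenever any review evidence is missing; otherwise allow."""
    if missing_review_evidence(request):
        return "hold_for_catastrophic_risk_review"
    return "allow_reviewed_high_impact_action"
```

The first CLI example omits all three ids and therefore expects a hold; the second supplies them and expects the reviewed-allow decision.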

Why this matters now

The 2026 market has moved from “secure the prompt” to “secure the autonomous action layer.” NIST’s AI Agent Standards Initiative centers agent authentication, identity infrastructure, interoperable protocols, and security evaluations. OWASP’s Agentic Top 10 frames autonomous systems around tool use, identity abuse, context poisoning, insecure inter-agent communication, cascading failure, and rogue behavior. The MCP authorization specification now makes resource indicators, audience validation, protected resource metadata, least-privilege scopes, and token handling explicit. CSAI’s April 2026 work adds the strongest signal: catastrophic-risk assurance for loss of oversight, uncontrolled behavior, and large-scale irreversible consequences that can be tested in production.

SecurityRecipes should own the operational middle: not abstract safety claims, but generated evidence and runtime gates that tell an agent host when to allow, hold, deny, or kill.

Severe scenarios

| Scenario | Default decision | What it proves |
| --- | --- | --- |
| Loss of human oversight | hold_for_catastrophic_risk_review | High-impact action stops when approval, identity, policy, receipt, or readiness evidence is missing. |
| Uncontrolled system behavior | kill_session_on_catastrophic_signal | Tool loops, deny-after-deny behavior, and uncontrolled action cascades terminate the session. |
| Credential and token cascade | deny_unbounded_autonomy | Agents cannot pass tokens, request broader audiences, or silently inherit user authority. |
| Cross-agent cascading failure | hold_for_catastrophic_risk_review | Handoffs cannot move hidden prompts, memory, raw traces, credentials, or unstated authority. |
| Supply-chain autonomy blast radius | hold_for_catastrophic_risk_review | Fleet-impacting dependency, image, cache, release, or generated-code changes require risk evidence. |
| Private-context exfiltration | kill_session_on_catastrophic_signal | Secrets, unredacted PII, private memory, and trust evidence cannot leave approved boundaries. |
| Irreversible financial or critical action | deny_unbounded_autonomy | Funds movement, critical-infrastructure control, identity administration, and mass deletion stay denied unless separately accepted. |
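
The scenario-to-default mapping above can be treated as a fail-closed lookup. A sketch, assuming slug-style scenario ids; only private-context-exfiltration appears verbatim in this page's MCP example, the other ids are assumptions:

```python
# Default decision per severe scenario. Only "private-context-exfiltration"
# is confirmed by this page; the remaining scenario ids are assumed slugs.
DEFAULT_DECISIONS = {
    "loss-of-human-oversight": "hold_for_catastrophic_risk_review",
    "uncontrolled-system-behavior": "kill_session_on_catastrophic_signal",
    "credential-and-token-cascade": "deny_unbounded_autonomy",
    "cross-agent-cascading-failure": "hold_for_catastrophic_risk_review",
    "supply-chain-autonomy-blast-radius": "hold_for_catastrophic_risk_review",
    "private-context-exfiltration": "kill_session_on_catastrophic_signal",
    "irreversible-financial-or-critical-action": "deny_unbounded_autonomy",
}

def default_decision(scenario_id: str) -> str:
    # Unknown scenarios fail closed to a hold, never to an allow.
    return DEFAULT_DECISIONS.get(scenario_id, "hold_for_catastrophic_risk_review")
```

Failing closed on unknown scenario ids keeps the lookup consistent with the annex's posture: a gap in evidence creates a hold.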

Runtime decision contract

Run the evaluator before a gateway or orchestrator allows a high-impact action. It returns one of five decisions:

| Decision | Meaning |
| --- | --- |
| allow_bounded_agent_action | The request does not match high-impact action classes or severe flags and has baseline identity evidence. |
| allow_reviewed_high_impact_action | A high-impact action has approval, risk acceptance, identity, policy, authorization, and receipt evidence. |
| hold_for_catastrophic_risk_review | Required runtime evidence is missing or the action needs explicit risk review. |
| deny_unbounded_autonomy | High-impact autonomy lacks risk acceptance or tries irreversible authority without approval. |
| kill_session_on_catastrophic_signal | Runtime behavior indicates a severe safety or security violation. |

High-impact action classes include production deployments, production writes, identity administration, secret access, schema migrations, mass deletion, public releases, critical-infrastructure control, funds movement, and connector scope escalation.
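
The contract above can be sketched as an ordered decision ladder. This is an illustrative reconstruction, not the evaluator's source: only production_deployment is confirmed as a class name by the CLI examples, and the other snake_case class names, the kill_signal flag, and the check ordering are assumptions:

```python
# Sketch of the decision ladder. Class names other than
# "production_deployment", the "kill_signal" flag, and the ordering of
# checks are assumptions; the real logic lives in
# scripts/evaluate_agentic_catastrophic_risk_decision.py.
HIGH_IMPACT_CLASSES = {
    "production_deployment", "production_write", "identity_administration",
    "secret_access", "schema_migration", "mass_deletion", "public_release",
    "critical_infrastructure_control", "funds_movement",
    "connector_scope_escalation",
}
# Classes that stay denied unless separately approved and accepted.
IRREVERSIBLE_CLASSES = {
    "funds_movement", "critical_infrastructure_control",
    "identity_administration", "mass_deletion",
}
REQUIRED_EVIDENCE = ("approval_id", "risk_acceptance_id", "identity_id",
                     "policy_pack_hash", "authorization_decision", "receipt_id")

def evaluate(request: dict) -> str:
    if request.get("kill_signal"):
        # Severe runtime violation: terminate regardless of evidence.
        return "kill_session_on_catastrophic_signal"
    if request.get("action_class") not in HIGH_IMPACT_CLASSES:
        # Bounded actions still need baseline identity evidence.
        if request.get("identity_id"):
            return "allow_bounded_agent_action"
        return "hold_for_catastrophic_risk_review"
    if request["action_class"] in IRREVERSIBLE_CLASSES and not request.get("approval_id"):
        return "deny_unbounded_autonomy"
    if all(request.get(key) for key in REQUIRED_EVIDENCE):
        return "allow_reviewed_high_impact_action"
    return "hold_for_catastrophic_risk_review"
```

Note the ordering: kill signals win over everything, and incomplete evidence always degrades to a hold rather than an allow.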

Buyer diligence questions

| Buyer view | Question |
| --- | --- |
| Board and executive risk | Can the organization say yes to agentic AI without losing control of irreversible or large-scale consequences? |
| AI platform security | Can high-impact tool calls be stopped before they cross MCP, identity, data, memory, or inter-agent boundaries? |
| Acquisition diligence | Does SecurityRecipes have a credible future enterprise assurance surface beyond open prompts and docs? |

Product strategy

This annex pushes SecurityRecipes toward the “Secure Context Layer for Agentic AI” thesis:

| Layer | Value |
| --- | --- |
| Open foundation | Severe-risk scenarios, default decisions, source packs, runtime evaluator, and MCP tools are public and forkable. |
| Production MCP server | Hosted high-impact action inventory, approval receipt validation, customer-specific risk acceptance, and runtime kill policy. |
| Enterprise expansion | Board reporting, insurer evidence, procurement exports, red-team replay, and customer-specific severe-risk test suites. |
| Strategic acquisition fit | Frontier labs, coding-agent platforms, cloud providers, and security vendors need a credible action-governance layer for enterprise agents. |

MCP examples

Get the annex summary:

{}

Get one severe scenario:

{
  "scenario_id": "private-context-exfiltration"
}

Evaluate a high-impact action:

{
  "workflow_id": "base-image-remediation",
  "action_class": "production_deployment",
  "run_id": "run-123",
  "identity_id": "sr-agent::base-image-remediation::codex",
  "policy_pack_hash": "policy-hash",
  "authorization_decision": "allow_authorized_mcp_request",
  "affects_prod": true
}
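
A host sends arguments like these inside an MCP tools/call request. A minimal sketch of building that envelope, assuming the standard MCP JSON-RPC shape (the transport and request id handling depend on the host):

```python
import json

# Wrap evaluator arguments in an MCP tools/call envelope. The tool name
# comes from this page; the JSON-RPC framing follows the MCP spec, and
# the request id here is an arbitrary example.
def build_tool_call(arguments: dict, request_id: int = 1) -> str:
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "recipes_evaluate_agentic_catastrophic_risk_decision",
            "arguments": arguments,
        },
    }
    return json.dumps(payload)
```

The arguments dict is exactly the JSON body shown above; the envelope is what actually crosses the wire to the MCP server.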
