Agentic Measurement Probes

Why this matters. Credible agentic AI security needs measurement, not only guidance. This pack turns SecurityRecipes controls into repeatable probes that can be consumed by AI platform reviews, MCP gateways, procurement security, and acquisition diligence.

SecurityRecipes is positioned as the secure context layer for agentic AI. The Agentic Measurement Probe Pack makes that position more concrete: it asks whether a workflow can reconstruct the context, tools, identities, policy decisions, memory, egress, approvals, verifiers, and threat signals behind an agent run.

This is the forward-looking product surface suggested by current industry direction. NIST’s April 2026 agentic measurement probe work focuses on traceability, reconstructing tool usage and evidence, and using judges or verifiers grounded in knowledge bases. OWASP and MCP guidance point to the same need from the security side: agentic systems must prove scope, authorization, context boundaries, telemetry, and failure handling before they operate in high-stakes environments.

Generated artifact

  • Profile: data/assurance/agentic-measurement-probe-profile.json
  • Generator: scripts/generate_agentic_measurement_probe_pack.py
  • Evidence pack: data/evidence/agentic-measurement-probe-pack.json
  • MCP tool: recipes_agentic_measurement_probe_pack

Regenerate and validate the pack:

python3 scripts/generate_agentic_measurement_probe_pack.py
python3 scripts/generate_agentic_measurement_probe_pack.py --check

Probe classes

| Probe class | What it proves |
| --- | --- |
| Context integrity | Retrieved context is registered, owned, hash-bound, cited, and scanned before it influences an agent. |
| Tool authorization | MCP namespaces are default-deny, resource-bound, audience-bound, and scoped before tool execution. |
| Identity delegation | Agents act through scoped non-human identities with explicit denies and revocation evidence. |
| Context egress | Context cannot leave tenant, model, telemetry, MCP, or public-corpus boundaries without data-class and destination checks. |
| Memory boundary | Persistent memory, vector indexes, replay, and prohibited memory are gated before reuse. |
| Red-team replay | Workflows can replay prompt injection, goal hijack, approval bypass, exfiltration, drift, loop, and evidence-integrity probes. |
| Run receipt integrity | A run can reconstruct context, tools, policy decisions, approvals, verifier output, closure, and identity revocation. |
| Readiness decision | Current evidence supports scale, guarded pilot, manual gate, or block decisions. |
| Threat radar alignment | Probe coverage maps back to current source-backed agentic and MCP threat signals. |

How to use it

AI platform promotion. Call the MCP tool with decision="measurement_ready" to list workflows whose probes pass the minimum score. Treat failed probes as promotion blockers until the source evidence is regenerated or remediated.
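As a sketch of how a promotion gate might consume probe results: the snippet below filters hypothetical probe records by status and score. The field names (`class_id`, `status`, `score`) mirror the filters documented on this page, but the record shape is an assumption for illustration, not the schema of the generated pack.

```python
# Hypothetical probe records; the real schema lives in the generated
# evidence pack (data/evidence/agentic-measurement-probe-pack.json).
probes = [
    {"class_id": "context_integrity", "status": "pass", "score": 96},
    {"class_id": "tool_authorization", "status": "fail", "score": 54},
    {"class_id": "egress_boundary", "status": "pass", "score": 91},
]

def measurement_ready(probes, minimum_score=90):
    """A workflow is promotable only if every probe passes the minimum score."""
    return all(p["status"] == "pass" and p["score"] >= minimum_score
               for p in probes)

# Failed or sub-threshold probes act as promotion blockers.
blockers = [p["class_id"] for p in probes
            if p["status"] == "fail" or p["score"] < 90]

print(measurement_ready(probes))  # False: tool_authorization blocks promotion
print(blockers)                   # ['tool_authorization']
```

The gate is deliberately conjunctive: one failed probe class blocks promotion until the underlying evidence is regenerated or remediated.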

MCP connector intake. Filter by class_id="tool_authorization" or class_id="egress_boundary" when approving new remote MCP servers, OAuth-backed connectors, or data-moving tool surfaces.
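For example, an intake review could pass the class filter as a tool argument, in the same shape as the MCP examples later on this page (whether filters can be combined with other keys in one call is not specified here, so only the documented single-key form is shown):

```json
{
  "class_id": "tool_authorization"
}
```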

Quarterly red-team replay. Filter by class_id="red_team_replay" and run the named scenarios against the current model, prompt, tool, context, memory, and policy stack.
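The quarterly replay filter takes the same single-key argument shape:

```json
{
  "class_id": "red_team_replay"
}
```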

Procurement and diligence. Attach the generated pack with the Agentic Assurance Pack, Readiness Scorecard, Agentic System BOM, Run Receipt Pack, and Threat Radar. The probe pack turns those artifacts into a single inspectable measurement story.

MCP examples

List workflows ready for measurement-based promotion:

{
  "decision": "measurement_ready",
  "minimum_score": 90
}

Inspect one workflow:

{
  "workflow_id": "vulnerable-dependency-remediation"
}

Find failed or held probes:

{
  "status": "fail"
}

Inspect egress probes:

{
  "class_id": "egress_boundary"
}

Source anchors

See also