Agentic Measurement Probes

Why this matters. Credible agentic AI security needs measurement, not only guidance. This pack turns SecurityRecipes controls into repeatable probes that can be consumed by AI platform reviews, MCP gateways, procurement security, and acquisition diligence.

SecurityRecipes is positioned as the secure context layer for agentic AI. The Agentic Measurement Probe Pack makes that position more concrete: it asks whether a workflow can reconstruct the context, tools, identities, policy decisions, memory, egress, approvals, verifiers, and threat signals behind an agent run.

This is the forward-looking product surface suggested by current industry direction. NIST’s April 2026 agentic measurement probe work focuses on traceability, reconstructing tool usage and evidence, and using judges or verifiers grounded in knowledge bases. OWASP and MCP guidance point to the same need from the security side: agentic systems must prove scope, authorization, context boundaries, telemetry, and failure handling before they operate in high-stakes environments.

Generated artifact

  • Profile: data/assurance/agentic-measurement-probe-profile.json
  • Generator: scripts/generate_agentic_measurement_probe_pack.py
  • Evidence pack: data/evidence/agentic-measurement-probe-pack.json
  • MCP tool: recipes_agentic_measurement_probe_pack

Regenerate and validate the pack:

python3 scripts/generate_agentic_measurement_probe_pack.py
python3 scripts/generate_agentic_measurement_probe_pack.py --check

Probe classes

| Probe class | What it proves |
| --- | --- |
| Context integrity | Retrieved context is registered, owned, hash-bound, cited, and scanned before it influences an agent. |
| Tool authorization | MCP namespaces are default-deny, resource-bound, audience-bound, and scoped before tool execution. |
| Identity delegation | Agents act through scoped non-human identities with explicit denies and revocation evidence. |
| Context egress | Context cannot leave tenant, model, telemetry, MCP, or public-corpus boundaries without data-class and destination checks. |
| Memory boundary | Persistent memory, vector indexes, replay, and prohibited memory are gated before reuse. |
| Red-team replay | Workflows can replay prompt injection, goal hijack, approval bypass, exfiltration, drift, loop, and evidence-integrity probes. |
| Run receipt integrity | A run can reconstruct context, tools, policy decisions, approvals, verifier output, closure, and identity revocation. |
| Readiness decision | Current evidence supports scale, guarded pilot, manual gate, or block decisions. |
| Threat radar alignment | Probe coverage maps back to current source-backed agentic and MCP threat signals. |

How to use it

AI platform promotion. Call the MCP tool with decision="measurement_ready" to list workflows whose probes pass the minimum score. Treat failed probes as promotion blockers until the source evidence is regenerated or remediated.
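As a sketch of how a promotion gate might consume probe results: the snippet below filters hypothetical probe records by status and score. The field names (`class_id`, `status`, `score`) mirror the filters documented on this page, but the record shape is an assumption for illustration, not the schema of the generated pack.

```python
# Hypothetical probe records; the real schema lives in the generated
# evidence pack (data/evidence/agentic-measurement-probe-pack.json).
probes = [
    {"class_id": "context_integrity", "status": "pass", "score": 96},
    {"class_id": "tool_authorization", "status": "fail", "score": 54},
    {"class_id": "egress_boundary", "status": "pass", "score": 91},
]

def measurement_ready(probes, minimum_score=90):
    """A workflow is promotable only if every probe passes the minimum score."""
    return all(p["status"] == "pass" and p["score"] >= minimum_score
               for p in probes)

# Failed or sub-threshold probes act as promotion blockers.
blockers = [p["class_id"] for p in probes
            if p["status"] == "fail" or p["score"] < 90]

print(measurement_ready(probes))  # False: tool_authorization blocks promotion
print(blockers)                   # ['tool_authorization']
```

The gate is deliberately conjunctive: one failed probe class blocks promotion until the underlying evidence is regenerated or remediated.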

MCP connector intake. Filter by class_id="tool_authorization" or class_id="egress_boundary" when approving new remote MCP servers, OAuth-backed connectors, or data-moving tool surfaces.
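For example, an intake review could pass the class filter as a tool argument, in the same shape as the MCP examples later on this page (whether filters can be combined with other keys in one call is not specified here, so only the documented single-key form is shown):

```json
{
  "class_id": "tool_authorization"
}
```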

Quarterly red-team replay. Filter by class_id="red_team_replay" and run the named scenarios against the current model, prompt, tool, context, memory, and policy stack.
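The quarterly replay filter takes the same single-key argument shape:

```json
{
  "class_id": "red_team_replay"
}
```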

Procurement and diligence. Attach the generated pack with the Agentic Assurance Pack, Readiness Scorecard, Agentic System BOM, Run Receipt Pack, and Threat Radar. The probe pack turns those artifacts into a single inspectable measurement story.

MCP examples

List workflows ready for measurement-based promotion:

{
  "decision": "measurement_ready",
  "minimum_score": 90
}

Inspect one workflow:

{
  "workflow_id": "vulnerable-dependency-remediation"
}

Find failed or held probes:

{
  "status": "fail"
}

Inspect egress probes:

{
  "class_id": "egress_boundary"
}

Source anchors

See also