Secure Context Evals
SecurityRecipes is positioned as The Secure Context Layer for Agentic AI. The Secure Context Eval Pack turns that positioning into a CI-ready product surface: a buyer can inspect scenario results, source hashes, expected runtime decisions, citation requirements, and agent-to-agent handoff limits through the repo and MCP server.
This is the next high-value layer after the Secure Context Trust Pack and Secure Context Attestation. The trust pack proves what context is registered. Attestation proves the context package can be certified. The eval pack proves the runtime behavior a buyer actually cares about: will the system return the right context, hold when signatures are missing, terminate on prohibited data classes, preserve citations, and keep remote-agent handoffs metadata-only?
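The runtime behavior described above can be pictured as a small decision function. The sketch below is illustrative only: the `ContextRequest` fields, the `PROHIBITED_DATA_CLASSES` set, and the order of the checks are assumptions made for this example, not the schema or logic of the actual eval pack.

```python
from dataclasses import dataclass, field

# Hypothetical data classes that must never reach an agent (assumed for this sketch).
PROHIBITED_DATA_CLASSES = {"secret", "tenant_runtime_context"}


@dataclass
class ContextRequest:
    """Illustrative request shape; the field names are assumptions."""
    source_registered: bool
    signature_present: bool
    data_classes: set = field(default_factory=set)


def decide(request: ContextRequest) -> str:
    """Return one of allow, hold, deny, or kill for a context request."""
    # Prohibited data terminates the session regardless of other evidence.
    if request.data_classes & PROHIBITED_DATA_CLASSES:
        return "kill"
    # Unregistered sources are denied outright.
    if not request.source_registered:
        return "deny"
    # Missing signatures pause the request until attestation evidence exists.
    if not request.signature_present:
        return "hold"
    return "allow"
```

An eval scenario then amounts to a fixed request paired with the decision it is expected to produce, which is what makes negative controls (hold, deny, kill) testable alongside the happy path.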
What was added
- data/assurance/secure-context-eval-scenarios.json - the source scenario profile.
- scripts/generate_secure_context_eval_pack.py - deterministic pack generator with --check and --update-if-stale support.
- scripts/evaluate_secure_context_eval_case.py - runtime evaluator for observed answers, citations, decisions, and handoff payloads.
- data/evidence/secure-context-eval-pack.json - generated evidence pack for CI, MCP, diligence, and trust-center review.
- MCP tools: recipes_secure_context_eval_pack and recipes_evaluate_secure_context_eval_case.
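One way a deterministic generator can support `--check` and `--update-if-stale` is to render the pack to a canonical string and compare it with the file on disk. The sketch below shows that general technique under assumed names (`render_pack`, `check_or_update`); it is not the code in scripts/generate_secure_context_eval_pack.py.

```python
import json
from pathlib import Path


def render_pack(pack: dict) -> str:
    """Serialize deterministically: sorted keys, fixed indent, trailing newline."""
    return json.dumps(pack, indent=2, sort_keys=True) + "\n"


def check_or_update(pack_path: Path, pack: dict, update_if_stale: bool = False) -> bool:
    """Return True when the file on disk matches a fresh render.

    With update_if_stale=True, a stale or missing file is rewritten, and the
    function still returns False to report that it was out of date.
    """
    expected = render_pack(pack)
    current = pack_path.read_text(encoding="utf-8") if pack_path.exists() else None
    if current == expected:
        return True
    if update_if_stale:
        pack_path.write_text(expected, encoding="utf-8")
    return False
```

Because the render is byte-stable for the same input, a CI job can fail whenever the check returns False, which catches evidence packs that were edited by hand or not regenerated after a scenario change.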
Regenerate and validate:
python3 scripts/generate_secure_context_eval_pack.py
python3 scripts/generate_secure_context_eval_pack.py --check

Evaluate one runtime answer:

python3 scripts/evaluate_secure_context_eval_case.py \
  --scenario-id ctx-eval-prohibited-data-kill \
  --answer-text "kill_session" \
  --expect-decision eval_ready

Eval scenario classes

| Scenario class | What it proves |
|---|---|
| Retrieval correctness | The requested workflow, source, path, and retrieval mode produce the expected allow, hold, deny, or kill decision. |
| Source attestation | Production MCP and trust-center use hold unless the context package has the required attestation evidence. |
| Context poisoning resilience | Registered sources are checked against the poisoning guard before they influence an agent. |
| Egress safety | Context cannot move to a model, MCP server, telemetry sink, public corpus, or external destination without data-class and destination controls. |
| Answer contract | Runtime answers preserve source IDs, hashes, and citations instead of turning retrieved text into hidden authority. |
| Agent-to-agent handoff boundary | Remote-agent handoffs carry task summaries, workflow IDs, source hashes, and approval state, not internal memory or tenant runtime context. |
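
The handoff boundary in the last row lends itself to an allowlist check. The following is a minimal sketch, assuming a handoff payload is a flat dictionary; the key spellings in `ALLOWED_HANDOFF_KEYS` are hypothetical and stand in for whatever field names the real eval pack uses.

```python
# Keys a remote-agent handoff may carry, following the table above.
# The exact key spellings are assumptions made for this sketch.
ALLOWED_HANDOFF_KEYS = {"task_summary", "workflow_id", "source_hashes", "approval_state"}


def handoff_violations(payload: dict) -> list:
    """Return the sorted payload keys that fall outside the metadata-only allowlist."""
    return sorted(set(payload) - ALLOWED_HANDOFF_KEYS)


payload = {
    "task_summary": "Remediate vulnerable dependency",
    "workflow_id": "wf-123",
    "internal_memory": ["scratchpad contents"],
}
print(handoff_violations(payload))  # ['internal_memory']
```

An allowlist is used rather than a denylist so that any new field is rejected until it is explicitly approved.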
Why it is acquisition-grade
Enterprise buyers and likely acquirers will not value another prompt library by itself. They will value a control surface that makes agentic AI easier to approve, monitor, and defend. This eval layer is designed to answer diligence questions directly:
- Can SecurityRecipes prove that secure context retrieval is tested?
- Can it show negative controls, not only happy paths?
- Can it produce machine-readable evidence for MCP clients and gateways?
- Can it support customer-specific eval packs without exposing tenant data in the open corpus?
- Can it extend from MCP tool use into agent-to-agent protocols where remote agents are opaque applications?
The open artifact creates trust and distribution. The commercial path is hosted eval replay, customer corpus eval ingestion, model/provider regression tracking, signed eval results, and trust-center exports.
Industry alignment
This layer follows current primary guidance and market movement:
- MCP Authorization Specification for resource indicators, audience validation, OAuth 2.1 security expectations, PKCE, HTTPS, and bounded token use.
- MCP Security Best Practices for confused-deputy prevention, connector scope minimization, SSRF controls, session safety, and auditable mediation.
- OWASP Top 10 for Agentic Applications 2026 for goal hijack, tool misuse, identity abuse, unexpected code execution, memory and context poisoning, inter-agent communication, cascading failure, and containment.
- OWASP MCP Top 10 for model misbinding, context spoofing, prompt-state manipulation, insecure memory references, and covert channels.
- NIST AI RMF and the NIST Generative AI Profile for AI governance, measurement, monitoring, provenance, and data boundary management.
- A2A Enterprise-Ready Features for treating remote agents as opaque enterprise applications with transport security, identity, authorization, and monitoring.
MCP examples
List eval-ready scenarios:
{
  "decision": "eval_ready",
  "minimum_score": 100
}

Inspect one scenario:
{
  "scenario_id": "ctx-eval-production-attestation-hold"
}

Evaluate an observed answer:
{
  "scenario_id": "ctx-eval-vuln-dep-prompt-context",
  "answer_text": "Use vulnerable-dependency-remediation context and preserve the source hash.",
  "citations": [
    {
      "source_id": "prompt-library-recipes",
      "source_hash": "<hash from the eval pack>",
      "path": "content/prompt-library/codex/vulnerable-dep-remediation.md"
    }
  ]
}
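
To make the citation requirement concrete, here is a hedged sketch of how an evaluator might compare the `citations` array in a request like the one above against expected hashes. The `citation_errors` function and the `pack_sources` mapping (source ID to expected hash) are assumptions for illustration; the actual evaluator and evidence pack layout may differ.

```python
def citation_errors(citations: list, pack_sources: dict) -> list:
    """Compare observed citations with expected source hashes.

    pack_sources maps source_id -> expected source_hash (an assumed shape).
    """
    errors = []
    # An answer without citations cannot be traced back to a registered source.
    if not citations:
        errors.append("answer has no citations")
    for citation in citations:
        source_id = citation.get("source_id")
        expected = pack_sources.get(source_id)
        if expected is None:
            errors.append(f"unknown source_id: {source_id}")
        elif citation.get("source_hash") != expected:
            errors.append(f"hash mismatch for {source_id}")
    return errors
```

An empty result means every citation names a known source and carries the hash recorded for it; any other result lists the reasons the answer contract was not met.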