Only internally approved and sanitized benchmark reports are exposed on the public site.
`../spring-boot-starter-contexa-enterprise/build/reports/official-verification-fullstack-benchmark`
Publication-safe customer result bundles can be reviewed and surfaced without exposing private enterprise evidence.
Public reports surface official metric coverage, submission readiness, and failing gate counts.
Each report is rendered as a public scorecard with chart-ready artifacts, HTML output, and PDF export.
Family pages explain how the human, agent, protocol, verification, SOAR, and Java production-fit families are tracked over time.
Human Zero Trust
Request-time human access decisions, explanations, and replay quality.
Agent Zero Trust
Delegated agent lineage, objective, scope, and tool-chain drift controls.
Protocol Boundary
Canonical security fidelity across MCP, A2A, and internal runtime boundaries.
Verification and Assurance
Evidence completeness, replay match rate, and submission readiness.
AI Native SOAR
Approval, permit, tool execution, and incident lineage for AI native action planes.
Java Production Fit
Java and Spring runtime integration readiness for production deployment.
The leaderboard is transparent about source, coverage, official metric pass count, and readiness.
| Rank | Source | Report | Coverage | Official Metrics | Ready |
|---|---|---|---|---|---|
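The leaderboard columns above can be sketched as a small Java record with a ranking rule. This is an illustrative assumption, not the actual CONTEXA schema: field names, the 0..1 coverage convention, and the ordering (ready rows first, then official metric passes, then coverage) are all hypothetical.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical shape of one public leaderboard row; names are illustrative.
public record LeaderboardRow(
        String source,
        String report,
        double coverage,            // fraction of official metrics covered, 0..1
        int officialMetricsPassed,  // count of official metrics passing their gates
        boolean ready) {            // submission readiness flag

    // Assumed ranking rule: ready rows first, then more official metric
    // passes, then higher coverage.
    public static List<LeaderboardRow> ranked(List<LeaderboardRow> rows) {
        Comparator<LeaderboardRow> order = Comparator
                .comparing((LeaderboardRow r) -> r.ready(), Comparator.reverseOrder())
                .thenComparing(r -> r.officialMetricsPassed(), Comparator.reverseOrder())
                .thenComparing(r -> r.coverage(), Comparator.reverseOrder());
        return rows.stream().sorted(order).toList();
    }
}
```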
The public surface explains how CONTEXA keeps the benchmark falsifiable, reviewable, and ready for future standards.
Safety Gates First
The CONTEXA benchmark treats permit, lineage, replay, and evidence integrity as mandatory gates evaluated before any aggregate score.
- An unsafe action, broken lineage, or unverifiable replay fails the benchmark regardless of the average score.
- Public reports expose both aggregate scores and gate failures.
- The benchmark measures controllable action-plane quality rather than isolated model output.
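The gate-before-aggregate rule above can be sketched in a few lines of Java. Gate names, the `Map`-of-booleans shape, and the class name are assumptions for illustration, not CONTEXA's API; the point is only that a failed gate fails the run no matter how high the averaged metric score is.

```java
import java.util.List;
import java.util.Map;

// Sketch of "safety gates first": pass/fail is decided by the gates alone.
public final class GateFirstScore {

    // A run passes only if every mandatory gate holds; the aggregate metric
    // score is reported alongside but can never rescue a gate failure.
    public static boolean passes(Map<String, Boolean> gates, List<Double> metricScores) {
        boolean allGatesHold = gates.values().stream().allMatch(Boolean::booleanValue);
        return allGatesHold; // metricScores intentionally ignored for pass/fail
    }

    // Aggregate score, surfaced in public reports next to gate failures.
    public static double aggregate(List<Double> metricScores) {
        return metricScores.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }
}
```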
Human and Agent Unified Semantics
Human requests and delegated agent execution are evaluated under the same canonical security semantics.
- Human, service-client, and delegated-agent executions are assessed in one request-time control plane.
- Objective, scope, tool-chain, permit, approval, and protocol-boundary controls remain common evaluation axes.
- Public reports expose scenario families and scorecards without leaking private evidence.
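The "one control plane" idea above can be sketched as follows: the principal kind varies, but the admissibility check does not. Every name here (the enum, the `Request` record, the flag fields) is a hypothetical illustration of the shared evaluation axes, not CONTEXA's actual types.

```java
// Sketch of unified semantics: human, service-client, and delegated-agent
// requests are judged by the same predicate over the same axes.
public final class UnifiedEvaluation {

    enum PrincipalKind { HUMAN, SERVICE_CLIENT, DELEGATED_AGENT }

    // One request shape regardless of who (or what) is executing.
    record Request(PrincipalKind kind,
                   boolean objectiveDeclared,
                   boolean withinScope,
                   boolean toolChainApproved,
                   boolean permitPresent,
                   boolean boundaryChecked) {}

    // The same admissibility rule applies to every principal kind;
    // r.kind() deliberately plays no role in the decision.
    static boolean admissible(Request r) {
        return r.objectiveDeclared() && r.withinScope() && r.toolChainApproved()
                && r.permitPresent() && r.boundaryChecked();
    }
}
```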
Publication-safe Reporting
Public benchmark artifacts are generated from sanitized publication bundles instead of internal raw evidence.
- Private evidence stays inside contexa-iam-enterprise for operator review.
- contexa-site reads only publication-approved public artifacts.
- HTML and PDF reports are generated from the same public summary and chart dataset.
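The publication boundary described above can be sketched as a filter that runs before any public rendering. The `Artifact` record, its fields, and the renderer are illustrative assumptions; the one property taken from the text is that the site only ever consumes publication-approved artifacts, and HTML and PDF views derive from the same sanitized summary.

```java
import java.util.List;

// Sketch of the publication boundary between private evidence and the site.
public final class PublicationFilter {

    // Hypothetical public artifact: id, approval flag, sanitized summary.
    record Artifact(String reportId, boolean publicationApproved, String summaryJson) {}

    // Drop anything not publication-approved before it can reach a renderer.
    static List<Artifact> publishable(List<Artifact> all) {
        return all.stream().filter(Artifact::publicationApproved).toList();
    }

    // Both HTML and PDF renderers would consume the same sanitized summary;
    // only the HTML path is sketched here.
    static String renderHtml(Artifact a) {
        return "<html>" + a.summaryJson() + "</html>";
    }
}
```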
Spring Application Profile
Java and Spring application runtime protection, verification, and benchmark publication readiness.
Matching reports: 0
MCP Tool Governance Profile
Protocol-boundary, permit, and tool-execution controls for MCP-mediated agent actions.
Matching reports: 0
A2A Delegation Profile
Multi-agent delegation lineage, protocol compatibility, and zero-trust chain integrity.
Matching reports: 0
AI Native SOAR Profile
Action-plane safety, permit enforcement, approval, and incident lineage for automated response.
Matching reports: 0