Open Trust Benchmark

Public, Verifiable Benchmark Reports

CONTEXA publishes benchmark results as evidence-backed reports with explicit safety gates, metric tables, coverage graphs, and replay-ready lineage.

Published Reports: 0

Only internally approved and sanitized benchmark reports are exposed on the public site.

Catalog Health: MISSING

../spring-boot-starter-contexa-enterprise/build/reports/official-verification-fullstack-benchmark

Customer Self-Run: 0

Publication-safe customer result bundles can be reviewed and surfaced without exposing private enterprise evidence.

Safety Gate: N/A

Public reports surface official metric coverage, submission readiness, and failing gate counts.

Published Benchmark Reports

Each report is rendered as a public scorecard with chart-ready artifacts, HTML output, and PDF export.

No published benchmark reports are available yet.

Benchmark Families

Family pages explain how the human, agent, protocol, verification, SOAR, and Java production-fit benchmark families are tracked over time.

Top Published Runs

The leaderboard is transparent about source, coverage, official metric pass count, and readiness.

Leaderboard columns: Rank, Source, Report, Coverage, Official Metrics, Ready
Methodology and Certification Snapshot

The public surface explains how CONTEXA keeps the benchmark falsifiable, reviewable, and ready for future standards.

Safety Gates First

The CONTEXA benchmark treats permit, lineage, replay, and evidence integrity as mandatory gates evaluated before any aggregate score.

  • Any unsafe action, broken lineage, or unverifiable replay fails the benchmark regardless of the average score.
  • Public reports expose both aggregate scores and gate failures.
  • The benchmark measures controllable action-plane quality rather than isolated model output.
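The gate-first rule above can be sketched in code. This is a minimal illustration, not the CONTEXA implementation; the `GateResult`, `Verdict`, and `scoreReport` names are hypothetical:

```java
import java.util.List;

// Hypothetical sketch of gate-first scoring; names are illustrative,
// not the CONTEXA benchmark API.
public class GateFirstScoring {

    // Outcome of one mandatory safety gate (permit, lineage, replay, evidence integrity).
    record GateResult(String gate, boolean passed) {}

    // The verdict surfaces both the aggregate score and any gate failures.
    record Verdict(boolean ready, double aggregateScore, List<String> failedGates) {}

    static Verdict scoreReport(List<GateResult> gates, List<Double> scenarioScores) {
        List<String> failed = gates.stream()
                .filter(g -> !g.passed())
                .map(GateResult::gate)
                .toList();
        double average = scenarioScores.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
        // A single failed gate fails the benchmark regardless of the average,
        // but the aggregate score is still reported alongside the failures.
        return new Verdict(failed.isEmpty(), average, failed);
    }
}
```

The point of the sketch is that `ready` depends only on the gates, never on the average: a high-scoring run with one broken gate is still not publishable.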

Human and Agent Unified Semantics

Human requests and delegated agent execution are evaluated under the same canonical security semantics.

  • Human, service-client, and delegated-agent executions are assessed in one request-time control plane.
  • Objective, scope, tool chain, permit, approval, and protocol boundary remain the common evaluation axes.
  • Public reports expose scenario families and scorecards without leaking private evidence.
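One way to picture the unified semantics above is a single canonical record that every execution is normalized into before evaluation. The `ActorType`, `CanonicalRequest`, and `admissible` names below are hypothetical illustrations, not the CONTEXA control-plane API:

```java
import java.util.List;

// Hypothetical sketch: one canonical evaluation record for all actor types.
public class UnifiedSemantics {

    enum ActorType { HUMAN, SERVICE_CLIENT, DELEGATED_AGENT }

    // Every execution, whatever the actor, is normalized onto the same axes:
    // objective, scope, tool chain, permit, approval, protocol boundary.
    record CanonicalRequest(ActorType actor, String objective, String scope,
                            List<String> toolChain, boolean permitGranted,
                            boolean approvalPresent, String protocolBoundary) {}

    // The same admissibility check applies to humans and delegated agents alike;
    // the actor type never relaxes the rule.
    static boolean admissible(CanonicalRequest r) {
        return r.permitGranted() && r.approvalPresent();
    }
}
```

Because the check never branches on `actor`, a delegated agent missing an approval is rejected exactly as a human would be.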

Publication-safe Reporting

Public benchmark artifacts are generated from sanitized publication bundles instead of internal raw evidence.

  • Private evidence stays inside contexa-iam-enterprise for operator review.
  • contexa-site reads only publication-approved public artifacts.
  • HTML and PDF reports are generated from the same public summary and chart dataset.
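The publication boundary described above can be sketched as a filter that decides which artifacts may cross from the enterprise side to the public site. The `Artifact` fields and the `publishable` helper are illustrative assumptions, not the actual contexa-iam-enterprise schema:

```java
import java.util.List;

// Hypothetical sketch of the publication boundary; field names are
// illustrative, not the contexa-iam-enterprise bundle format.
public class PublicationFilter {

    // One benchmark artifact as handed off from the enterprise side.
    record Artifact(String reportId, boolean publicationApproved, boolean containsRawEvidence) {}

    // contexa-site should only ever receive approved artifacts that carry
    // no raw private evidence; everything else stays internal.
    static List<Artifact> publishable(List<Artifact> bundle) {
        return bundle.stream()
                .filter(Artifact::publicationApproved)
                .filter(a -> !a.containsRawEvidence())
                .toList();
    }
}
```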

spring-profile (BUILDING)

Spring Application Profile

Java and Spring application runtime protection, verification, and benchmark publication readiness.

Matching reports: 0

mcp-profile (BUILDING)

MCP Tool Governance Profile

Protocol-boundary, permit, and tool-execution controls for MCP mediated agent actions.

Matching reports: 0

a2a-profile (BUILDING)

A2A Delegation Profile

Multi-agent delegation lineage, protocol compatibility, and zero-trust chain integrity.

Matching reports: 0

soar-profile (BUILDING)

AI Native SOAR Profile

Action-plane safety, permit enforcement, approval, and incident lineage for automated response.

Matching reports: 0