Agent Certification Lab

Touchstone

Certify AI engineering agents for modernization roles with repeatable evidence, scorecards, and human review.

Agent Roles
Select the work types to certify.
Models
Choose candidates from the active spec or add one.
Evaluation Settings
Runtime settings for the next certification run.
Stack Fitment
Project-intent match across models, frameworks, retrieval, lineage, sources, runtime, and controls.
Overall Stack Fitment
Reported as a fitment band and an intent match.

Recommended Run Stack
Highest-ranked model and framework from the active spec.

Open Stack Gaps
Cautions from components that scored below the target band.
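The gap rule above (surface a caution for any component that scored below the target band) can be sketched as a simple filter. This is a minimal sketch under stated assumptions: the band names in `BAND_ORDER`, the `ComponentScore` record, and the `open_stack_gaps` function are hypothetical and not taken from Touchstone; only the component categories come from the Stack Fitment description.

```python
from dataclasses import dataclass

# Hypothetical band ladder, lowest to highest; Touchstone's real band names are not documented here.
BAND_ORDER = ["Poor", "Fair", "Good", "Strong"]


@dataclass
class ComponentScore:
    component: str  # e.g. "models", "retrieval", "runtime", "controls"
    band: str       # band the component scored into
    caution: str    # caution text to surface when the component falls short


def open_stack_gaps(scores: list[ComponentScore], target_band: str) -> list[str]:
    """Return a caution for every component that scored below the target band."""
    target_rank = BAND_ORDER.index(target_band)
    return [
        f"{s.component}: {s.caution}"
        for s in scores
        if BAND_ORDER.index(s.band) < target_rank
    ]


if __name__ == "__main__":
    # Illustrative scores only, not real fitment data.
    scores = [
        ComponentScore("retrieval", "Fair", "no reranker configured"),
        ComponentScore("runtime", "Strong", ""),
        ComponentScore("controls", "Good", ""),
    ]
    print(open_stack_gaps(scores, target_band="Good"))  # prints ['retrieval: no reranker configured']
```

Components that meet or exceed the target band are omitted, so an empty list corresponds to a stack with no open gaps.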

External Connections
Enterprise sources declared for this lab.

Artifacts
Files available to agent tasks.
Artifact Viewer
Read-only source context.
Select an artifact to view its contents.
Connect New Source
Ingest artifacts from enterprise storage.
Live Evaluation Arena
Streaming model results and judge scores.

Start an evaluation to see live results.

Certification Cards
Role fitness, safety gates, and review status.
Certified Reference Architecture
Platform
Velocity
Modernization
Reliability
Industry

Certifications appear after evaluation completes.

Role-Fitness Leaderboard
Sorted by overall certification score.
Rank | Model | Role | Overall | Accuracy | Hallucination | Evidence | Fitness
Run an evaluation to populate the leaderboard.
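Ranking by overall certification score amounts to a descending sort on one column. A minimal sketch, assuming each run yields one row per model and role with the leaderboard columns as numbers; the `LeaderboardRow` type and `rank_rows` function are hypothetical. Ties keep their input order because Python's `sorted` is stable even with `reverse=True`; Touchstone's actual tie-breaking rule is not stated.

```python
from dataclasses import dataclass


@dataclass
class LeaderboardRow:
    model: str
    role: str
    overall: float        # overall certification score, used as the sort key
    accuracy: float
    hallucination: float
    evidence: float
    fitness: float


def rank_rows(rows: list[LeaderboardRow]) -> list[tuple[int, LeaderboardRow]]:
    """Sort rows by overall score, highest first, and number them from 1."""
    # sorted() is stable, so rows with equal overall scores keep their input order.
    ordered = sorted(rows, key=lambda row: row.overall, reverse=True)
    return list(enumerate(ordered, start=1))


if __name__ == "__main__":
    # Placeholder model names and illustrative scores, not real results.
    rows = [
        LeaderboardRow("model-a", "Modernization", 0.81, 0.84, 0.05, 0.77, 0.80),
        LeaderboardRow("model-b", "Modernization", 0.88, 0.90, 0.03, 0.85, 0.86),
    ]
    for rank, row in rank_rows(rows):
        print(rank, row.model, row.overall)
```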
Evaluation History
Previous runs stored in the local benchmark database.

Efficiency Index (Accuracy vs. Cost)
Cost Breakdown (USD)
Modernization ROI
Projected annual savings vs. baseline.

Select a run to project ROI.

Industry Efficiency Heatmap
Role-by-industry fitness cross-sections.

AST Parser Comparison
Reverse-engineering fidelity across parsing strategies.
Parser Type | Nodes | Edges | Latency (ms) | Coverage
Run a lineage extraction to compare.
Knowledge Graph & Lineage
Lineage extracted from evaluated artifacts and business rules.
Parser-Seeded

Select a completed run to view lineage.

Global Skill Registry
Explore and select pre-built agents from the Touchstone marketplace.

Audit Dashboard

Live inference auditing and cost-per-outcome analysis.

Total Cost (MTD): $0.00
Avg Incumbent Score: 0%
Matched Invocations: 0
Audit Ledger Integrity Verified

Hash-chain verification time: not yet recorded.
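A hash chain makes an audit ledger tamper-evident: each entry stores the hash of its predecessor, so editing any past invocation breaks the chain from that point on. The sketch below assumes SHA-256 over a canonical JSON encoding and a fixed all-zero genesis value; `append_entry`, `verify_chain`, and the record fields (borrowed from the Recent Audit Invocations columns) are illustrative and are not Touchstone's actual ledger format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous link together with a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(ledger: list[dict], record: dict) -> None:
    """Append a record, linking it to the hash of the last entry."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify_chain(ledger: list[dict]) -> bool:
    """Recompute every link; any edited record or broken pointer fails verification."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    ledger: list[dict] = []
    # Illustrative invocation records, not real audit data.
    append_entry(ledger, {"timestamp": "2024-01-01T00:00:00Z", "task": "t1",
                          "model": "model-a", "score": 0.9, "cost": 0.01})
    append_entry(ledger, {"timestamp": "2024-01-01T00:05:00Z", "task": "t2",
                          "model": "model-b", "score": 0.8, "cost": 0.02})
    print(verify_chain(ledger))  # prints True
    ledger[0]["record"]["cost"] = 0.0  # tamper with a past entry
    print(verify_chain(ledger))  # prints False
```

Sorting keys in the JSON encoding keeps the hash independent of dictionary insertion order, so the same record always produces the same link.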

Recent Audit Invocations
Timestamp | Task | Model | Score | Cost
No recent audit data. Click 'Run Full Audit' to ingest logs.