Agent Roles
Select the work types to certify.
Models
Choose candidates from the active spec or add one.
Evaluation Settings
Runtime settings for the next certification run.
Stack Fitment
Project-intent match across models, frameworks, retrieval, lineage, sources, runtime, and controls.
--
Overall Stack Fitment
Fitment data has not loaded.
Recommended Run Stack
Highest ranked model and framework from the active spec.
No recommendations loaded.
Open Stack Gaps
Cautions from components that scored below the target band.
No gaps loaded.
External Connections
Enterprise sources declared for this lab.
No external sources connected.
Artifacts
Files available to agent tasks.
Artifact Viewer
Read-only source context.
Select an artifact to view its contents.
Connect New Source
Ingest artifacts from enterprise storage.
Live Evaluation Arena
Streaming model results and judge scores.
Waiting...0%
Start an evaluation to see live results.
Certification Cards
Role fitness, safety gates, and review status.
Certified Reference Architecture
Platform
Velocity
Modernization
Reliability
Industry
Certifications appear after evaluation completes.
Role-Fitness Leaderboard
Sorted by overall certification score.
| Rank | Model | Role | Overall | Accuracy | Hallucination | Evidence | Fitness |
|---|---|---|---|---|---|---|---|
| Run an evaluation first. | |||||||
Evaluation History
Previous runs stored in the local benchmark database.
No runs yet.
Efficiency Index (Accuracy vs Cost)
Cost Breakdown (USD)
Modernization ROI
Projected annual savings vs. baseline.
Select a run to project ROI.
Industry Efficiency Heatmap
Role-Industry fitness cross-sections.
Cross-sectional data not yet aggregated.
AST Parser Comparison
Reverse engineering fidelity across different parsing strategies.
| Parser Type | Nodes | Edges | Latency (ms) | Coverage |
|---|---|---|---|---|
| Run a lineage extraction to compare. | ||||
Knowledge Graph & Lineage
Lineage extracted from evaluated artifacts and business rules.
Global Skill Registry
Explore and select pre-built agents from the Touchstone marketplace.
Audit Dashboard
Live inference auditing and cost-per-outcome analysis.
Total Cost (MTD)
$0.00
Avg Incumbent Score
0%
Matched Invocations
0
Audit Ledger Integrity Verified
Hash-chain verified at --:--
Recent Audit Invocations
| Timestamp | Task | Model | Score | Cost |
|---|---|---|---|---|
| No recent audit data. Click 'Run Full Audit' to ingest logs. | ||||