Architecture Overview
High-level view of the ATP Platform and its interactions with external systems.
C4Context
title ATP Platform - System Context
Person(user, "Developer", "Creates test suites and reviews results")
Person(ci, "CI/CD System", "Automated testing pipeline")
System_Boundary(atp, "ATP Platform") {
System(cli, "ATP CLI", "Command-line interface for running tests")
System(dashboard, "Web Dashboard", "UI for viewing results and managing tests")
System(runner, "Test Runner", "Orchestrates test execution")
SystemDb(db, "SQLite DB", "Stores results and definitions")
}
System_Ext(agent, "AI Agent", "Agent being tested (black box)")
System_Ext(testsite, "Test Site", "Target system for agent tasks")
System_Ext(llm, "LLM Provider", "For LLM-as-Judge evaluation")
Rel(user, cli, "Runs tests", "YAML config")
Rel(user, dashboard, "Views results", "HTTP")
Rel(ci, cli, "Automated tests", "CLI")
Rel(cli, runner, "Executes")
Rel(runner, agent, "ATP Protocol", "stdin/stdout or HTTP")
Rel(agent, testsite, "Performs tasks", "HTTP")
Rel(runner, db, "Stores results")
Rel(dashboard, db, "Reads data")
Rel(runner, llm, "LLM evaluation", "API")
Key Principle
Agent = Black Box with Contract
ATP makes no assumptions about how the agent is implemented internally
Agents communicate through a standardized protocol (ATPRequest/ATPResponse)
Any framework is supported: LangGraph, CrewAI, AutoGen, or custom code
Data Flow
How test definitions flow through the system to produce evaluation results.
flowchart LR
subgraph Input
YAML[("Test Suite (YAML)")]
end
subgraph "ATP Core"
Loader["Loader 📄"]
Runner["Runner ⚡"]
Sandbox["Sandbox 🔒"]
end
subgraph Adapters
CLI["CLI Adapter stdin/stdout"]
HTTP["HTTP Adapter REST API"]
Docker["Docker Adapter Container"]
end
subgraph "Agent (Black Box)"
Agent["AI Agent 🤖"]
end
subgraph Evaluation
Evaluators["Evaluators ✓"]
Scoring["Score Aggregator"]
end
subgraph Output
Report["Report 📊"]
DB[("Database")]
end
YAML --> Loader
Loader --> Runner
Runner --> Sandbox
Sandbox --> CLI & HTTP & Docker
CLI & HTTP & Docker --> Agent
Agent --> |"ATPResponse + Events"| Evaluators
Evaluators --> Scoring
Scoring --> Report
Scoring --> DB
style YAML fill:#E0E7FF,stroke:#4F46E5
style Agent fill:#DCFCE7,stroke:#10B981
style Report fill:#FEF3C7,stroke:#F59E0B
style DB fill:#FCE7F3,stroke:#EC4899
Events Stream (stderr)
progress - Task progress updates
tool_call - Tool invocations
llm_request - LLM API calls
reasoning - Agent thoughts
error - Error notifications
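As an illustration of how a consumer might read this stream, the sketch below parses JSONL text captured from an agent's stderr and counts events by type. The function names are hypothetical, and silently skipping lines that are not JSON objects with an `event_type` field is an assumption, not documented ATP behavior.

```python
import json
from collections import Counter


def parse_events(stderr_text):
    """Parse JSONL event text, keeping only JSON objects that carry an event_type."""
    events = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Assumption: agents may also write plain log lines to stderr; skip them.
            continue
        if isinstance(event, dict) and "event_type" in event:
            events.append(event)
    return events


def count_by_type(events):
    """Return a Counter mapping event_type to the number of occurrences."""
    return Counter(event["event_type"] for event in events)
```

A caller could feed the captured stderr of one test run to `parse_events` and pass the result on to evaluators.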
Evaluator Types
artifact - File existence, schema validation
behavior - Tool usage, error handling
llm_judge - Quality assessment by LLM
code_exec - pytest, npm, lint checks
Protocol Messages
The ATP Protocol defines the contract between the platform and agents.
classDiagram
class ATPRequest {
+String version
+String task_id
+Task task
+Constraints constraints
+Context context
}
class Task {
+String description
+List expected_artifacts
+Object metadata
}
class Constraints {
+int max_steps
+int timeout_seconds
+List allowed_tools
+List forbidden_tools
}
class Context {
+Object environment
+List artifacts
+String working_directory
}
class ATPResponse {
+String version
+String task_id
+Status status
+List artifacts
+Metrics metrics
+String error
}
class Artifact {
+String type
+String path
+String content
+String content_type
}
class Metrics {
+int total_steps
+float wall_time_seconds
+int tokens_used
+float cost
}
class ATPEvent {
+String version
+String task_id
+String timestamp
+int sequence
+String event_type
+Object payload
}
class Status {
<<enumeration>>
completed
partial
failed
timeout
}
ATPRequest --> Task
ATPRequest --> Constraints
ATPRequest --> Context
ATPResponse --> Artifact
ATPResponse --> Metrics
ATPResponse --> Status
Context --> Artifact
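The response side of the diagram above could be mirrored in Python roughly as follows. This is a minimal sketch using standard-library dataclasses: the field names come from the diagram, while the default values, the choice of which fields are optional, and the `to_json` helper are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List, Optional


class Status(str, Enum):
    COMPLETED = "completed"
    PARTIAL = "partial"
    FAILED = "failed"
    TIMEOUT = "timeout"


@dataclass
class Artifact:
    type: str
    path: str
    content: Optional[str] = None       # assumption: content may be omitted
    content_type: Optional[str] = None  # assumption: content_type may be omitted


@dataclass
class Metrics:
    total_steps: int = 0
    wall_time_seconds: float = 0.0
    tokens_used: int = 0
    cost: float = 0.0


@dataclass
class ATPResponse:
    version: str
    task_id: str
    status: Status
    artifacts: List[Artifact] = field(default_factory=list)
    metrics: Metrics = field(default_factory=Metrics)
    error: Optional[str] = None

    def to_json(self) -> str:
        """Serialize to JSON, writing the status as its plain string value."""
        data = asdict(self)
        data["status"] = self.status.value
        return json.dumps(data)
```

The request side (ATPRequest, Task, Constraints, Context) could be modeled the same way.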
Communication Pattern
# CLI Adapter (stdin/stdout)
echo '{"task_id": "t1", "task": {"description": "Find laptops"}}' | python agent.py
# HTTP Adapter (REST API)
curl -X POST http://agent:8000/execute -H 'Content-Type: application/json' -d '{"task_id": "t1", ...}'
# Events stream on stderr (JSONL)
{"event_type": "progress", "payload": {"message": "Starting search", "percentage": 0}}
{"event_type": "tool_call", "payload": {"tool": "http_get", "status": "completed"}}
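To make the stdin/stdout pattern concrete, here is a minimal sketch of an agent that follows it: it reads one JSON request, emits a progress event as a JSONL line on the error stream, and writes one JSON response on the output stream. The echo behavior, the artifact values, and the function names are illustrative assumptions; a real agent would do actual work between the event and the response.

```python
import io
import json


def emit_event(err, task_id, sequence, event_type, payload):
    """Write one event as a single JSONL line to the error stream."""
    event = {
        "task_id": task_id,
        "sequence": sequence,
        "event_type": event_type,
        "payload": payload,
    }
    err.write(json.dumps(event) + "\n")
    err.flush()


def run_agent(inp, out, err):
    """Read one request from inp, emit events to err, write the response to out."""
    request = json.load(inp)
    task_id = request.get("task_id", "")
    description = request.get("task", {}).get("description", "")

    emit_event(err, task_id, 0, "progress", {"message": "Starting", "percentage": 0})

    response = {
        "task_id": task_id,
        "status": "completed",
        # Hypothetical artifact: this toy agent just echoes the task description.
        "artifacts": [{"type": "text", "path": "echo.txt", "content": description}],
        "metrics": {"total_steps": 1},
    }
    out.write(json.dumps(response) + "\n")


# In-memory demonstration; a real agent would call
# run_agent(sys.stdin, sys.stdout, sys.stderr) instead.
demo_out, demo_err = io.StringIO(), io.StringIO()
demo_request = io.StringIO('{"task_id": "t1", "task": {"description": "Find laptops"}}')
run_agent(demo_request, demo_out, demo_err)
```

Passing the streams as parameters keeps the agent logic testable without spawning a process.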
Adapter Types
Adapters translate between ATP Protocol and specific agent implementations.
flowchart TB
subgraph "ATP Runner"
Runner["Test Runner"]
end
subgraph "Built-in Adapters"
CLI["CLI Adapter<br/>stdin/stdout JSON"]
HTTP["HTTP Adapter<br/>REST API calls"]
Docker["Docker Adapter<br/>Container execution"]
end
subgraph "Framework Adapters"
LG["LangGraph<br/>State graphs"]
CA["CrewAI<br/>Multi-agent crews"]
AG["AutoGen<br/>Conversational agents"]
end
subgraph "Agent Implementations"
PyAgent["Python Script 🐍"]
Container["Docker Container 🐳"]
Service["HTTP Service 🌐"]
LGAgent["LangGraph Agent 📊"]
CrewAgent["CrewAI Crew 👥"]
AGAgent["AutoGen Agent 💬"]
end
Runner --> CLI & HTTP & Docker
Runner --> LG & CA & AG
CLI --> PyAgent
Docker --> Container
HTTP --> Service
LG --> LGAgent
CA --> CrewAgent
AG --> AGAgent
style Runner fill:#4F46E5,color:#fff
style CLI fill:#E0E7FF,stroke:#4F46E5
style HTTP fill:#E0E7FF,stroke:#4F46E5
style Docker fill:#E0E7FF,stroke:#4F46E5
style LG fill:#DCFCE7,stroke:#10B981
style CA fill:#DCFCE7,stroke:#10B981
style AG fill:#DCFCE7,stroke:#10B981
CLI Adapter Usage
agents:
- name: "search-agent"
type: "cli"
config:
command: "python"
args: ["agent.py"]
environment:
TEST_SITE_URL: "http://localhost:9876"
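A configuration like the one above might be executed roughly as sketched below: the adapter starts the command, writes the request to stdin, and collects the response from stdout and events from stderr. This is not ATP's actual adapter code; `run_cli_agent`, its parameters, and the inline stand-in agent are hypothetical, and only standard-library `subprocess` calls are used.

```python
import json
import os
import subprocess
import sys


def run_cli_agent(command, args, request, environment=None, timeout_seconds=30):
    """Run an agent process and return (response, events).

    subprocess.run raises subprocess.TimeoutExpired if the agent exceeds the timeout.
    """
    env = {**os.environ, **(environment or {})}
    proc = subprocess.run(
        [command, *args],
        input=json.dumps(request),
        capture_output=True,
        text=True,
        env=env,
        timeout=timeout_seconds,
    )
    response = json.loads(proc.stdout)
    events = []
    for line in proc.stderr.splitlines():
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            pass  # assumption: non-JSON stderr lines are ordinary logs
    return response, events


# Stand-in for agent.py so the sketch runs without extra files.
INLINE_AGENT = (
    "import json, os, sys\n"
    "req = json.load(sys.stdin)\n"
    "site = os.environ.get('TEST_SITE_URL')\n"
    "print(json.dumps({'event_type': 'progress', 'payload': {'site': site}}), file=sys.stderr)\n"
    "print(json.dumps({'task_id': req['task_id'], 'status': 'completed'}))\n"
)

response, events = run_cli_agent(
    sys.executable,
    ["-c", INLINE_AGENT],
    {"task_id": "t1", "task": {"description": "Find laptops"}},
    environment={"TEST_SITE_URL": "http://localhost:9876"},
)
```

With the YAML above, `command` would be `"python"` and `args` would be `["agent.py"]`.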
Docker Adapter Usage
agents:
- name: "containerized-agent"
type: "docker"
config:
image: "atp-search-agent:latest"
network: "host"
environment:
TEST_SITE_URL: "http://localhost:9876"
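One way to picture what the Docker adapter does with this configuration is to translate it into a container command line. The sketch below builds such an argument list; the function name is hypothetical, and the use of `--rm` and `-i` is an assumption about how an adapter would want to run a stdin-driven agent, not a description of ATP's implementation.

```python
def build_run_argv(config, runtime="docker"):
    """Translate an agent config mapping into a container `run` argument list."""
    # -i keeps stdin open so the request can be piped in; --rm removes the
    # container afterwards (both are assumptions for this sketch).
    argv = [runtime, "run", "--rm", "-i"]
    network = config.get("network")
    if network:
        argv += ["--network", network]
    for key, value in sorted(config.get("environment", {}).items()):
        argv += ["-e", f"{key}={value}"]
    argv.append(config["image"])
    return argv


example_config = {
    "image": "atp-search-agent:latest",
    "network": "host",
    "environment": {"TEST_SITE_URL": "http://localhost:9876"},
}
example_argv = build_run_argv(example_config)
```

Passing `runtime="podman"` yields the equivalent Podman invocation, since both tools accept these flags.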
Component Structure
Internal modules of the ATP Platform.
flowchart TB
subgraph "atp/"
subgraph "Core Modules"
protocol["protocol/<br/>Request, Response, Event models"]
core["core/<br/>Config, exceptions, security"]
end
subgraph "Execution"
loader["loader/<br/>YAML/JSON parsing"]
runner["runner/<br/>Test orchestration, sandbox"]
adapters["adapters/<br/>CLI, HTTP, Docker, frameworks"]
end
subgraph "Evaluation"
evaluators["evaluators/<br/>Artifact, behavior, LLM-judge"]
scoring["scoring/<br/>Score aggregation"]
baseline["baseline/<br/>Regression detection"]
end
subgraph "Output"
reporters["reporters/<br/>Console, JSON, HTML, JUnit"]
dashboard["dashboard/<br/>FastAPI web interface"]
end
subgraph "Infrastructure"
streaming["streaming/<br/>Event buffering"]
statistics["statistics/<br/>Mean, CI, stability"]
cli_mod["cli/<br/>CLI entry point"]
end
end
cli_mod --> loader
loader --> runner
runner --> adapters
adapters --> protocol
runner --> evaluators
evaluators --> scoring
scoring --> baseline
scoring --> reporters
scoring --> dashboard
runner --> streaming
baseline --> statistics
style protocol fill:#E0E7FF,stroke:#4F46E5
style runner fill:#DCFCE7,stroke:#10B981
style evaluators fill:#FEF3C7,stroke:#F59E0B
style dashboard fill:#FCE7F3,stroke:#EC4899
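The statistics module is described only as covering mean, CI, and stability. As a rough illustration of the first two, the sketch below summarizes scores from repeated runs of one test with a mean and an approximate 95% confidence interval. The function name, the normal approximation (z = 1.96), and the handling of a single run are assumptions; the actual module may compute these differently.

```python
import math
import statistics


def summarize_scores(scores, z=1.96):
    """Return mean, sample standard deviation, and an approximate confidence interval."""
    n = len(scores)
    mean = statistics.fmean(scores)
    if n < 2:
        # With one run there is no spread to estimate; report a degenerate interval.
        return {"mean": mean, "stdev": 0.0, "ci_low": mean, "ci_high": mean}
    stdev = statistics.stdev(scores)
    half_width = z * stdev / math.sqrt(n)
    return {
        "mean": mean,
        "stdev": stdev,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }
```

A regression check could then compare a new run's mean against the baseline interval, though how ATP's baseline module decides this is not stated here.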
Evaluator Registry
Evaluator - Assertion Types
artifact - artifact_exists, contains, schema, sections
behavior - must_use_tools, max_tool_calls, no_errors, forbidden_tools
llm_judge - llm_eval
code_exec - pytest, npm, custom_command, lint
security - security
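To show what the behavior assertions might check, the sketch below evaluates four of them against a list of parsed events. The semantics are inferred from the assertion names, and the helper functions are hypothetical; the only field taken from the protocol examples is the `tool` key in a `tool_call` payload.

```python
def tool_calls(events):
    """Return the tool names from all tool_call events, in order."""
    return [
        event.get("payload", {}).get("tool")
        for event in events
        if event.get("event_type") == "tool_call"
    ]


def check_must_use_tools(events, required):
    """Pass when every required tool was called at least once."""
    used = set(tool_calls(events))
    return all(tool in used for tool in required)


def check_forbidden_tools(events, forbidden):
    """Pass when none of the forbidden tools was called."""
    used = set(tool_calls(events))
    return not any(tool in used for tool in forbidden)


def check_max_tool_calls(events, limit):
    """Pass when the number of tool calls does not exceed the limit."""
    return len(tool_calls(events)) <= limit


def check_no_errors(events):
    """Pass when the stream contains no error events."""
    return not any(event.get("event_type") == "error" for event in events)
```

Each check returns a boolean, which a score aggregator could then combine across assertions.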
Example: Web Search Agent Test
Complete flow of testing the search agent against the test site.
sequenceDiagram
autonumber
participant User
participant CLI as ATP CLI
participant Runner
participant Adapter as CLI Adapter
participant Agent as Search Agent
participant Site as Test Site (port 9876)
participant Eval as Evaluators
participant DB as Dashboard DB
User->>CLI: atp test web_search.yaml
CLI->>Runner: Load test suite
loop For each test
Runner->>Adapter: Execute test
Adapter->>Agent: ATPRequest (stdin)
Agent->>Site: GET /api/products?category=laptop
Site-->>Agent: JSON product list
Agent-->>Adapter: ATPEvent (stderr): progress
Agent-->>Adapter: ATPEvent (stderr): tool_call
Agent-->>Adapter: ATPResponse (stdout)
Adapter-->>Runner: Response + Events
Runner->>Eval: Evaluate results
Eval-->>Runner: Assertion results
end
Runner->>DB: Store results
Runner-->>CLI: Test report
CLI-->>User: Summary + Details
Note over User,DB: Results visible in Dashboard at localhost:8080
Test Site (port 9876)
/ - Homepage
/catalog - Product catalog (HTML)
/api/products - Products API (JSON)
/api/company - Company info
/about, /contact - Static pages
podman run -d -p 9876:9876 atp-test-site
Search Agent
Parses task description for parameters
Fetches data via HTTP (JSON API or HTML scraping)
Emits progress events to stderr
Returns structured results as artifacts
podman run -i --network=host atp-search-agent
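The first two steps of the search agent (parsing the task description and building the API request) could look something like the sketch below. The category list and the naive plural handling are invented for illustration; only the `/api/products?category=laptop` URL shape and the port come from the flow described above.

```python
import re
from urllib.parse import urlencode, urljoin

# Hypothetical set of categories the agent knows how to search for.
KNOWN_CATEGORIES = ("laptop", "phone", "tablet")


def parse_category(description):
    """Return the first known category mentioned in the description, or None."""
    for word in re.findall(r"[a-z]+", description.lower()):
        singular = word[:-1] if word.endswith("s") else word  # naive plural handling
        if singular in KNOWN_CATEGORIES:
            return singular
    return None


def build_products_url(base_url, category=None):
    """Build the products API URL, adding a category filter when one is given."""
    url = urljoin(base_url.rstrip("/") + "/", "api/products")
    if category:
        url += "?" + urlencode({"category": category})
    return url
```

The agent would then fetch this URL over HTTP, emit progress events, and return the parsed products as artifacts.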