Architecture
A look under the hood at how Overload is structured.
System Overview
Overload is built in Python, leveraging asyncio for high-concurrency network I/O. The project is split into five main subsystems:
- Collection Parser (collection/): Translates Postman JSON into internal dataclasses. Handles nested folders, auth inheritance, and all body types.
- Load Engine (engine/): Asynchronously dispatches HTTP requests using 10 concurrency patterns. Centralised in engine/service.py, the shared orchestrator used by both the web API and the MCP server.
- Reporting (report/): Aggregates metrics and generates per-run folders with HTML, JSON, CSV, and responses outputs.
- Web UI (web/): A FastAPI server wrapping the engine, communicating with a Vanilla JS SPA via REST and WebSockets.
- MCP Server (mcp_server.py): A stdio MCP server that exposes Overload as tools to LLM-based coding assistants (Claude Code, Codex CLI, GitHub Copilot).
The Async Engine
The core is HttpClient (wrapping httpx.AsyncClient) and the LoadPattern protocol.
Overload uses a single-process asyncio event loop — no threads, no multiprocessing. This allows thousands of concurrent HTTP connections with minimal overhead.
Each load pattern implements:
from typing import Protocol

class LoadPattern(Protocol):
    # RequestResult is defined elsewhere in the project.
    async def execute(
        self, client, requests, variables, config,
        run_id, cancel_event, on_progress
    ) -> list[RequestResult]: ...
Patterns use asyncio.Semaphore to cap concurrency and check cancel_event.is_set() to stop cleanly. Progress is emitted at least every 0.5 s via a time-based throttle in _emit_progress().
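The semaphore, cancel check, and progress throttle described above can be sketched as follows. This is a minimal illustration, not Overload's code: `run_bounded`, `ThrottledProgress`, `fake_send`, and the simplified `RequestResult` are hypothetical names, the progress callback is assumed to be a plain synchronous callable, and the throttle is assumed to limit callbacks to one per interval plus a forced final tick.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class RequestResult:
    # Hypothetical stand-in; the real RequestResult is not shown in this document.
    index: int
    ok: bool


class ThrottledProgress:
    """Invoke on_progress at most once per `interval` seconds, or when forced."""

    def __init__(self, on_progress, interval=0.5):
        self._on_progress = on_progress
        self._interval = interval
        self._last = float("-inf")  # guarantees the first emit goes through

    def emit(self, done, total, force=False):
        now = time.monotonic()
        if force or now - self._last >= self._interval:
            self._last = now
            self._on_progress(done, total)


async def run_bounded(send, total, concurrency, cancel_event, on_progress):
    """Call `send(i)` for i in range(total) with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    progress = ThrottledProgress(on_progress)
    results = []

    async def worker(index):
        async with semaphore:
            # Cooperative stop: skip requests that have not started yet.
            if cancel_event.is_set():
                return
            results.append(await send(index))
            progress.emit(len(results), total)

    await asyncio.gather(*(worker(i) for i in range(total)))
    progress.emit(len(results), total, force=True)  # final tick is never throttled
    return results


async def fake_send(index):
    # Stand-in for a real HTTP call.
    await asyncio.sleep(0)
    return RequestResult(index=index, ok=True)


async def demo():
    ticks = []
    cancel_event = asyncio.Event()
    results = await run_bounded(
        fake_send, 20, 5, cancel_event, lambda done, total: ticks.append((done, total))
    )
    return results, ticks
```

Checking the event inside the semaphore means a cancelled run drains quickly: waiting workers acquire the slot, see the flag, and return without sending anything.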
Cancellation Safety
HttpClient holds a result_sink: list[RequestResult] that is populated by every execute() call, including those still in-flight when a hard asyncio.CancelledError propagates. engine/service.py reads the sink in its except asyncio.CancelledError handler so partial results are never lost — a partial HTML report is always generated.
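A minimal sketch of the sink idea, assuming a simplified client: `SinkClient` and `run_with_partial_results` are hypothetical names, results are plain dicts, and only completed requests are recorded. How the real HttpClient records a request interrupted mid-flight is not covered here.

```python
import asyncio


class SinkClient:
    """Hypothetical stand-in for HttpClient: records every finished result."""

    def __init__(self):
        self.result_sink = []

    async def execute(self, index, delay):
        await asyncio.sleep(delay)  # stand-in for the HTTP round trip
        result = {"index": index, "status": 200}
        self.result_sink.append(result)  # recorded before returning to the caller
        return result


async def run_with_partial_results(client, delays):
    """Return (results, cancelled); on cancellation, fall back to the sink."""
    try:
        results = await asyncio.gather(
            *(client.execute(i, d) for i, d in enumerate(delays))
        )
        return list(results), False
    except asyncio.CancelledError:
        # gather() never returned, but finished requests are still in the sink.
        # This sketch swallows the cancellation to hand back partial results;
        # a real service may prefer to write its report and then re-raise.
        return list(client.result_sink), True


async def demo():
    client = SinkClient()
    task = asyncio.create_task(
        run_with_partial_results(client, [0.0, 0.0, 5.0])
    )
    await asyncio.sleep(0.05)  # let the two fast requests finish
    task.cancel()  # hard cancel while the slow request is in flight
    return await task
```

Because the sink lives on the client rather than in the return value of `gather()`, the results survive even though the awaiting coroutine never receives them.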
Per-Run Output Folder
utils/naming.py provides make_run_dir(base, run_id) which creates {base}/run_{run_id}/. report/generator.py writes report.html there; report/responses.py writes responses.json when response bodies were captured. engine/service.py writes meta.json (the history sidecar). On startup, load_run_history() scans reports/run_*/meta.json to restore past runs without any database.
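The folder layout and the database-free history scan can be approximated as below. This is a sketch under assumptions: `make_run_dir(base, run_id)` follows the signature given above, while `write_meta`, the `base` argument to `load_run_history`, and the contents of meta.json are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def make_run_dir(base, run_id):
    """Create and return {base}/run_{run_id}/."""
    run_dir = Path(base) / f"run_{run_id}"
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def write_meta(run_dir, meta):
    """Write the history sidecar (hypothetical helper; fields are illustrative)."""
    (run_dir / "meta.json").write_text(json.dumps(meta), encoding="utf-8")


def load_run_history(base):
    """Rebuild run history by scanning {base}/run_*/meta.json."""
    history = []
    for meta_path in sorted(Path(base).glob("run_*/meta.json")):
        try:
            history.append(json.loads(meta_path.read_text(encoding="utf-8")))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or corrupt sidecars
    return history


def demo():
    with tempfile.TemporaryDirectory() as base:
        for run_id in ("a1", "b2"):
            write_meta(make_run_dir(base, run_id), {"run_id": run_id})
        # A run folder without meta.json is simply ignored by the scan.
        make_run_dir(base, "c3")
        return load_run_history(base)
```

Treating the filesystem as the source of truth means deleting a run folder is enough to remove that run from history.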
Variable Resolution
VariableContext manages a scope chain: CSV row (if attached) > runtime overrides (--var) > environment file > collection variables. Variables are resolved just-in-time before each HTTP request. VariableContext.derive(row) prepends a CSV-row scope without mutating the base context, making it safe for concurrent requests.
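One way to model this scope chain is with `collections.ChainMap`, which looks keys up from the first mapping to the last. The class below is an illustrative sketch, not the real VariableContext: the constructor arguments, the `resolve` method, and the Postman-style `{{name}}` placeholder syntax are assumptions.

```python
import re
from collections import ChainMap

# Assumed placeholder syntax: {{name}}, as used in Postman collections.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.-]+)\s*\}\}")


class VariableContext:
    """Sketch of a scope chain: overrides > environment > collection."""

    def __init__(self, collection=None, environment=None, overrides=None):
        # ChainMap searches left to right, so the highest priority comes first.
        self._scopes = ChainMap(
            dict(overrides or {}),
            dict(environment or {}),
            dict(collection or {}),
        )

    def derive(self, row):
        """Return a new context with a CSV-row scope in front; self is untouched."""
        child = VariableContext.__new__(VariableContext)
        child._scopes = self._scopes.new_child(dict(row))
        return child

    def resolve(self, text):
        """Substitute known placeholders; unknown ones are left as written."""
        return _PLACEHOLDER.sub(
            lambda m: str(self._scopes.get(m.group(1), m.group(0))), text
        )


def demo():
    base = VariableContext(
        collection={"host": "collection.example", "user": "default"},
        environment={"host": "env.example"},
        overrides={"host": "cli.example"},
    )
    row_ctx = base.derive({"user": "alice"})
    return base, row_ctx
```

`new_child` builds a fresh chain that shares the parent mappings without copying them, so each concurrent request can hold its own row scope while the base context stays unchanged.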
Shared Run Orchestration
engine/service.py is the single entry point for running a test. It is called by both web/routes/api.py (via asyncio.create_task) and mcp_server.py (direct await). This avoids duplicated logic and ensures identical behaviour regardless of the invocation path.
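The two invocation paths can be pictured roughly like this. The function names (`run_test`, `web_start_run`, `mcp_run`), the config shape, and the returned dicts are hypothetical; only the create_task versus direct await distinction comes from the description above.

```python
import asyncio


async def run_test(config):
    """Hypothetical stand-in for the orchestrator in engine/service.py."""
    await asyncio.sleep(0)  # placeholder for the real load run
    return {"run_id": config["run_id"], "status": "completed"}


async def web_start_run(config, tasks):
    """Web-style caller: schedule the run and return to the client immediately."""
    task = asyncio.create_task(run_test(config))
    # Keep a reference so the task is not garbage-collected mid-run.
    tasks[config["run_id"]] = task
    return {"run_id": config["run_id"], "status": "started"}


async def mcp_run(config):
    """MCP-style caller: wait for the run and return its summary."""
    return await run_test(config)


async def demo():
    tasks = {}
    ack = await web_start_run({"run_id": "web-1"}, tasks)
    web_summary = await tasks["web-1"]
    mcp_summary = await mcp_run({"run_id": "mcp-1"})
    return ack, web_summary, mcp_summary
```

Both callers end up in the same coroutine, so the only difference between them is whether the caller waits for the result.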
Web Architecture
When you run overload ui, a Uvicorn/FastAPI server starts locally.
- Event Bus (engine/events.py): Internal pub/sub decouples the async engine from the UI transport layer. Progress ticks are forwarded to all subscribed WebSockets.
- WebSocket broadcast (web/routes/ws.py): Fans out RunProgress updates to all connected browser clients in real time.
- Frontend: Vanilla JavaScript SPA (no React, no build step). Chart.js for data visualisation. Navigation is handled by a simple hash-based router in app.js.
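The pub/sub decoupling between the engine and the WebSocket layer could look something like the following. This is a sketch only: the `EventBus` class, its method names, the per-subscriber `asyncio.Queue`, and the drop-oldest policy for slow consumers are all assumptions, not the contents of engine/events.py.

```python
import asyncio


class EventBus:
    """Minimal in-process pub/sub: one bounded queue per subscriber."""

    def __init__(self):
        self._subscribers = set()

    def subscribe(self, maxsize=100):
        queue = asyncio.Queue(maxsize=maxsize)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        self._subscribers.discard(queue)

    def publish(self, event):
        """Fan the event out without blocking the publisher."""
        for queue in self._subscribers:
            if queue.full():
                queue.get_nowait()  # drop the oldest tick for a slow consumer
            queue.put_nowait(event)


def drain(queue):
    """Collect everything currently queued for one subscriber."""
    items = []
    while not queue.empty():
        items.append(queue.get_nowait())
    return items


async def demo():
    bus = EventBus()
    first, second = bus.subscribe(), bus.subscribe()
    bus.publish({"done": 1, "total": 10})
    bus.unsubscribe(second)  # e.g. a browser tab disconnects
    bus.publish({"done": 2, "total": 10})
    return drain(first), drain(second)
```

In a design like this, each WebSocket handler would own one queue and forward whatever arrives on it, so the engine never needs to know how many clients are connected.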
MCP Server
mcp_server.py uses FastMCP to expose six tools over the stdio MCP protocol. The tools are plain Python functions importable and callable directly in tests — no MCP client required for unit testing. The server shares engine/service.py with the web API.