Architecture

A look under the hood at how Overload is structured.

System Overview

Overload is built in Python, leveraging asyncio for high-concurrency network I/O. The project is split into five main subsystems:

  1. Collection Parser (collection/): Translates Postman JSON into internal dataclasses. Handles nested folders, auth inheritance, and all body types.
  2. Load Engine (engine/): Asynchronously dispatches HTTP requests using 10 concurrency patterns. Centralised in engine/service.py — the shared orchestrator used by both the web API and the MCP server.
  3. Reporting (report/): Aggregates metrics and generates a per-run folder containing HTML, JSON, and CSV reports plus the captured response bodies.
  4. Web UI (web/): A FastAPI server wrapping the engine, communicating with a Vanilla JS SPA via REST and WebSockets.
  5. MCP Server (mcp_server.py): A stdio MCP server that exposes Overload as tools to LLM-based coding assistants (Claude Code, Codex CLI, GitHub Copilot).

The Async Engine

The core is HttpClient (wrapping httpx.AsyncClient) and the LoadPattern protocol.

Overload uses a single-process asyncio event loop — no threads, no multiprocessing. This allows thousands of concurrent HTTP connections with minimal overhead.

Each load pattern implements:

from typing import Protocol

class LoadPattern(Protocol):
    # RequestResult is the engine's per-request result type.
    async def execute(
        self, client, requests, variables, config,
        run_id, cancel_event, on_progress
    ) -> list["RequestResult"]:
        ...

Patterns use asyncio.Semaphore to cap concurrency and check cancel_event.is_set() to stop cleanly. Progress is emitted at least every 0.5 s via a time-based throttle in _emit_progress().
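
The semaphore, cancellation check, and progress throttle described above can be sketched roughly as follows. This is a minimal illustration, not Overload's actual pattern code: run_capped, its parameters, and the way the throttle skips updates are all assumptions made for the example.

```python
import asyncio
import time


async def run_capped(factories, limit, cancel_event, on_progress, min_interval=0.5):
    """Hypothetical helper: run coroutine factories with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)
    results = []
    last_emit = float("-inf")  # guarantees the first progress update is emitted

    async def worker(factory):
        nonlocal last_emit
        async with semaphore:
            # Work that has not started yet is skipped once cancellation is requested.
            if cancel_event.is_set():
                return
            results.append(await factory())
            now = time.monotonic()
            # Time-based throttle (assumed behaviour): skip updates that
            # arrive less than `min_interval` seconds after the previous one.
            if now - last_emit >= min_interval:
                last_emit = now
                on_progress(len(results))

    await asyncio.gather(*(worker(f) for f in factories))
    return results
```

Because every worker acquires the semaphore before it checks cancel_event, requests already in flight finish normally while queued ones return without sending anything.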

Cancellation Safety

HttpClient holds a result_sink: list[RequestResult] that is populated by every execute() call, including those still in-flight when a hard asyncio.CancelledError propagates. engine/service.py reads the sink in its except asyncio.CancelledError handler so partial results are never lost — a partial HTML report is always generated.
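
A rough sketch of the sink idea, under stated assumptions: SinkClient is a hypothetical stand-in for HttpClient, asyncio.sleep replaces the HTTP round trip, and the report is reduced to a dict. How the real client records in-flight requests is not shown in this document, so the except branch below is only one plausible approach.

```python
import asyncio


class SinkClient:
    """Hypothetical stand-in for HttpClient with a shared result sink."""

    def __init__(self):
        self.result_sink = []

    async def execute(self, name, delay):
        try:
            await asyncio.sleep(delay)  # stands in for the HTTP round trip
            self.result_sink.append((name, "ok"))
        except asyncio.CancelledError:
            # Assumed behaviour: record the interrupted request, then re-raise.
            self.result_sink.append((name, "cancelled"))
            raise


async def run_test(client, jobs, reports):
    """Hypothetical orchestrator: always leaves a report behind."""
    try:
        await asyncio.gather(*(client.execute(n, d) for n, d in jobs))
    except asyncio.CancelledError:
        # gather's return value is lost here, but the sink is still intact.
        reports.append({"partial": True, "results": list(client.result_sink)})
        raise
    reports.append({"partial": False, "results": list(client.result_sink)})
```

The orchestrator reads from the sink rather than from the return value of gather, which is why a cancelled run can still produce a report.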

Per-Run Output Folder

utils/naming.py provides make_run_dir(base, run_id) which creates {base}/run_{run_id}/. report/generator.py writes report.html there; report/responses.py writes responses.json when response bodies were captured. engine/service.py writes meta.json (the history sidecar). On startup, load_run_history() scans reports/run_*/meta.json to restore past runs without any database.
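
The folder layout and the database-free history scan might look something like the sketch below. The function names follow the text, but the bodies are assumptions, as are the base parameter on load_run_history and the decision to skip unreadable sidecars.

```python
import json
from pathlib import Path


def make_run_dir(base, run_id):
    """Create {base}/run_{run_id}/ and return its path (sketch)."""
    run_dir = Path(base) / f"run_{run_id}"
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def load_run_history(base="reports"):
    """Rebuild run history from the meta.json sidecars under `base` (sketch)."""
    history = []
    for meta_path in sorted(Path(base).glob("run_*/meta.json")):
        try:
            history.append(json.loads(meta_path.read_text(encoding="utf-8")))
        except (OSError, json.JSONDecodeError):
            # Assumed policy: ignore unreadable or corrupt sidecars.
            continue
    return history
```

Since each run folder carries its own meta.json, deleting a folder removes that run from the history with no other bookkeeping.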

Variable Resolution

VariableContext manages a scope chain: CSV row (if attached) > runtime overrides (--var) > environment file > collection variables. Variables are resolved just-in-time before each HTTP request. VariableContext.derive(row) prepends a CSV-row scope without mutating the base context, making it safe for concurrent requests.
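
To make the scope chain concrete, here is a minimal sketch. The class and derive() names come from the text; the constructor, the resolve() method, and the Postman-style {{name}} placeholder syntax are assumptions for illustration only.

```python
import re


class VariableContext:
    """Sketch of a scope chain; scopes are ordered highest priority first."""

    def __init__(self, scopes):
        self._scopes = tuple(scopes)

    def derive(self, row):
        # Build a new context with the CSV row in front; the base is untouched.
        return VariableContext((dict(row),) + self._scopes)

    def resolve(self, text):
        def _lookup(match):
            name = match.group(1)
            for scope in self._scopes:
                if name in scope:
                    return str(scope[name])
            return match.group(0)  # unknown names are left as written

        return re.sub(r"\{\{(\w+)\}\}", _lookup, text)


base = VariableContext([
    {"host": "override.test"},               # runtime overrides (--var)
    {"host": "env.test", "token": "env"},    # environment file
    {"host": "coll.test", "path": "/items"}, # collection variables
])
per_row = base.derive({"token": "row"})
url = per_row.resolve("https://{{host}}{{path}}?t={{token}}")
```

Each concurrent request can hold its own derived context, and because derive() copies the chain rather than editing it, no request sees another request's CSV row.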

Shared Run Orchestration

engine/service.py is the single entry point for running a test. It is called by both web/routes/api.py (via asyncio.create_task) and mcp_server.py (direct await). This avoids duplicated logic and ensures identical behaviour regardless of the invocation path.
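
The two invocation paths can be illustrated with a small sketch. The names run_test, web_start_run, and mcp_run_tool are hypothetical, and the orchestrator body is a placeholder; only the create_task versus direct await distinction is taken from the text.

```python
import asyncio


async def run_test(config):
    """Hypothetical shared orchestrator (placeholder body)."""
    await asyncio.sleep(0)
    return {"run_id": config["run_id"], "status": "done"}


async def web_start_run(config, running):
    # Web path: schedule the run in the background and reply immediately.
    task = asyncio.create_task(run_test(config))
    running[config["run_id"]] = task  # keep a reference so it can be awaited or cancelled
    return {"run_id": config["run_id"], "status": "started"}


async def mcp_run_tool(config):
    # MCP path: the tool call waits for the run and returns its result.
    return await run_test(config)
```

Both callers go through the same coroutine, so any behaviour change in the orchestrator applies to the web API and the MCP tools alike.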

Web Architecture

When you run overload ui, a Uvicorn/FastAPI server starts locally.

MCP Server

mcp_server.py uses FastMCP to expose six tools over the stdio MCP protocol. The tools are plain Python functions that can be imported and called directly in tests, so unit testing requires no MCP client. The server shares engine/service.py with the web API.