Record every decision, tool call, and failure. Replay them later. Zero dependencies, one decorator.
Not because the agents are wrong — but because they're invisible. You can't debug what you can't see.
When an agent fails, you have no idea which tool call went wrong or what the LLM was thinking.
Token costs accumulate silently. You only see the damage at the end of the month.
Flaky failures that only happen in production, with no way to replay the exact sequence of events.
Existing tools (LangSmith, W&B) require you to commit to one framework. AgentBlackBox works with any.
Drop it on any function. Existing code is unchanged.
BEFORE
def run_agent(task: str): result = client.messages.create( model="claude-sonnet-4-6", messages=[{"role": "user", "content": task}] ) return result.content[0].text
AFTER — full recording
from agentblackbox import BlackBox @BlackBox.record(agent_name="researcher") def run_agent(task: str): result = client.messages.create( model="claude-sonnet-4-6", messages=[{"role": "user", "content": task}] ) return result.content[0].text
AUTO-INSTRUMENT WITH PATCH (zero code changes)
from agentblackbox import patch_anthropic, BlackBox patch_anthropic() # patches ALL anthropic calls globally with BlackBox.session("my-agent") as bb: run_my_existing_agent() # no changes needed bb.replay() # see the full timeline
Built for developers who ship real AI agents to production.
Replay any session in your terminal with timestamps, token counts, costs, and full input/output.
Auto-calculates cost per session, per model, per agent. Supports 20+ models from OpenAI and Anthropic.
Every tool invocation is captured: name, arguments, result, duration, and errors.
Local dashboard with session list, timeline detail view, and cost analytics charts.
Run a shared cloud dashboard — remote agents POST recordings in real-time via API key auth.
Core library has no dependencies. Pure Python 3.10+. SQLite storage. Nothing to configure.
Auto-instrumentation — no code changes in your agent logic.
# Anthropic from agentblackbox import patch_anthropic patch_anthropic() # OpenAI Agents SDK from agentblackbox.integrations import patch_openai_agents patch_openai_agents() # LangChain from agentblackbox import BlackBoxCallbackHandler handler = BlackBoxCallbackHandler() chain.invoke(input, config={"callbacks": [handler]})
Launch a shared dashboard. Agents in production push recordings in real-time.
# 1. Start cloud dashboard (generates API key) agentblackbox dashboard --cloud --port 8765 # 2. In your agents — use RemoteStorage from agentblackbox.remote import RemoteStorage from agentblackbox import BlackBox store = RemoteStorage( api_key="abx_...", endpoint="https://your-dashboard.example.com", ) bb = BlackBox.session("prod-agent") bb._storage = store bb.start()
Open source core. Paid cloud hosting when you need it.
Zero config. Zero dependencies. One pip install.