Temporal for AI agents

A durable execution engine
for AI agents.

Wrap your existing tool-calling loop and get crash-safety, an append-only step ledger, idempotent tool dispatch, and deterministic replay — backed entirely by Postgres. No broker. No rewrite of your agent.

The one guarantee. A worker can die at any point in a run and another resumes from the ledger — no step is lost. Tool dispatch attempts may repeat, but tool effects cannot duplicate when idempotency is enforced.
Stated precisely: at-most-once dispatch from Avatar always; exactly-once end-to-end iff the tool honors the idempotency key. We never claim unconditional exactly-once.

The 30-second proof: a refund that survives a crash

The agent runs lookup_order → issue_refund → email_customer. A worker is killed after issue_refund dispatches but before its observation commits — the decisive crash window. A fresh worker re-leases the run, rebuilds from the Postgres ledger, and re-dispatches with the same idempotency key.

---- timeline ----
  #1  [plan]
  #2  [tool_call] c1        # lookup_order
  #3  [observation] c1
  #4  [plan]
  #5  [tool_call] c2        # issue_refund
  ▸ resumed by host:6226 (attempt 2)   ← crashed here
  #6  [observation] c2      # re-dispatched, deduped
  ...
  #11 [final]
-------------------
run status        : succeeded
dispatch attempts : 2   # physically called twice
tool effects      : 1   # one actual refund
2
dispatch attempts
(may repeat)
1
tool effect
(cannot duplicate)

Run it yourself:

python -m avatar.cli demo

A real worker process is killed with SIGKILL — this is not a mock. Then open the dashboard to see the ▸ resumed after crash divider in the ledger.

Why Avatar exists

LangChain-style frameworks give you the agent loop but no durability — a deploy or a crash mid-run loses state, or double-fires a refund. Temporal gives you durability but no agent / tool / LLM semantics. Avatar is the only thing that is both agent-native and crash-safe, and its only infrastructure dependency is Postgres.

 Crash-safeIdempotent toolsReplayAgent-nativeInfra
LangChain / Agents SDKnononoyes
Temporal / RestateyesDIYyesnocluster
Cloud queues (SQS/Celery)at-least-onceDIYnonobroker
AvataryesyesyesyesPostgres

How it works

The engine is a state machine over two Postgres tables. The runs table is the queue; run_steps is an append-only ledger. All run state is a pure fold over that ledger — which is what makes crash-resume and replay deterministic.

SDK ──enqueue──▶ Postgres ( runs = queue + state · run_steps = append-only ledger ) │ FOR UPDATE SKIP LOCKED — lease + heartbeat ▼ Worker ──▶ plan → tool_call → observe → commit → repeat │ (policy check · idempotency key · subprocess tool) Dashboard ◀──REST + SSE───┘
  1. Enqueue a run — it's a row in runs, status queued.
  2. A worker leases it atomically (FOR UPDATE SKIP LOCKED) and heartbeats the lease — exactly one owner.
  3. Each loop: commit a tool_call intent, dispatch the tool with its idempotency key, then commit the observation.
  4. On crash, the lease expires; another worker rebuilds state from the ledger and continues from the last committed step.
  5. An already-observed call short-circuits to its recorded result; a dispatched-but-unobserved call is re-dispatched with the same key.

What you get

[ ]

Crash-safe

Lease + heartbeat + ledger replay. Kill -9 any worker; another resumes.

{ }

Idempotent tools

A crash-stable key per tool call, enforced by UNIQUE(run_id, key).

Append-only ledger

Every plan / tool_call / observation step is committed before the next. Full audit, free.

Replay & fork

Re-run a trace prefix from any step without re-calling the model or re-running tools.

Policy hook

allow / deny / require_approval evaluated before every tool dispatch.

$

Budget hard-stop

Per-run budget_cap_cents; the run halts atomically before it breaches.

Control API + SSE

REST with a single static key and a live step stream. The dashboard is just a client.

Dashboard

Runs list, step-ledger timeline, live SSE, visible crash-resume markers, fork-here.

Postgres-only

One infra dependency. The runs table is the queue. No Redis, no broker.

Install

Two ways in. The whole stack — Postgres, control API, dashboard, and a worker — comes up with one command.

# Postgres + control API + dashboard + 1 worker
docker compose up

# scale workers horizontally
docker compose up --scale worker=3

Then open the dashboard (host port 8088 in this environment), enqueue a run, and watch its live step timeline. Host ports are overridable via AVATAR_API_HOST_PORT / AVATAR_PG_HOST_PORT.

# from the repo root
pip install -e .

# the crash-resume proof (kills a real worker)
python -m avatar.cli demo

# run the API + dashboard, and a worker
avatar serve
avatar worker          # scale by running more
from avatar import Avatar, tool, Plan, ToolCall

app = Avatar(api_url="http://localhost:8088", api_key="dev-key")

@tool(timeout=10, retries=2)
def issue_refund(order_id, cents):
    # Your real side effect. Forward current_idempotency_key()
    # to the downstream for exactly-once end-to-end.
    return {"refunded": True}

@app.agent("support-resolver")
def resolve(state):
    if any(m["role"] == "tool" for m in state.messages):
        return Plan(final=True, output={"status": "done"})
    return Plan(tool_calls=[ToolCall(id="c1", name="issue_refund",
                                     arguments={"order_id": "42", "cents": 500})])

run = app.runs.create(agent_ref="support-resolver", input={"ticket_id": 42})
print(app.runs.wait(run["id"]))

Point the worker at your module with AVATAR_APP=yourpkg.agents. The engine drives the durable loop; you write only the model call and the tools.

Who it's for

Backend engineers whose AI agents touch real, stateful, money-or-data-changing systems — issuing refunds, provisioning accounts, updating CRM/billing, writing to ledgers — and who have already been burned by an in-memory loop losing state or double-firing a side effect on a deploy or crash. If your agent only chats, you don't need Avatar. If it acts, you do.

Scope

This is single-purpose infrastructure, not a platform. It is not a SaaS, not multi-tenant, not a marketplace, not BYOK, not voice/avatars, not multi-agent orchestration. Those are sequenced behind the wedge — a hosted Avatar Cloud, governance, and a multi-agent fabric — built in the order that actually ships.