Loopy — Deployment Architecture

The concrete pieces of a running Loopy, how they wire together, and what changes from laptop to production.

A companion to ARCHITECTURE.md. That doc explains the design — the compile→manifest→runtime split, the swappable modules, the phased build. This one explains a deployment: the live pieces, the request path, and the topologies — grounded in what loopy run actually stands up today.
⚠️ Ingress status — read first. Both sensor models run today. loopy run schedules poll and cron triggers on an in-process scheduler (one watermark-gated task per sensor) and hosts webhook sensors as HTTP routes, fanning one URL out to every sensor on it. Webhook ingress can be signed: for /hooks/github paths, loopy run verifies GitHub's X-Hub-Signature-256 HMAC at the edge when GITHUB_WEBHOOK_SECRET is set (a path left without a secret runs unverified — dev only — and says so loudly). The remaining gap is durability, not the trigger type: the scheduler is in-process, so restart-survival and single-firing across workers are still ahead (ARCHITECTURE B7/B8).
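The edge check described above can be sketched as follows. This is only an illustration of GitHub's documented scheme (HMAC-SHA256 over the raw request body, hex digest prefixed with sha256=), not Loopy's actual code. The function name verify_github_signature and its "no secret means pass with a warning" behavior are assumptions made for the example.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(body: bytes, header: Optional[str], secret: Optional[str]) -> bool:
    """Return True if the delivery may proceed.

    Hypothetical helper, not Loopy's API. `header` is the value of the
    X-Hub-Signature-256 header; `secret` is the value of GITHUB_WEBHOOK_SECRET.
    """
    if not secret:
        # Assumption: mirrors the "unverified, dev only, says so loudly" mode.
        print("WARNING: GITHUB_WEBHOOK_SECRET is unset; webhook is NOT verified (dev only)")
        return True
    if not header or not header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the digest matched.
    return hmac.compare_digest(expected.encode(), header.encode())
```

The digest must be computed over the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace and break the signature.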

1. The two lifecycles

A Loopy deployment has a build-time path and a run-time path, joined by one artifact: the manifest.

Author the project (registry.yml · workflows/*.md · skills/ · sensors/*.py)
  → loopy compile
  → manifest.json (validated IR; the deploy artifact)
  → loopy run
  → the Loopy server (hosts sensors, drives workflow runs)

build time · CI / laptop · pure, no runtime deps

run time · a long-lived server

The deploy unit is therefore manifest.json + the project root — the root is still needed at run time for two things only: the sensor module source and the sandbox env_files.
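Because env_files are resolved against the project root at run time, the resolver has to refuse paths that climb out of it. A minimal sketch of that containment check, assuming Python 3.9+; resolve_env_file is a hypothetical name and this is not the EnvFileSecretsResolver implementation.

```python
from pathlib import Path


def resolve_env_file(project_root: Path, env_file: str) -> Path:
    """Resolve env_file relative to the project root, refusing escapes.

    Hypothetical helper for illustration only.
    """
    root = project_root.resolve()
    # An absolute env_file replaces the root when joined, so it is caught below too.
    candidate = (root / env_file).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to: Python 3.9+
        raise ValueError(f"env_file escapes the project root: {env_file}")
    return candidate
```

Resolving both paths before comparing means ".." segments and symlinks are normalized first, so the check is made on the real location rather than on the string as written.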

2. Anatomy of a successful deployment

The Loopy server sits in the middle; the outside world it integrates with sits on the edges. The agent never calls the model from the Runtime process: the AgentHarness shells out to claude inside the Sandbox, so model calls and tool side effects originate from the Sandbox and are governed by its network: allowlist.

Sources: Sentry · Linear · Datadog · PagerDuty · Slack
  │  webhook POST /hooks
  ▼
The Loopy server · loopy run
  SensorRunner (FastAPI): hosts @sensor(webhook=…) routes · raw payload → registered Event
    ▼  Event
  EventReceiver: transport-neutral intake · injects the event into the Runtime
    ▼
  EventBus: routes registered events to every on: subscriber · fan-out + loop-backs
    ▼
  Runtime (the engine): instantiate a run at the on: step · walk the after: DAG · render {{ event.* }}/{{ step.* }} · record outputs + history · publish emits: back to the bus ↺
    ▼  per step
  AgentHarness (claude-code): runs claude -p inside the sandbox · validates typed output:/emits: · enforces budget
    ▼
  SandboxProvider: local | daytona · provisions compute + egress · the trust boundary
  │  from the sandbox
  ▼
Targets: Anthropic API (models) · GitHub · CI · feature flags

egress from the sandbox is gated by the network: allowlist

The pieces, and what ships today

| Piece | Responsibility | Today (loopy run) |
| --- | --- | --- |
| SensorRunner | Hosts each @sensor(webhook=…) as an HTTP route; runs the author's fn to turn a raw vendor payload into a registered Event. The webhook/push edge: one language-pluggable surface. | FastAPISensorRunner on uvicorn; loads the real sensor module, or synthesizes events if it can't load. Webhook-only: poll sensors run on the PollScheduler (below). Started only when webhook sensors exist. |
| PollScheduler | The timer/pull edge. Fires each @sensor(poll=…) on its interval with a Tick (scheduled_at, last_run), normalizes the returned event(s), and delivers them to the EventReceiver: same output as a webhook, different trigger. | In-process asyncio (one task/sensor, sequential, watermark advances only on success, cold-start scans one window). Behind a Scheduler seam; durable timing is B7. |
| EventReceiver | Transport-neutral intake: accepts an Event from any sensor source, re-validates it against the registry, and publishes it to the EventBus. Lets a non-Python sensor feed the Python Runtime. | LocalEventReceiver (in-proc; publish-and-ack: validates then bus.publish, does not run the workflow; the Runtime drains the bus separately). |
| EventBus | Routes registered events to every workflow subscribing via on:. Handles fan-out (one event → many workflows) and loop-backs (emits: re-enters the EventBus). | InProcessEventBus (single process), or RedisEventBus (Redis Streams + consumer group; durable, at-least-once) with --bus redis. |
| Runtime | The engine. Instantiates a run at the on: step, walks the after: DAG, renders templates, records outputs + event-sourced history, publishes emits:. | InMemoryRuntime: covers B1–B6; single-process, non-durable. Durable timers / cron / resume stubbed (B7/B10). |
| AgentHarness | Runs a step's prose against its agent (model + skills); validates against the typed output:/emits:. | ClaudeCodeHarness: runs headless claude -p … --output-format json inside the sandbox, parses the envelope, feeds total_cost_usd to the budget enforcer. |
| SandboxProvider | Provisions compute + egress from the sandbox spec (image build + network: allowlist). The trust boundary. | local (subprocess, dev) or daytona (isolated cloud container). Selected per-sandbox via provider: in registry.yml (required on every sandbox; a missing one is compile error E214); the runtime routes each step to the backend its sandbox names. No launch-time flag. loopy init scaffolds daytona. |
| Secrets | Resolves a sandbox's env_file(s) at run time and injects them into the sandbox. Never in the manifest, never logged. | EnvFileSecretsResolver: reads dotenv relative to the project root; refuses paths that escape it. |
| Sensor secrets | Supplies credentials to in-process @sensor functions (poll + webhook). | A single runner-wide sensors/.env (load_sensor_env) merged into the process env at loopy run, so sensors read them via os.environ. Optional, gitignored, never in the manifest. |
| StateStore | Run history (event-sourced), step outputs, poll watermarks (last_run), event-id dedupe. | InMemoryStateStore: process-lifetime only. Now actively used: poll watermarks and the Redis bus's at-least-once dedupe. |
| RetryPolicy | Wraps side-effecting calls with backoff + idempotency key (run_id:step_id) so retries/replays don't double-fire. | ExponentialBackoffRetry; budget trips are terminal. |
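The RetryPolicy row can be made concrete with a small sketch: exponential backoff around a side-effecting call, the same run_id:step_id key passed on every attempt, and budget errors treated as terminal. The names call_with_retry and BudgetExceeded, the default attempt count, and the base delay are all assumptions for illustration, not ExponentialBackoffRetry's real interface.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class BudgetExceeded(Exception):
    """Hypothetical terminal error: a budget trip is never retried."""


def idempotency_key(run_id: str, step_id: str) -> str:
    # The key format (run_id:step_id) is taken from the table above.
    return f"{run_id}:{step_id}"


def call_with_retry(
    fn: Callable[[str], T],
    run_id: str,
    step_id: str,
    attempts: int = 4,          # assumed default
    base_delay: float = 0.5,    # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(key) with exponential backoff; the key is identical on every attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    key = idempotency_key(run_id, step_id)
    for attempt in range(attempts):
        try:
            return fn(key)
        except BudgetExceeded:
            raise  # terminal: do not retry
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Because the key does not change between attempts, a downstream service that deduplicates on it can treat a retried call as the same operation rather than a second one.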

3. Compute topology: the physical boundaries

Section 2 is the logical pipeline — the order data flows through. This is the physical one: which OS process and trust domain each piece runs in. Architecturally there are five tiers, and the EventBus is the seam — everything above it is ingress, everything below it is execution, and the two halves talk only through it.

Terms — one canonical name per piece (the Protocol name from the code). Role words (sensor surface, ingress, seam, engine) are descriptions; the back-ticked component is the term used throughout. "Broker" means one specific thing: a networked EventBus (Redis/NATS/Kafka), as opposed to the in-process one.
| Tier | Component | Role |
| --- | --- | --- |
| 1 | SensorRunner | the sensor surface: hosts developer @sensor code; produces events |
| 2 | EventReceiver | the ingress gateway: authenticates + re-validates, then publishes |
| 3 | EventBus | the seam: routes events to on: subscribers; in-process, or a broker when networked |
| 4 | Runtime | the engine: instantiates and drives runs; uses StateStore + RetryPolicy |
| (4) | AgentHarness | runs a step's agent; straddles the Runtime/Sandbox boundary |
| 5 | Sandbox (via SandboxProvider) | the agent's exec domain, provisioned per spec |

THE INTERNET · your sources: Sentry · Linear · Datadog · Slack
│  HTTPS POST /hooks/…  ▼
Tier 1 · SensorRunner (the sensor surface) · own process domain · UNTRUSTED
developer-authored · language-pluggable (Python host today, or your own app via the SDK)
an @sensor fn turns a raw vendor payload into a candidate Event, then hands it off — it never touches the EventBus or the Runtime directly
│  deliver Event   (in-proc call  |  HTTPS POST /events when split out)  ▼
Tier 2 · EventReceiver (ingress gateway) · loopy-owned · TRUSTED
authenticate the producer · re-validate the event against the registry contract · publish to the EventBus
the backend's front door — runs no workflows
▼   T H E   S E A M   ▼
Tier 3 · EventBus (the seam) · modular · the only ingress↔execution boundary
routes registered events to every on: subscriber · fan-out · loop-backs (a step's emits re-enters here ↺)
in-process (single node)  |  a broker (Redis / NATS / Kafka); swap without touching any tier above or below
│  subscribe / consume  ▼
Tier 4 · Runtime (the engine) · own process domain · N workers when distributed
instantiate a run at the on: step · walk the after: DAG · render templates · record history · publish emits back to the EventBus ↺
AgentHarness orchestration — render the prompt, build the claude argv, parse + validate the result, enforce the budget
│  Sandbox.exec(argv)  →  subprocess spawn (local)  |  control-plane RPC (daytona)  ▼
Tier 5 · Sandbox · always a separate exec domain, even on a laptop
the claude CLI runs HERE — model API calls and every tool side effect (git push, open_pr) originate HERE · local subprocess or daytona remote container
│  HTTPS egress, gated by the Sandbox network: allowlist  ▼
THE INTERNET · your targets: Anthropic API · GitHub · CI · feature flags

Five facts this makes explicit — and that the logical pipeline leaves ambiguous:

Recommendation: run the EventReceiver as its own small service, separate from the engine. Give it one job — take the event, check it's valid, put it on the EventBus, done. The Runtime reads events off the bus on its own.

Why not bundle it into the engine? If they share a process, then while the engine is busy or restarting you stop accepting events from your sensors. Keep them separate and the sensors keep delivering, the bus holds the backlog, and the engine catches up — and you can run several engine workers behind one receiver.

The one behavior that makes this possible: receive() only publishes the event and returns; it does not run the whole workflow and return a RunId. LocalEventReceiver already works this way (publish-and-ack, per the table in section 2). That behavior is the whole difference between "the receiver is glued to the engine" and "the receiver is its own thing."

What to do right now: for single-node loopy run, leave everything in one process — it's the simple starting point and it's fine for dev and small deployments. The separate service is the production target, not day-one work. (It is never inside the SensorRunner — untrusted — or inside the broker — Redis/NATS run no loopy code.)

What crosses each boundary

| Boundary | Mechanism | Over the network? |
| --- | --- | --- |
| Source → SensorRunner | HTTPS POST /hooks/… | yes, from the internet |
| SensorRunner → EventReceiver | in-proc call (1 node) or HTTPS POST /events (SDK) | only when sensors are split out |
| EventReceiver → EventBus → Runtime | EventBus.publish / subscribe | only with a broker; this is the seam (in-process, or a broker) |
| Runtime → Sandbox (exec) | subprocess spawn (local) / control-plane RPC (daytona) | only daytona |
| Sandbox → model API, GitHub, … | HTTPS egress, network:-gated | yes, from the Sandbox |

3.1  Physical topology — today, and where we're going

The tiers are logical; how many processes they occupy is a deploy choice. The direction is a service-oriented architecture: the EventReceiver and the Runtime become separate services that talk through a broker. For now we run them as one node — the simple starting point — and evolve outward. Switching modes changes only which EventBus you wire in; no .md changes, no recompile.

| Mode | Receiver + Runtime | EventBus (the seam) | Sandbox |
| --- | --- | --- | --- |
| Single-node, today (loopy run) | one node, one process | in-process | separate (always) |
| Service-oriented, the direction | separate services; N Runtime workers behind one EventReceiver | a broker (Redis / NATS / Kafka) | separate (always) |

Two things hold in both modes:

3.2  Staying out of the corner

Today sensors are loopy-hosted in Python (the simple start). The plan is to expand to developer-hosted sensors later — the dev's own app/language posting events to a loopy-owned endpoint — which is what unlocks polyglot. Getting there cleanly depends on two behaviors, not on extra machinery. The structure is already right (receive() takes a serializable Event, the @sensor fn is a standalone payload → Event callable, the contract is generated as static files), so the only corner risk is shipping the in-process version with the wrong behaviors and calling the interface "ready to split." Two things land now:

  1. The EventReceiver re-validates every event against the manifest registry — even though in-process the sensor "should" be correct. Skipping it bakes in sensor-trust; un-trusting it later means adding the gate and auditing everything downstream.
  2. receive() publishes and acknowledges — it does not run the workflow synchronously. A remote receiver can't hold a connection open for a minutes-to-days run, so synchronous-run-from-receive must never be depended on. The Runtime consumes off the bus instead.
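
The two behaviors above fit in a few lines. This is a sketch under stated assumptions, not the LocalEventReceiver source: the registry is modeled as a map from event type to required payload fields, an asyncio.Queue stands in for the EventBus, and the names SketchReceiver, InvalidEvent, and the "accepted" ack value are invented for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, Set


@dataclass(frozen=True)
class Event:
    type: str
    payload: Dict[str, Any]


class InvalidEvent(Exception):
    """Raised when an event does not match the registry contract."""


@dataclass
class SketchReceiver:
    # Assumption: registry maps event type -> required payload field names.
    registry: Dict[str, Set[str]]
    # Assumption: an asyncio.Queue stands in for the EventBus.
    bus: "asyncio.Queue[Event]"

    async def receive(self, event: Event) -> str:
        # Behavior 1: always re-validate, even for an in-process sensor.
        required = self.registry.get(event.type)
        if required is None:
            raise InvalidEvent(f"unregistered event type: {event.type}")
        missing = required - set(event.payload)
        if missing:
            raise InvalidEvent(f"{event.type} is missing fields: {sorted(missing)}")
        # Behavior 2: publish and acknowledge. No workflow runs here and no
        # RunId is returned; the Runtime consumes from the bus on its own.
        await self.bus.put(event)
        return "accepted"
```

Since receive() returns as soon as the event is on the bus, the same contract works whether the caller is an in-process sensor or a remote one that cannot keep a connection open for the length of a run.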

Deliberately deferred to the developer-hosted milestone (additive, easy to get wrong if built speculatively): the HTTP POST /events endpoint, producer authentication, an external broker, and contract versioning/distribution to remote sensors. The one rule while they're deferred — keep Event serializable and never assume the sensor and receiver share an in-memory registry. (Tracked in plans/future/sensor-ingress/.)

4. The request path of one incident

The README's incidents example, traced through a live deployment:

1. Sentry fires a webhook            ── POST /hooks/sentry ──▶  SensorRunner
2. the @sensor fn maps the payload   ── returns Incident(source=sentry, …) ──▶  EventReceiver
3. EventReceiver injects it          ── Runtime.trigger(Incident) ──▶  EventBus.publish
4. EventBus routes Incident          ── matches triage/investigate `on: Incident` ──▶  a run starts
5. investigate runs                  ── ClaudeCodeHarness runs `claude` in a sandbox ──▶  emits WorkItem
6. WorkItem re-enters the bus        ── matches resolve/arbitrate `on: WorkItem` ──▶  a new run starts
7. arbitrate ▶ fix ▶ review ▶ ship   ── one workflow, `after:` chain, outputs passed by reference
8. ship emits GoalShipped            ── terminal announcement on the bus

Steps 4 and 6 are cross-workflow event seams — they go through the bus. The arbitrate → fix → review → ship chain in step 7 is within-workflow: those handoffs are outputs passed by reference ({{ fix.diff }}) and never touch the bus. Events cross workflow boundaries; outputs stay inside one.
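
The distinction between events and outputs can be shown with a toy bus. The sketch below is not InProcessEventBus; SketchBus, the tuple-shaped events, and the two handler functions are hypothetical stand-ins that mirror the trace above: an Incident starts a triage run, its WorkItem loops back through the bus, and the resolve workflow passes its fix output by reference without publishing anything until it ships.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Event = Tuple[str, dict]                 # (event type, payload); illustrative shape
Handler = Callable[[dict], List[Event]]  # a workflow run: payload in, emitted events out


class SketchBus:
    """Toy in-process bus: fan-out to every subscriber, emits loop back in."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Handler]] = defaultdict(list)
        self._queue: Deque[Event] = deque()

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subs[event_type].append(handler)

    def publish(self, event: Event) -> None:
        self._queue.append(event)

    def drain(self) -> int:
        """Deliver queued events until the bus is empty; return the delivery count."""
        delivered = 0
        while self._queue:
            event_type, payload = self._queue.popleft()
            for handler in self._subs.get(event_type, []):
                delivered += 1
                for emitted in handler(payload):
                    self.publish(emitted)  # loop-back: emits re-enter the bus
        return delivered


def triage(incident: dict) -> List[Event]:
    # Workflow "triage", step "investigate": emits a WorkItem (steps 4 and 5).
    return [("WorkItem", {"incident_id": incident["id"]})]


def resolve(work_item: dict) -> List[Event]:
    # Inside one workflow, steps hand off outputs by reference; no bus involved.
    outputs = {"fix": {"diff": f"patch for {work_item['incident_id']}"}}
    review_ok = bool(outputs["fix"]["diff"])
    return [("GoalShipped", {"diff": outputs["fix"]["diff"]})] if review_ok else []
```

Subscribing a second handler to Incident would make the bus deliver the same event to both, which is the fan-out case; the within-workflow handoff in resolve never appears in the delivery count.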

5. Deployment topologies

The same five-axis composition (Runtime · StateStore · EventBus · Harness · SandboxProvider) is wired differently per environment. Only the wiring changes: no .md file is edited and nothing is recompiled.

A. Laptop / CI (ships today)

loopy trigger … (sandboxes with provider: local)

InMemoryRuntime · in-process EventBus · in-memory StateStore · local Sandbox.

B. Single-node server (ships today)

loopy run … (sandboxes with provider: daytona)

Same composition, long-lived, hosting the webhooks; agents in isolated Daytona sandboxes.

C. Durable / distributed (design-complete)

Swap Runtime → DBOS / Temporal; StateStore → Postgres / history. (The networked bus already ships — --bus redis, Redis Streams.)

Harness + sandbox unchanged.

loopy.yaml — deployment defaults for loopy run

The run-time wiring choices map to an optional config file so they need not be retyped as flags. Absent the file, defaults apply unchanged.

sensor_server:        # the host:port that binds the sensor-webhook listener
  host: 0.0.0.0
  port: 8000
bus: redis            # inproc (single-process) | redis (networked broker)

Precedence is explicit flag > loopy.yaml > built-in default, so --bus inproc still wins over a file that says redis. Connection strings stay in the environment, never the file: bus: redis reads its URL from the REDIS_URL env var (or the --redis-url flag), defaulting to redis://localhost:6379. sandbox is not a config key and not a launch flag — each sandbox's provider: is declared in registry.yml, and the runtime routes each step to the backend its sandbox names. state: (durable StateStore) and limits: (spend caps) are reserved for B10/B-cost and not yet read.
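
The precedence rule can be sketched as a small resolver. The function name resolve_settings and the dictionary shapes are hypothetical; treating inproc as the built-in default and letting the --redis-url flag win over REDIS_URL are assumptions, since the text above states only that both sources exist.

```python
from typing import Dict, Mapping, Optional

# Assumed built-in defaults; only the Redis URL default is stated in the text.
DEFAULTS = {"bus": "inproc", "redis_url": "redis://localhost:6379"}


def resolve_settings(
    flags: Mapping[str, Optional[str]],
    file_cfg: Mapping[str, str],
    env: Mapping[str, str],
) -> Dict[str, str]:
    """Apply explicit flag > loopy.yaml > built-in default."""
    bus = flags.get("bus") or file_cfg.get("bus") or DEFAULTS["bus"]
    # Connection strings never come from the file: flag or environment only.
    # Assumption: the flag takes precedence over the environment variable.
    redis_url = flags.get("redis_url") or env.get("REDIS_URL") or DEFAULTS["redis_url"]
    return {"bus": bus, "redis_url": redis_url}
```

For example, resolve_settings({"bus": "inproc"}, {"bus": "redis"}, {}) selects inproc, matching the case where --bus inproc wins over a file that says redis.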

loopy.env — the secret companion to loopy.yaml

Because connection strings and provider keys can't live in the YAML, a local-dev convenience file supplies them: loopy run reads loopy.env at the project root and merges it into the process env without overriding anything already there (a value set in the real/platform environment always wins). The merge happens before redis_url is resolved and before any Daytona client is created. The file holds infra creds only (REDIS_URL, DAYTONA_API_KEY/DAYTONA_API_URL) and is gitignored; agent secrets stay in sandbox env_files and sensor secrets in sensors/.env. In production you typically skip the file and inject these from the platform's secret store, which the non-override semantics respect.

# loopy.env (project root; gitignored; local-dev convenience)
REDIS_URL=redis://localhost:6379
DAYTONA_API_KEY=dt-...
DAYTONA_API_URL=https://...
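
The non-override merge amounts to "set only if absent". A minimal sketch, assuming a simple KEY=VALUE file format with # comments; parse_env and merge_non_override are hypothetical names, and the real loader may handle quoting and other dotenv features that this one ignores.

```python
from typing import Dict, Mapping, MutableMapping


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def merge_non_override(env: MutableMapping[str, str], file_values: Mapping[str, str]) -> None:
    """Copy file values into env, keeping any value that is already set."""
    for key, value in file_values.items():
        env.setdefault(key, value)
```

Passing os.environ as env would give the behavior described: a REDIS_URL injected by the platform survives, while keys missing from the environment are filled in from the file.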

What stays invisible and what leaks: the code is swappable across all three topologies (A, B, C); the operational shape is not. A needs nothing extra; B needs a Daytona account; C means "bring a Postgres" or "run a Temporal cluster." Whoever deploys sees that difference.

6. The deployment checklist

What you need in place for a successful run, in order:

Failure modes worth knowing:

7. One-glance summary

Author: registry · workflows · skills · sensors  →  loopy compile  →  manifest.json
▼  loopy run consumes the manifest  ▼
Ingress: SensorRunner (untrusted)  →  EventReceiver (trusted gate: authenticate + re-validate)
▼  EventBus.publish  ══ THE SEAM · EventBus (in-process | broker: Redis/NATS) ══  subscribe ▼
Execution: Runtime walks the DAG · emits loop back to the EventBus ↺ · AgentHarness orchestrates each step
▼  Sandbox.exec  ▼
Sandbox: the claude CLI runs here · model API + tool egress gated by network: · secrets injected at run time

The EventBus is the seam: ingress (SensorRunner → trusted EventReceiver) on one side, execution (Runtime → AgentHarness) on the other, talking only through it. The Sandbox is the other hard boundary, where trust and egress are enforced. The manifest is the contract between author and Runtime; the Sandbox is the contract between agent and the outside world.