Venture Service Catalog (v1)
Overview
This document catalogs every service in the Venture system: name, port, language, dependencies, health check, scaling notes, and environment contracts. Use this as the canonical reference for service deployment, inter-service communication, and orchestration.
Service Registry
1. control-plane-api (Port 8000)
Language: Python 3.14 (FastAPI)
Responsibility: Founder-facing REST + WebSocket API; task submission, policy publishing, system controls (freeze, kill-switch, rollback).
Dependencies:
- policy-engine (:8001) — validate task intents
- venture-orchestrator (:8005) — dispatch tasks
- ledger-db (:5432, PostgreSQL) — query workflows, audit trail
- redis-cache (:6379) — session store, rate limiting
Scaling: 1 replica (founder ops bottleneck; could scale to 2 with sticky sessions).
Health Check:
- Endpoint:
GET /health - Response:
{ "status": "ok", "timestamp": "...", "dependencies": { "policy-engine": "ok", ... } } - Timeout: 5s
- Interval: 10s
Startup Order: After policy-engine, venture-orchestrator, ledger-db, redis-cache.
Environment Variables:
CONTROL_PLANE_HOST=0.0.0.0
CONTROL_PLANE_PORT=8000
CONTROL_PLANE_LOG_LEVEL=INFO
CONTROL_PLANE_REQUEST_TIMEOUT_SECONDS=30
CONTROL_PLANE_CORS_ORIGINS=* # restrict in production
POLICY_ENGINE_URL=http://policy-engine:8001
VENTURE_ORCHESTRATOR_URL=http://venture-orchestrator:8005
LEDGER_DB_URL=postgresql://user:pass@ledger-db:5432/venture
REDIS_URL=redis://redis-cache:6379/0
FOUNDER_AUTH_MODE=mTLS # or bearer-tokenProcess-compose Config:
services:
control-plane-api:
command: python -m uvicorn control_plane.api:app --host 0.0.0.0 --port 8000 --log-level info
environment:
CONTROL_PLANE_PORT: 8000
POLICY_ENGINE_URL: http://policy-engine:8001
# ... (other env vars)
depends_on:
- policy-engine
- venture-orchestrator
- ledger-db
- redis-cacheEndpoints (representative):
POST /workflows— submit task intentGET /workflows— list active workflowsGET /workflows/{id}— workflow detail + progressPOST /workflows/{id}/cancel— cancel workstreamPOST /control/freeze— global pausePOST /control/unfreeze— resumePOST /policies/publish— publish new policy bundleGET /audit/{workflow_id}— retrieve audit trailGET /health— system health + incidentsWebSocket /ws/workflows/{id}— real-time task completion events
2. policy-engine (Port 8001)
Language: Python 3.14 (FastAPI)
Responsibility: Tool allowlist enforcement, task intent validation, schema registry, workload identity verification, prompt injection defense.
Dependencies:
- redis-cache (:6379) — tool allowlist cache, policy cache
- ledger-db (:5432) — schema registry, policy bundle history
Scaling: Horizontal (multiple replicas behind HTTP load balancer). Cache layer (Redis) absorbs most read load; direct DB fallback on cache miss.
Health Check:
- Endpoint:
GET /health - Response:
{ "status": "ok", "cache_hit_rate": 0.95, "db_latency_ms": 15 } - Timeout: 5s
- Interval: 10s
Startup Order: After redis-cache, ledger-db.
Environment Variables:
POLICY_ENGINE_HOST=0.0.0.0
POLICY_ENGINE_PORT=8001
POLICY_ENGINE_LOG_LEVEL=INFO
POLICY_ENGINE_CACHE_TTL_SECONDS=300
POLICY_ENGINE_DB_FALLBACK_TIMEOUT_MS=1000
LEDGER_DB_URL=postgresql://user:pass@ledger-db:5432/venture
REDIS_URL=redis://redis-cache:6379/0
POLICY_BUNDLE_REFRESH_INTERVAL_SECONDS=60
TOOL_ALLOWLIST_CACHE_KEY_PREFIX=policy:allowlist:Process-compose Config:
services:
policy-engine:
command: python -m uvicorn policy_engine.api:app --host 0.0.0.0 --port 8001 --workers 4
environment:
POLICY_ENGINE_PORT: 8001
POLICY_ENGINE_CACHE_TTL_SECONDS: 300
LEDGER_DB_URL: postgresql://...
REDIS_URL: redis://redis-cache:6379/0
depends_on:
- redis-cache
- ledger-dbEndpoints (representative):
POST /evaluate-intent— validate task intent against schema + tool allowlistPOST /check-tool-allowed— check if tool_name permitted for agent_roleGET /schemas/{task_type}— retrieve schema for task typeGET /policy-bundle/{version}— fetch policy bundle by versionPOST /schemas/register— register new task type schemaGET /health— health status + metrics
Inter-Service Communication:
- Called by: control-plane-api (validate intents), agent-runtime (check tool permissions), treasury-api (authorization checks)
- Protocol: HTTP REST (sync request/response)
3. artifact-compiler (Port 8002)
Language: Python 3.14 (FastMCP)
Responsibility: Artifact IR spec validation, deterministic build pipeline (validate → render → export), provenance attachment, non-deterministic provider handling.
Dependencies:
- ledger-db (:5432) — artifact_ir, artifact_builds, artifact_provenance tables
- event-bus (NATS, :4222) — emit artifact events
- redis-cache (:6379) — build result cache, idempotency tracking
- External APIs: Claude API, Veo/Banana (for rendering)
Scaling: Horizontal (queue-based dispatch via NATS work queue). Queue depth monitored; scale up if depth > 50.
Health Check:
- Endpoint:
GET /health - Response:
{ "status": "ok", "build_queue_depth": 3, "cache_hit_rate": 0.75 } - Timeout: 5s
- Interval: 10s
Startup Order: After ledger-db, event-bus, redis-cache.
Environment Variables:
ARTIFACT_COMPILER_HOST=0.0.0.0
ARTIFACT_COMPILER_PORT=8002
ARTIFACT_COMPILER_LOG_LEVEL=INFO
ARTIFACT_COMPILER_BUILD_TIMEOUT_SECONDS=300
ARTIFACT_COMPILER_CACHE_TTL_DAYS=7
LEDGER_DB_URL=postgresql://user:pass@ledger-db:5432/venture
NATS_URL=nats://event-bus:4222
REDIS_URL=redis://redis-cache:6379/0
CLAUDE_API_KEY=sk-...
VEO_API_KEY=...
BANANA_API_KEY=...
ARTIFACT_COMPILER_POLICY_TIER_DEFAULT=balancedProcess-compose Config:
services:
artifact-compiler:
command: python -m artifact_compiler.api:app --host 0.0.0.0 --port 8002
environment:
ARTIFACT_COMPILER_PORT: 8002
ARTIFACT_COMPILER_BUILD_TIMEOUT_SECONDS: 300
LEDGER_DB_URL: postgresql://...
NATS_URL: nats://event-bus:4222
REDIS_URL: redis://redis-cache:6379/0
CLAUDE_API_KEY: ${CLAUDE_API_KEY}
depends_on:
- ledger-db
- event-bus
- redis-cacheEndpoints (representative):
POST /artifacts/register-ir— register artifact IR specPOST /artifacts/{ir_id}/build— start artifact build pipelineGET /artifacts/{build_id}/status— poll build statusPOST /artifacts/{build_id}/replay— re-render artifact (cache bypass)GET /health— health status + queue metrics
Inter-Service Communication:
- Subscribes to: task completion events (async)
- Publishes to: event-bus (artifact.ir.registered, artifact.build.started, artifact.build.completed, artifact.provenance.attested)
4. treasury-api (Port 8003)
Language: Python 3.14 (FastAPI)
Responsibility: Authorization decisions, double-entry ledger, velocity controls, merchant/category enforcement, reconciliation.
Dependencies:
- ledger-db (:5432) — money_intents, authorization_decisions, ledger_entries tables
- event-bus (NATS, :4222) — emit money events, subscribe to task events
- redis-cache (:6379) — velocity control tracking, lock store
- policy-engine (:8001) — policy bundle validation
Scaling: 1-2 replicas (state: ledger + authorization queue). Serialize via distributed lock (Redis) or DB-level serialization.
Health Check:
- Endpoint:
GET /health - Response:
{ "status": "ok", "ledger_latency_ms": 20, "last_reconciliation": "..." } - Timeout: 5s
- Interval: 10s
Startup Order: After ledger-db, event-bus, redis-cache, policy-engine.
Environment Variables:
TREASURY_API_HOST=0.0.0.0
TREASURY_API_PORT=8003
TREASURY_API_LOG_LEVEL=INFO
TREASURY_API_AUTH_TIMEOUT_MS=500
LEDGER_DB_URL=postgresql://user:pass@ledger-db:5432/venture
NATS_URL=nats://event-bus:4222
REDIS_URL=redis://redis-cache:6379/0
POLICY_ENGINE_URL=http://policy-engine:8001
TREASURY_RECONCILIATION_CRON=0 0 * * * # daily at midnight
TREASURY_GLOBAL_DAILY_CAP_CENTS=1000000
TREASURY_VELOCITY_WINDOW_SECONDS=3600Process-compose Config:
services:
treasury-api:
command: python -m uvicorn treasury.api:app --host 0.0.0.0 --port 8003
environment:
TREASURY_API_PORT: 8003
TREASURY_API_AUTH_TIMEOUT_MS: 500
LEDGER_DB_URL: postgresql://...
NATS_URL: nats://event-bus:4222
REDIS_URL: redis://redis-cache:6379/0
depends_on:
- ledger-db
- event-bus
- redis-cache
- policy-engineEndpoints (representative):
POST /money/intents— create money intentPOST /money/authorize— authorize money intentGET /money/intents/{id}— fetch intent statusGET /ledger/{workflow_id}— retrieve ledger entries for workflowPOST /reconciliation/run— trigger reconciliation (admin)GET /health— health status
Inter-Service Communication:
- Calls: policy-engine (policy bundle fetch)
- Publishes to: event-bus (money.intent.created, money.authorization.decided, ledger.entry.created)
- Subscribes to: task completion events (settle ledger)
5. compliance-engine (Port 8004)
Language: Python 3.14 (FastAPI)
Responsibility: Policy rule evaluation, audit trail generation, violation detection + escalation, DSAR/deletion processing, incident classification.
Dependencies:
- ledger-db (:5432) — compliance_cases, privacy_requests, audit_trail tables
- event-bus (NATS, :4222) — subscribe to all events, emit compliance events
- redis-cache (:6379) — rule cache, incident state
Scaling: 2+ replicas (read-heavy). NATS consumer groups handle fair distribution of events.
Health Check:
- Endpoint:
GET /health - Response:
{ "status": "ok", "event_lag_ms": 100, "rule_cache_hit_rate": 0.88 } - Timeout: 5s
- Interval: 10s
Startup Order: After ledger-db, event-bus, redis-cache.
Environment Variables:
COMPLIANCE_ENGINE_HOST=0.0.0.0
COMPLIANCE_ENGINE_PORT=8004
COMPLIANCE_ENGINE_LOG_LEVEL=INFO
COMPLIANCE_ENGINE_RULE_CACHE_TTL_SECONDS=300
LEDGER_DB_URL=postgresql://user:pass@ledger-db:5432/venture
NATS_URL=nats://event-bus:4222
REDIS_URL=redis://redis-cache:6379/0
COMPLIANCE_ENGINE_VIOLATION_ESCALATION_URL=http://control-plane-api:8000/incidents
COMPLIANCE_ENGINE_AUTO_FREEZE_ON_CRITICAL=trueProcess-compose Config:
services:
compliance-engine:
command: python -m compliance.engine:app --host 0.0.0.0 --port 8004
environment:
COMPLIANCE_ENGINE_PORT: 8004
COMPLIANCE_ENGINE_RULE_CACHE_TTL_SECONDS: 300
LEDGER_DB_URL: postgresql://...
NATS_URL: nats://event-bus:4222
REDIS_URL: redis://redis-cache:6379/0
depends_on:
- ledger-db
- event-bus
- redis-cacheEndpoints (representative):
POST /evaluate— evaluate policy rule against actionGET /cases/{workflow_id}— retrieve compliance cases for workflowGET /audit/{workflow_id}— retrieve audit trailPOST /privacy/requests— submit DSAR/deletion requestGET /health— health status
Inter-Service Communication:
- Subscribes to: all event topics (policy., workflow., task., money., etc.)
- Publishes to: event-bus (compliance.evaluated, compliance.violation.detected, incident.classified)
- Calls: control-plane-api (escalate incidents)
6. venture-orchestrator (Port 8005)
Language: Python 3.14 (FastAPI)
Responsibility: Portfolio management, workstream DAG execution, L1/L2/L3 task dispatch, task queue management, monitoring/alerting.
Dependencies:
- ledger-db (:5432) — workflows, tasks, agent_actions tables
- event-bus (NATS, :4222) — publish task events, subscribe to task completion
- agent-runtime (Python, direct call for L1/L2)
- redis-cache (:6379) — task queue, agent slot tracking
Scaling: 2+ replicas. Distributed task dispatch via Redis-backed queue (fair distribution).
Health Check:
- Endpoint:
GET /health - Response:
{ "status": "ok", "queue_depth": 5, "agent_pool_utilization": 0.65 } - Timeout: 5s
- Interval: 10s
Startup Order: After ledger-db, event-bus, agent-runtime, redis-cache.
Environment Variables:
VENTURE_ORCHESTRATOR_HOST=0.0.0.0
VENTURE_ORCHESTRATOR_PORT=8005
VENTURE_ORCHESTRATOR_LOG_LEVEL=INFO
VENTURE_ORCHESTRATOR_TASK_QUEUE_MAX_DEPTH=100
VENTURE_ORCHESTRATOR_L3_POOL_MAX_CONCURRENCY=10
VENTURE_ORCHESTRATOR_L3_TASK_TIMEOUT_SECONDS=1800
LEDGER_DB_URL=postgresql://user:pass@ledger-db:5432/venture
NATS_URL=nats://event-bus:4222
REDIS_URL=redis://redis-cache:6379/0
AGENT_RUNTIME_URL=http://agent-runtime:5000 # for L1/L2 callsProcess-compose Config:
services:
venture-orchestrator:
command: python -m uvicorn venture.orchestrator:app --host 0.0.0.0 --port 8005 --workers 2
environment:
VENTURE_ORCHESTRATOR_PORT: 8005
VENTURE_ORCHESTRATOR_L3_POOL_MAX_CONCURRENCY: 10
LEDGER_DB_URL: postgresql://...
NATS_URL: nats://event-bus:4222
REDIS_URL: redis://redis-cache:6379/0
depends_on:
- ledger-db
- event-bus
- agent-runtime
- redis-cacheEndpoints (representative):
POST /dispatch/l1— dispatch L1 orchestrator taskPOST /dispatch/l2/{agent_type}— dispatch L2 specialist taskPOST /dispatch/l3— dispatch L3 copilot CLI taskGET /queue/depth— task queue depthGET /agents/slots— agent pool slot availabilityPOST /workflows/{id}/cancel— cancel workstreamGET /metrics— Prometheus metricsGET /health— health status
Inter-Service Communication:
- Calls: agent-runtime (L1/L2 task execution)
- Publishes to: event-bus (task.dispatch.initiated, task.completed)
- Subscribes to: task completion events (DAG progression)
7. agent-runtime (Port — internal, no exposed HTTP)
Language: Python 3.14
Responsibility: L1 orchestrator, L2 specialist agents (Writer, Coder, Researcher, Analyst), L3 copilot CLI worker pool management, tool call auditing, result capture.
Dependencies:
- policy-engine (:8001) — tool allowlist checks
- event-bus (NATS, :4222) — emit tool.call.executed, task.completed events
- ledger-db (:5432) — store results, agent_actions
- redis-cache (:6379) — agent state, slot tracking
Scaling: Horizontal. Each replica manages subset of L2 agents + L3 worker pool. Consistent hashing for L2 dispatch.
Health Check:
- Internal status: agent_runtime maintains /metrics (Prometheus) exposed via subsidiary HTTP service.
- Check:
GET http://agent-runtime:5000/metrics(if health sidecar enabled). - Metrics: active_agents, task_queue_depth, tool_calls_per_sec.
Startup Order: Before venture-orchestrator (orchestrator needs agent-runtime available).
Environment Variables:
AGENT_RUNTIME_LOG_LEVEL=INFO
AGENT_RUNTIME_L2_AGENT_COUNT=4 # Writer, Coder, Researcher, Analyst (1 each or more)
AGENT_RUNTIME_L3_MAX_CONCURRENCY=10
AGENT_RUNTIME_L3_TASK_TIMEOUT_SECONDS=1800
AGENT_RUNTIME_L3_WORKSPACE_DIR=/tmp/venture-l3-workspace
POLICY_ENGINE_URL=http://policy-engine:8001
NATS_URL=nats://event-bus:4222
LEDGER_DB_URL=postgresql://user:pass@ledger-db:5432/venture
REDIS_URL=redis://redis-cache:6379/0
COPILOT_PATH=/usr/local/bin/copilotProcess-compose Config:
services:
agent-runtime:
command: python -m agent_runtime.main
environment:
AGENT_RUNTIME_LOG_LEVEL: INFO
AGENT_RUNTIME_L3_MAX_CONCURRENCY: 10
AGENT_RUNTIME_L3_TASK_TIMEOUT_SECONDS: 1800
POLICY_ENGINE_URL: http://policy-engine:8001
NATS_URL: nats://event-bus:4222
LEDGER_DB_URL: postgresql://...
REDIS_URL: redis://redis-cache:6379/0
COPILOT_PATH: /usr/local/bin/copilot
depends_on:
- policy-engine
- event-bus
- ledger-db
- redis-cacheInter-Service Communication:
- Calls: policy-engine (tool allowlist checks)
- Publishes to: event-bus (tool.call.executed, task.completed)
- Queries: ledger-db (store results)
8. event-bus (NATS JetStream, Port 4222)
Language: Go/Rust (NATS binary)
Responsibility: Immutable event streams, topic fan-out, consumer group management, event replay, event persistence.
Dependencies:
- PostgreSQL (optional backing store for persistence, :5432)
Scaling: 1 node (single-node mode) or 3+ node cluster (multi-node HA). Peer replication within cluster.
Health Check:
- Endpoint (NATS monitoring):
GET http://event-bus:8222/healthz - Response:
{ "status": "ok", "connections": 10, "streams": 20 } - Timeout: 5s
- Interval: 10s
Startup Order: Before any service that publishes/subscribes.
Environment Variables (process-compose):
# NATS Server Config
NATS_PORT=4222
NATS_HTTP_PORT=8222
NATS_JETSTREAM=true
NATS_STORE_DIR=/data/nats # file-based persistence
NATS_LOG_LEVEL=infoProcess-compose Config:
services:
event-bus:
command: /usr/local/bin/nats-server -c /config/nats.conf
environment:
NATS_PORT: 4222
NATS_HTTP_PORT: 8222
NATS_JETSTREAM: "true"
NATS_STORE_DIR: /data/nats
ports:
- "4222:4222"
- "8222:8222"
volumes:
- ./nats.conf:/config/nats.conf:ro
- ./data/nats:/data/natsNATS Streams (topics):
policy.>— policy published, policy evaluatedworkflow.>— workflow created, started, completed, cancelledtask.>— task dispatched, running, completed, failedartifact.>— artifact IR registered, build started/completed, provenance attestedmoney.>— intent created, authorization decided, settledledger.>— entry created, materialized, reconciliation completedcompliance.>— evaluated, violation detected, case openedprivacy.>— request received, DSAR compiled, deletion completedaudit.>— checksum computed, integrity verified
Consumer Groups: Each consumer group (e.g., "ledger-db", "compliance-engine") consumes events independently; NATS tracks offset per group.
Inter-Service Communication:
- All services publish to event-bus; all services subscribe to relevant topics.
- Request/Reply pattern: some RPCs use NATS request/reply (e.g., policy evaluation).
9. ledger-db (PostgreSQL, Port 5432)
Language: SQL (PostgreSQL 15+)
Responsibility: Event store (append-only), read projections, state snapshots, schema registry, policy bundles, audit logs, checksum integrity.
Dependencies: None (foundational).
Scaling: 1 primary + 1-2 replicas (async streaming replication). Failover via pg_failover_slot or Patroni.
Health Check:
- Endpoint:
psql -h ledger-db -U venture -d venture -c "SELECT 1;" - Response:
(1 row) - Timeout: 5s
- Interval: 10s
Startup Order: First (foundational service).
Environment Variables:
POSTGRES_HOST=ledger-db
POSTGRES_PORT=5432
POSTGRES_DB=venture
POSTGRES_USER=venture
POSTGRES_PASSWORD=... # from secret
POSTGRES_LOG_LEVEL=warn
POSTGRES_MAX_CONNECTIONS=200
POSTGRES_SHARED_BUFFERS=256MB
POSTGRES_WAL_LEVEL=replica # for replication
POSTGRES_WAL_ARCHIVE_MODE=on
POSTGRES_WAL_ARCHIVE_COMMAND='...' # S3 archiveProcess-compose Config:
services:
ledger-db:
image: postgres:15-alpine
environment:
POSTGRES_DB: venture
POSTGRES_USER: venture
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
POSTGRES_INITDB_ARGS: "-c max_connections=200 -c shared_buffers=256MB"
volumes:
- ./data/postgres:/var/lib/postgresql/data
- ./init-db.sql:/docker-entrypoint-initdb.d/init.sql:ro
ports:
- "5432:5432"
healthcheck:
test: ["CMD", "pg_isready", "-U", "venture"]
interval: 10s
timeout: 5s
retries: 5Core Tables:
events(append-only) — event_id, event_type, payload, checksum_prev, checksum_current, signature, created_atpolicy_bundles— id, version, content_hash, status, created_atworkflows— id, objective, policy_bundle_id, status, budget_cents, created_attasks— id, workflow_id, type, status, retries, created_atagent_actions— id, task_id, agent_role, tool, input_hash, output_hash, created_atmoney_intents— id, workflow_id, amount_cents, merchant_scope, status, created_atauthorization_decisions— id, money_intent_id, decision, reason_code, created_atledger_entries— id, workflow_id, entry_type, amount_cents, account_from, account_to, created_atcompliance_cases— id, workflow_id, rule_id, severity, status, created_atprivacy_requests— id, subject_ref, request_type, status, created_atschema_registry— id, task_type, schema_json, version, created_ataudit_checkpoints— batch_id, event_id_start, event_id_end, checksum, created_at
Inter-Service Communication: All services query/insert into ledger-db. No special protocol; standard PostgreSQL client libraries.
10. redis-cache (Redis, Port 6379)
Language: C (Redis binary)
Responsibility: Agent state cache, tool call rate limits, idempotency keys, session store, policy cache, hot data.
Dependencies: None (foundational).
Scaling: 1 node (with optional Sentinel for failover). Persistent RDB snapshots.
Health Check:
- Endpoint:
redis-cli ping - Response:
PONG - Timeout: 2s
- Interval: 10s
Startup Order: First (foundational service).
Environment Variables:
REDIS_HOST=redis-cache
REDIS_PORT=6379
REDIS_LOGLEVEL=notice
REDIS_PERSISTENCE_RDB_INTERVAL=60 # snapshot every 60s
REDIS_PERSISTENCE_RDB_SAVE_PATH=/data/redis/dump.rdbProcess-compose Config:
services:
redis-cache:
image: redis:7-alpine
command: redis-server --loglevel notice --dir /data --save 60 1000
volumes:
- ./data/redis:/data
ports:
- "6379:6379"
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 2s
retries: 5Key Namespaces (examples):
policy:allowlist:{agent_role}— tool allowlist (cached JSON)velocity-control:{workflow_id}:{merchant}:{category}— spend tracking (counter)global-spend:{YYYY-MM-DD}— daily spend total (counter)agent-slots:{agent_role}— available agent slots (counter)pool-utilization— L3 worker pool utilization (%)artifact-cache:{idempotency_key}— cached artifact outputsession:{session_id}— founder session state (expiring, TTL 24h)
Inter-Service Communication: Standard Redis client libraries (redis-py, node-redis, etc.). No special protocol.
Inter-Service Communication Map
control-plane-api (8000)
├─ calls ──→ policy-engine (8001) [validate intents]
├─ calls ──→ venture-orchestrator (8005) [dispatch tasks]
├─ queries ──→ ledger-db (5432) [workflows, audit]
└─ reads ──→ redis-cache (6379) [sessions, rate limits]
policy-engine (8001)
├─ reads ──→ redis-cache (6379) [policy cache, allowlists]
├─ queries ──→ ledger-db (5432) [schema registry, policies]
└─ (called by all services for tool checks)
artifact-compiler (8002)
├─ inserts ──→ ledger-db (5432) [artifact_ir, builds, provenance]
├─ publishes ──→ event-bus (4222) [artifact.* events]
├─ reads/writes ──→ redis-cache (6379) [build cache]
└─ calls ──→ external APIs [Claude, Veo, Banana]
treasury-api (8003)
├─ queries ──→ ledger-db (5432) [money_intents, authorization_decisions, ledger_entries]
├─ publishes ──→ event-bus (4222) [money.*, ledger.* events]
├─ reads/writes ──→ redis-cache (6379) [velocity control, locks]
└─ calls ──→ policy-engine (8001) [policy bundle validation]
compliance-engine (8004)
├─ inserts ──→ ledger-db (5432) [compliance_cases, audit_trail]
├─ subscribes ──→ event-bus (4222) [all event topics]
├─ publishes ──→ event-bus (4222) [compliance.* events]
├─ reads ──→ redis-cache (6379) [rule cache]
└─ calls ──→ control-plane-api (8000) [escalate incidents]
venture-orchestrator (8005)
├─ queries/inserts ──→ ledger-db (5432) [workflows, tasks]
├─ publishes/subscribes ──→ event-bus (4222) [task.* events]
├─ calls ──→ agent-runtime [L1/L2 dispatch]
└─ reads/writes ──→ redis-cache (6379) [task queue, agent slots]
agent-runtime
├─ calls ──→ policy-engine (8001) [tool allowlist checks, workload identity]
├─ publishes ──→ event-bus (4222) [tool.call.executed, task.completed]
├─ inserts ──→ ledger-db (5432) [agent_actions, results]
└─ reads/writes ──→ redis-cache (6379) [agent state, slot tracking]
event-bus (NATS, 4222)
└─ backed by ──→ ledger-db (5432) [event persistence (optional)]
ledger-db (PostgreSQL, 5432)
└─ queried by all services [append-only event store + projections]
redis-cache (Redis, 6379)
└─ accessed by all services [caching, rate limiting, state]Startup Order (process-compose)
- redis-cache (foundational)
- ledger-db (foundational)
- event-bus (NATS)
- policy-engine (depends on redis, ledger)
- agent-runtime (depends on policy, event-bus, ledger, redis)
- artifact-compiler (depends on ledger, event-bus, redis)
- treasury-api (depends on ledger, event-bus, redis, policy)
- compliance-engine (depends on ledger, event-bus, redis)
- venture-orchestrator (depends on ledger, event-bus, agent-runtime, redis)
- control-plane-api (depends on policy, orchestrator, ledger, redis)
Shutdown Order (reverse)
- control-plane-api (stop accepting new requests)
- venture-orchestrator (wait for in-flight tasks, cancel pending)
- compliance-engine (stop event consumption)
- treasury-api (settle pending intents)
- artifact-compiler (cancel pending builds)
- agent-runtime (kill in-flight L3 workers, wait for L2 completion)
- policy-engine (stop serving)
- event-bus (NATS flush and close)
- ledger-db (checkpoint, close)
- redis-cache (persist RDB, close)
Health Check Strategy
Each service exposes GET /health returning { "status": "ok"|"degraded"|"down", ... }.
Composite health (control-plane-api):
def check_system_health():
services = [
("policy-engine", call_health("http://policy-engine:8001/health")),
("artifact-compiler", call_health("http://artifact-compiler:8002/health")),
("treasury-api", call_health("http://treasury-api:8003/health")),
("compliance-engine", call_health("http://compliance-engine:8004/health")),
("venture-orchestrator", call_health("http://venture-orchestrator:8005/health")),
("event-bus", call_health("http://event-bus:8222/healthz")),
("ledger-db", check_postgres()),
("redis-cache", check_redis()),
]
all_ok = all(s[1] == "ok" for s in services)
system_status = "ok" if all_ok else ("degraded" if any(s[1] != "down" for s in services) else "down")
return { "status": system_status, "services": dict(services) }Founder queries via GET /health endpoint; triggers system freeze if status="down".
Monitoring & Metrics
Prometheus scrape targets:
- control-plane-api:8000/metrics
- policy-engine:8001/metrics
- artifact-compiler:8002/metrics
- treasury-api:8003/metrics
- compliance-engine:8004/metrics
- venture-orchestrator:8005/metrics
- redis_exporter:9121 (for Redis)
- postgres_exporter:9187 (for PostgreSQL)
- nats_exporter:7777 (for NATS)
Key metrics per service: see TECHNICAL_SPEC.md "Monitoring & Observability" section.
Revision History
| Date | Version | Author | Changes |
|---|---|---|---|
| 2026-02-21 | 1.0 | AI Agent | Complete service catalog: 10 services, health checks, environment contracts, inter-service communication map, startup/shutdown order. |