Metadata-Version: 2.4
Name: dispatch-kit
Version: 0.1.0
Summary: Pure, fail-closed cost-gating for expensive remote/external work: a hard $ budget cap, backend routing (local->LAN->cloud->SDK), and opt-in audited API egress.
Author-email: Aryan Falahatpisheh <aryanfalahat@gmail.com>
License: MIT
Keywords: budget,cost,cloud,gpu,llm,dispatch,egress,approval
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: flake8>=6.0; extra == "dev"
Requires-Dist: pylint>=3.0; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"

# dispatch-kit

A tiny, **pure, dependency-free** library for gating expensive remote/external work — the same
machinery for a **cloud GPU job** (a Cloud Run L4 reached over a tailnet) and a **paid LLM/SDK API
call** (Gemini, Claude, Rowan). It answers three questions, fail-closed:

- **Can we afford it?** — a hard, reserve-on-approval **budget cap** (per-run + per-month).
- **Where should it run?** — a pure **router**: `LOCAL → LAN → CLOUD → SDK`, SDK opt-in only.
- **Is the external call safe?** — opt-in, audited **API egress** with reference-only secrets.

It owns the *policy* (afford / route / approve / egress); your app keeps its job entity,
persistence, and executor. The transport *auth* (who may talk) is a separate concern — pair this
with [`tailnet-guard`](https://github.com/falahat/tailnet-guard). Stdlib only; every check is
fail-closed (default budget `0` means paid work is off; SDK is never auto-selected; a missing key refuses the call).
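
To make the routing rule concrete, here is a minimal standalone sketch of a `LOCAL → LAN → CLOUD → SDK` preference with SDK as opt-in. It does not use `dispatch_kit`; `Kind`, `Backend`, and `pick_backend` are hypothetical names chosen for illustration, and the real `select_backend` may differ in signature and detail.

```python
from dataclasses import dataclass
from enum import IntEnum


class Kind(IntEnum):
    """Hypothetical backend kinds; a lower value is preferred."""
    LOCAL = 0
    LAN = 1
    CLOUD = 2
    SDK = 3


@dataclass(frozen=True)
class Backend:
    name: str
    kind: Kind
    vram_gb: float  # 0.0 means the host has no GPU


def pick_backend(backends, min_vram_gb, allow_sdk=False):
    """Return the most-preferred feasible backend, or None to refuse."""
    feasible = [
        b for b in backends
        if b.vram_gb >= min_vram_gb                 # no GPU => a GPU job is infeasible
        and (allow_sdk or b.kind is not Kind.SDK)   # SDK only when explicitly opted in
    ]
    return min(feasible, key=lambda b: b.kind, default=None)


backends = [
    Backend("laptop", Kind.LOCAL, vram_gb=0.0),
    Backend("cloud-l4", Kind.CLOUD, vram_gb=24.0),
    Backend("vendor-sdk", Kind.SDK, vram_gb=80.0),
]
chosen = pick_backend(backends, min_vram_gb=24.0)  # skips the GPU-less laptop
```

Returning `None` rather than falling through to the SDK backend is the fail-closed part: when nothing permitted can run the job, the caller gets a refusal instead of a silent paid call.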

## Use

```python
from decimal import Decimal
from dispatch_kit import (
    BudgetCap, BudgetState, CostRates, admits, estimate_cost,   # the hard $ cap
    select_backend, BackendKind, ToolRequirements,              # the where
    SecretRef, ExternalEndpoint, log_egress,                    # opt-in API egress
    Approval,                                                   # the approval audit fact
)

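# 1. Reserve-on-approval: refuse a job that would push past the cap (both windows).
#    run_state / month_state are assumed to be BudgetState values loaded from your own
#    persistence; OverBudget is your app's own exception type.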
rates = CostRates(gpu_usd_per_s=Decimal("0.0008"), vcpu_usd_per_s=Decimal("0.00001"),
                  gib_usd_per_s=Decimal("0.000002"), idle_tail_s=Decimal(600))
cost = estimate_cost(rates, max_runtime_s=3600, vcpus=8, memory_gib=32)   # an UPPER bound
decision = admits(cost, run_state, month_state, BudgetCap(run_usd=Decimal(50), month_usd=Decimal(500)))
if not decision.admitted:
    raise OverBudget(decision.reason)        # default cap is $0 — paid work is off until you set one

# 2. Pick where it runs: LOCAL first, SDK only if explicitly allowed.
#    my_backends is your app's collection of candidate backends.
backend = select_backend(my_backends, ToolRequirements(tool_id="cofold", min_vram_gb=24.0))

# 3. An LLM/SDK key is a REFERENCE (env var name), resolved at call time, never logged.
gemini = ExternalEndpoint("gemini", "https://generativelanguage.googleapis.com",
                          SecretRef("GEMINI_API_KEY"))
log_egress(gemini, detail="summarize")       # audit that data left the boundary
headers = {"Authorization": gemini.bearer()} # raises if the key is unset (never an unauth call)
```
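
Step 3 above treats the key as a reference rather than a value. The standalone sketch below shows one way that pattern can work with only the standard library. `KeyRef`, `Endpoint`, `audit_egress`, and the `EXAMPLE_API_KEY` variable are hypothetical stand-ins, not the library's `SecretRef` / `ExternalEndpoint` / `log_egress`.

```python
import logging
import os
from dataclasses import dataclass

log = logging.getLogger("egress")


@dataclass(frozen=True)
class KeyRef:
    """Hypothetical stand-in: holds the NAME of an env var, never the key."""
    env_var: str

    def resolve(self) -> str:
        # Read the secret only at call time; refuse if it is missing or empty
        value = os.environ.get(self.env_var, "")
        if not value:
            raise RuntimeError(f"secret {self.env_var} is not set; refusing the call")
        return value


@dataclass(frozen=True)
class Endpoint:
    name: str
    base_url: str
    key: KeyRef

    def __post_init__(self):
        # Plain http is rejected at construction time
        if not self.base_url.startswith("https://"):
            raise ValueError(f"{self.name}: only https endpoints are allowed")

    def bearer(self) -> str:
        return f"Bearer {self.key.resolve()}"


def audit_egress(endpoint: Endpoint, detail: str) -> None:
    # Log the endpoint and the key's env var name, never the key itself
    log.info("egress to %s (%s) key_ref=%s", endpoint.name, detail, endpoint.key.env_var)


endpoint = Endpoint("example", "https://api.example.com", KeyRef("EXAMPLE_API_KEY"))
audit_egress(endpoint, detail="summarize")
```

Because the dataclass stores only the variable name, its `repr` and any audit log line can be written freely without leaking the secret.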

## What's in the box

| Module | Purpose |
|---|---|
| `budget` | `BudgetCap` / `BudgetState` / `CostRates` / `admits` / `estimate_cost` — the hard, Decimal-exact, reserve-on-approval cap across a run + month window |
| `estimate` | `CostEstimate` / `HostCapabilities` / `vram_fits` — the one "no GPU ⇒ a GPU job is infeasible" rule, shared by the gate and the router |
| `routing` | `BackendKind` / `BackendCapabilities` / `ToolRequirements` / `select_backend` (generic over a `Routable`) — the pure `LOCAL→LAN→CLOUD→SDK` policy; SDK opt-in |
| `egress` | `SecretRef` / `ExternalEndpoint` / `log_egress` — reference-only API keys, https-only, fail-closed on a missing key, audited egress (SDKs **and** LLM APIs) |
| `approval` | `Approval` / `ApprovalOutcome` — the who/when/why audit fact for a gated job |
| `dispatch` | `JobStore` / `Transport` / `WorkerExecutor` protocols + `is_lease_stale` / `should_give_up` / `Lease` — the run-it-once-recoverably contract (atomic claim, stale-reject, lease recovery); push vs pull is only the `Transport` adapter |
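
The run-it-once-recoverably contract in the `dispatch` row can be pictured with a small standalone sketch. Everything here is hypothetical (`WorkerLease`, `lease_is_stale`, `give_up`, `try_claim`, and the heartbeat/TTL model); the library's `Lease`, `is_lease_stale`, and `should_give_up` may use different fields and rules. A plain dict stands in for the job store, so the claim is only atomic in a single thread; a real store would need a compare-and-set or a transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerLease:
    """Hypothetical lease record: who holds the job and when they last checked in."""
    worker_id: str
    heartbeat_at: float  # seconds on any consistent clock
    attempt: int


def lease_is_stale(lease: WorkerLease, now: float, ttl_s: float) -> bool:
    # A holder that has not checked in within ttl_s may be replaced
    return now - lease.heartbeat_at > ttl_s


def give_up(lease: WorkerLease, max_attempts: int) -> bool:
    # Stop retrying once the job has been attempted max_attempts times
    return lease.attempt >= max_attempts


def try_claim(leases, job_id, worker_id, now, ttl_s, max_attempts):
    """Claim job_id if it is unclaimed or its lease is stale; otherwise return None."""
    current = leases.get(job_id)
    if current is not None:
        if not lease_is_stale(current, now, ttl_s):
            return None  # another worker holds a live lease
        if give_up(current, max_attempts):
            return None  # retry budget exhausted; do not run it again
    attempt = 1 if current is None else current.attempt + 1
    leases[job_id] = WorkerLease(worker_id, now, attempt)
    return leases[job_id]
```

A second worker calling `try_claim` while the first lease is still live gets `None`, which is what keeps a job from running twice; once the lease goes stale, the job becomes claimable again until the attempt limit is reached.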

## Notes

- **The budget cap lives in your dispatch service, never the UI** — an agent hitting the API
  directly is still gated. Default cap `$0`; if spend can't be computed, refuse.
- **Reserve on approval, reconcile on completion** — approving reserves the estimate immediately so
  a burst counts against the cap; the worker's true runtime reconciles `reserved → spent`.
- **SDK / external egress is the one deliberate exception** — never the default (`allow_sdk` /
  opt-in), always logged, the key sourced from a secret at call time and never written to a log.
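
The first two notes can be sketched in standalone Python. This is an illustration under stated assumptions, not the library's code: `upper_bound_cost`, `Window`, `approve`, and `reconcile` are hypothetical names, and the cost formula (every resource billed for the full runtime plus the idle tail) is an assumed reading of "an UPPER bound".

```python
from dataclasses import dataclass
from decimal import Decimal


def upper_bound_cost(gpu_rate, vcpu_rate, gib_rate, idle_tail_s,
                     max_runtime_s, vcpus, memory_gib):
    """Assumed worst case: bill every resource for the runtime plus the idle tail."""
    per_second = gpu_rate + vcpus * vcpu_rate + memory_gib * gib_rate
    return (Decimal(max_runtime_s) + idle_tail_s) * per_second


@dataclass
class Window:
    """Hypothetical per-run or per-month ledger."""
    cap_usd: Decimal = Decimal(0)  # default cap $0: paid work is off
    reserved_usd: Decimal = Decimal(0)
    spent_usd: Decimal = Decimal(0)

    def fits(self, cost: Decimal) -> bool:
        return self.reserved_usd + self.spent_usd + cost <= self.cap_usd


def approve(cost, windows):
    """Reserve cost in every window, or in none if any window would overflow."""
    if cost is None or cost < 0 or not all(w.fits(cost) for w in windows):
        return False  # fail closed: unknown or unaffordable spend is refused
    for w in windows:
        w.reserved_usd += cost
    return True


def reconcile(estimate, actual, windows):
    """On completion, release the reservation and record the true spend."""
    for w in windows:
        w.reserved_usd -= estimate
        w.spent_usd += actual


# Same inputs as the Use example: one hour on 8 vCPUs and 32 GiB, 600 s idle tail
cost = upper_bound_cost(Decimal("0.0008"), Decimal("0.00001"), Decimal("0.000002"),
                        idle_tail_s=Decimal(600), max_runtime_s=3600,
                        vcpus=8, memory_gib=32)
run, month = Window(cap_usd=Decimal(50)), Window(cap_usd=Decimal(500))
```

Reserving at approval time means a burst of approvals is checked against `reserved + spent`, so ten jobs approved in the same second cannot each see an empty ledger. Using `Decimal` throughout keeps the comparison against the cap exact.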
