Metadata-Version: 2.4
Name: agentkavach
Version: 0.1.0
Summary: Hard budget limits and guardrails for LLM APIs (OpenAI, Anthropic, Google, Mistral).
Project-URL: Homepage, https://agentkavach.com
Project-URL: Documentation, https://agentkavach.com/docs
Project-URL: Repository, https://github.com/agentcostguard/agent-cost-guard
Project-URL: Issues, https://github.com/agentcostguard/agent-cost-guard/issues
Project-URL: Changelog, https://github.com/agentcostguard/agent-cost-guard/blob/main/CHANGELOG.md
Author-email: AgentKavach <agentcostguard@gmail.com>
Maintainer-email: AgentKavach <agentcostguard@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: anthropic,budget,cost-tracking,google,guardrails,llm,mistral,observability,openai,opentelemetry
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Monitoring
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: httpx
Requires-Dist: openai
Requires-Dist: opentelemetry-api
Requires-Dist: opentelemetry-sdk
Requires-Dist: pyyaml
Requires-Dist: tiktoken
Provides-Extra: all
Requires-Dist: anthropic>=0.25; extra == 'all'
Requires-Dist: google-genai>=1.0; extra == 'all'
Requires-Dist: mistralai; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.25; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: build; extra == 'dev'
Requires-Dist: httpx; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-asyncio; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Provides-Extra: google
Requires-Dist: google-genai>=1.0; extra == 'google'
Provides-Extra: mistral
Requires-Dist: mistralai; extra == 'mistral'
Provides-Extra: server
Requires-Dist: alembic; extra == 'server'
Requires-Dist: bcrypt; extra == 'server'
Requires-Dist: fastapi>=0.100; extra == 'server'
Requires-Dist: passlib[bcrypt]; extra == 'server'
Requires-Dist: psycopg2-binary; extra == 'server'
Requires-Dist: pydantic-settings; extra == 'server'
Requires-Dist: python-jose[cryptography]; extra == 'server'
Requires-Dist: sqlalchemy>=2.0; extra == 'server'
Requires-Dist: stripe>=8.0; extra == 'server'
Requires-Dist: uvicorn; extra == 'server'
Description-Content-Type: text/markdown

# AgentKavach

Hard budget enforcement and guardrails for LLM APIs. Supports OpenAI, Anthropic, Google, and Mistral.

AgentKavach is a lightweight SDK wrapper that intercepts calls between your application and the LLM provider. Budget checks run in memory with approximately 0.1ms of overhead. When a budget limit is reached, kill callbacks execute in process to halt runaway spend before it escalates. Built-in guardrails enforce token limits, call count limits, runtime limits, and runaway loop detection. Telemetry is exported in the background using OpenTelemetry `gen_ai.*` semantic conventions, compatible out of the box with Grafana, Datadog, and any OTel collector.
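
To make the enforcement model concrete, here is a minimal, self-contained sketch of an in-memory budget check with a kill callback. It is not the SDK's implementation: `SpendTracker`, `record`, and `BudgetExceeded` are hypothetical names used only for illustration.

```python
from typing import Callable, Optional


class BudgetExceeded(Exception):
    """Hypothetical stand-in for the SDK's BudgetExceededError."""


class SpendTracker:
    """Illustrative in-memory tracker; not AgentKavach's actual engine."""

    def __init__(self, limit_usd: float, on_kill: Optional[Callable[[], None]] = None):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0
        self.killed = False
        self.on_kill = on_kill

    def record(self, cost_usd: float) -> float:
        """Add one call's cost and return utilization as a fraction of the limit."""
        if self.killed:
            raise BudgetExceeded("budget already exhausted")
        self.spent_usd += cost_usd
        utilization = self.spent_usd / self.limit_usd
        if utilization >= 1.0:
            self.killed = True
            if self.on_kill is not None:
                self.on_kill()  # runs in process, before any further call is made
        return utilization


events = []
tracker = SpendTracker(limit_usd=50.0, on_kill=lambda: events.append("killed"))
tracker.record(35.0)  # 70% utilization, still under the limit
tracker.record(15.0)  # reaches 100%, so the kill callback fires
```

Because the check is a couple of arithmetic operations on local state, it adds very little latency to each call. The sketch ignores period resets and thread safety.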

**Status:** Pre-alpha. Not yet published to PyPI.

---

## Quick Start

```bash
pip install agentkavach
```

```python
import sys

from agentkavach import AgentKavach, Budget

def emergency_stop():
    # `agent` stands in for your own application object; save state before exiting
    agent.save_checkpoint("budget_exceeded")
    sys.exit(1)

guard = AgentKavach(
    provider="openai",              # "openai", "anthropic", "google", or "mistral"
    api_key="ak_prod_...",          # your AgentKavach API key (prefix routes to the right backend)
    llm_key="sk-...",               # your LLM provider API key
    agent_name="research-bot",
    budget=Budget.daily(50),        # $50/day hard limit
    on_kill=emergency_stop,         # invoked at 100% utilization
    save_prompts=False,             # opt-in: save prompt text to dashboard
)

response = guard.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

### Dashboard Stop Button

The dashboard **Stop** button triggers the SDK's kill switch for an agent. When clicked, a "kill" alert event is recorded and subsequent `guard.create()` calls raise `BudgetExceededError`. However, the Stop button does **not** directly terminate a running process — it sets the killed state that the SDK checks on the next LLM call. If no `on_kill` callback is defined, the process continues running until it attempts another LLM call (which will then fail). Always define an `on_kill` callback to ensure graceful shutdown.

### Passthrough Mode

If the AgentKavach API key is empty or invalid, the SDK operates in **passthrough mode**: LLM calls go directly to the provider with no added overhead. Budget checks, telemetry, and request/response modification are all skipped. The only step the SDK still performs is validating the AgentKavach API key.

### Prompt Logging

By default, AgentKavach does **not** capture or store prompt text (`save_prompts=False`). This protects sensitive data. To enable prompt logging for debugging and audit:

```python
guard = AgentKavach(
    ...,                # other arguments as in Quick Start
    save_prompts=True,  # prompts will appear in the dashboard events table
)
```

When disabled, the dashboard shows *"Prompt logging disabled"* in the events table.

---

## Slack Alerts

Get budget alerts in Slack when your agents approach or exceed their limits. Takes about 2 minutes to set up.

### 1. Create a Slack App with an Incoming Webhook

1. Go to [api.slack.com/apps](https://api.slack.com/apps) and click **Create New App**.
2. Choose **From scratch**.
3. Set the **App Name** to `AgentKavach Alerts` (or any name) and pick your workspace. Click **Create App**.
4. In the left sidebar, click **Incoming Webhooks**.
5. Toggle **Activate Incoming Webhooks** to **On**.
6. Click **Add New Webhook to Workspace**.
7. Select the channel for alerts (e.g. `#agentkavach-alerts`) and click **Allow**.
8. Copy the **Webhook URL**. It looks like this:

```
https://hooks.slack.com/services/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
```

The URL has three segments after `/services/`:

| Segment | Example | Meaning |
|---------|---------|---------|
| 1st | `T024BE7LD` | Your workspace ID (starts with `T`) |
| 2nd | `B08N4CJQR7` | The webhook integration ID (starts with `B`) |
| 3rd | `8bXMkCAMYKiEFXqSEaLgNjRh` | Secret token (24 chars — treat like a password) |

> **Keep this URL secret.** Anyone with it can post to your channel. Store it in `.env`, never in source code.
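
As a small illustration of the three-segment layout, the sketch below splits a webhook URL with the standard library and masks the secret so it can be logged safely. `split_slack_webhook` is a hypothetical helper, not part of the AgentKavach SDK.

```python
from urllib.parse import urlparse


def split_slack_webhook(url: str) -> dict:
    """Split an Incoming Webhook URL into its three path segments."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if parsed.hostname != "hooks.slack.com" or len(parts) != 4 or parts[0] != "services":
        raise ValueError("not a Slack Incoming Webhook URL")
    workspace_id, integration_id, token = parts[1:]
    return {
        "workspace_id": workspace_id,
        "integration_id": integration_id,
        # Keep only the first two characters; never log the full token.
        "token_masked": token[:2] + "*" * (len(token) - 2),
    }


info = split_slack_webhook(
    "https://hooks.slack.com/services/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"
)
```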

### 2. Add the webhook URL to your environment

```bash
# .env (gitignored — never committed)
AGENTKAVACH_SLACK_WEBHOOK_URL=https://hooks.slack.com/services/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
```

That's it — AgentKavach auto-detects the env var. Slack alerts are enabled for all agents that include `slack` in their alert channels.

### 3. Use it in your code

**Option A — Environment variable (simplest, zero code):**

```python
from agentkavach import AgentKavach, Budget

# AGENTKAVACH_SLACK_WEBHOOK_URL is set → Slack alerts auto-enabled
guard = AgentKavach(
    agent_name="my-agent",
    api_key="ak_prod_abc123def456",
    llm_key="sk-proj-abc123def456ghi789jkl012mno345",
    budget=Budget.daily(50),
)
```

**Option B — Direct parameter:**

```python
guard = AgentKavach(
    agent_name="my-agent",
    api_key="ak_prod_abc123def456",
    llm_key="sk-proj-abc123def456ghi789jkl012mno345",
    budget=Budget.daily(50),
    # Inline only for illustration; in real code, read the URL from the environment.
    slack_webhook_url="https://hooks.slack.com/services/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",
)
```

**Option C — Channels API (per-threshold control):**

```python
import os
from agentkavach import AgentKavach, Budget

SLACK_URL = os.environ["AGENTKAVACH_SLACK_WEBHOOK_URL"]

guard = AgentKavach(
    agent_name="my-agent",
    api_key="ak_prod_abc123def456",
    llm_key="sk-proj-abc123def456ghi789jkl012mno345",
    budget=Budget.daily(50),
    channels=[
        AgentKavach.channel("slack", threshold=0.70, webhook_url=SLACK_URL),
        AgentKavach.channel("slack", threshold=0.90, webhook_url=SLACK_URL),
        AgentKavach.channel("kill",  threshold=1.0),
    ],
)
```

**Option D — YAML config:**

```yaml
# config.yaml
channels:
  slack:
    type: slack
    webhook_url: ${AGENTKAVACH_SLACK_WEBHOOK_URL}

agents:
  research-bot:
    provider: openai
    budget: { type: daily, limit: 50 }
    alerts:
      - { threshold: 0.50, channels: [slack] }
      - { threshold: 0.80, channels: [slack] }
      - { threshold: 1.0,  channels: [slack, kill] }
```

```python
from agentkavach import AgentKavach
clients = AgentKavach.from_yaml("config.yaml")
guard = clients["research-bot"]
```
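
The `${AGENTKAVACH_SLACK_WEBHOOK_URL}` placeholder in the YAML above is filled from the environment. How AgentKavach performs that substitution is not shown here, so the following is only a sketch of one plausible approach. `expand_env` is a hypothetical helper, and failing on unset variables is an assumption.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text: str, env=None) -> str:
    """Replace each ${NAME} with its value; fail loudly if NAME is unset."""
    env = os.environ if env is None else env

    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_sub, text)


raw = "webhook_url: ${AGENTKAVACH_SLACK_WEBHOOK_URL}"
expanded = expand_env(
    raw,
    {"AGENTKAVACH_SLACK_WEBHOOK_URL": "https://hooks.slack.com/services/T000/B000/XXXX"},
)
```

Raising on a missing variable surfaces a misconfigured `.env` at load time rather than when the first alert fails to send.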

### What the alert looks like

The default message includes a header with severity level, spend details, and a dashboard link:

```
┌───────────────────────────────────────────────────┐
│  [WARN] research-bot — 70% of Cost limit          │
│                                                   │
│  Type:       Cost                                 │
│  Current:    $35.00 / $50.00                      │
│  Remaining:  $15.00                               │
│  Period:     daily (resets midnight UTC)          │
│                                                   │
│  [ View Dashboard → ]                             │
└───────────────────────────────────────────────────┘
```
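
The figures in the example follow directly from the spend and the limit. A short sketch of that arithmetic, using a hypothetical `summarize_spend` helper rather than the SDK's own formatter:

```python
def summarize_spend(spent_usd: float, limit_usd: float) -> dict:
    """Compute the percentage, current, and remaining figures shown in an alert."""
    return {
        "pct": round(spent_usd / limit_usd * 100),
        "current": f"${spent_usd:,.2f} / ${limit_usd:,.2f}",
        "remaining": f"${max(limit_usd - spent_usd, 0.0):,.2f}",  # never negative
    }


fields = summarize_spend(35.0, 50.0)
```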

### Troubleshooting

| Problem | Fix |
|---------|-----|
| No message appears | Verify `AGENTKAVACH_SLACK_WEBHOOK_URL` is set and alerts include `[slack]` |
| `channel_not_found` | The channel was deleted — add a new webhook in your Slack App |
| No formatting (plain text only) | You're using a legacy webhook — create a Slack App webhook instead |
| `invalid_payload` / `400` | Custom template missing a `text` key |
| `403 Forbidden` | Webhook revoked — reinstall the app and generate a new URL |

---

## PagerDuty Alerts

Get PagerDuty incidents when your agents approach or exceed their budgets. AgentKavach uses the **Events API v2** to trigger alerts with the correct severity (warning / error / critical) based on how close the agent is to its limit. Takes about 3 minutes to set up.

### 1. Create a PagerDuty Service with Events API v2

1. Go to [app.pagerduty.com](https://app.pagerduty.com) and sign in. (Free 14-day trial available — no credit card required.)
2. In the top nav, click **Services** → **Service Directory**.
3. Click **+ New Service** (top right).
4. Set the **Name** to `AgentKavach Budget Alerts` (or any name you like).
5. Under **Assign an Escalation Policy**, pick your team's escalation policy (or use the default). Click **Next**.
6. On the **Integrations** step, search for **Events API v2** and select it. Click **Create Service**.
7. PagerDuty shows the integration details page. Copy the **Integration Key**. It looks like this:

```
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6
```

This 32-character hex string is your **routing key**. PagerDuty calls it "Integration Key" in the UI and `routing_key` in the API — they are the exact same thing.
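
A quick format check can catch copy-paste mistakes before the key reaches PagerDuty. The sketch below relies only on the description above (32 hexadecimal characters). `looks_like_routing_key` is a hypothetical helper, and the pattern should be relaxed if your key uses a different format.

```python
import re

_ROUTING_KEY = re.compile(r"[0-9a-f]{32}")


def looks_like_routing_key(value: str) -> bool:
    """Return True if value is 32 hex characters, ignoring case and outer whitespace."""
    return _ROUTING_KEY.fullmatch(value.strip().lower()) is not None


ok = looks_like_routing_key("a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6")
```

This validates the shape only. Whether the key matches a real integration is decided by PagerDuty when the event is sent.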

> **Already have a Service?** Go to **Services** → click your service → **Integrations** tab → **+ Add an integration** → search for **Events API v2** → **Confirm Integration** → expand the row → copy the **Integration Key**.

> **Keep this key secret.** Anyone with it can trigger incidents on your service. Store it in `.env`, never in source code.

### 2. Add the routing key to your environment

```bash
# .env (gitignored — never committed)
AGENTKAVACH_PAGERDUTY_ROUTING_KEY=a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6
```

That's it — AgentKavach auto-detects the env var. PagerDuty alerts are enabled for all agents that include `pagerduty` in their alert channels.

### 3. Use it in your code

**Option A — Environment variable (simplest, zero code):**

```python
from agentkavach import AgentKavach, Budget

# AGENTKAVACH_PAGERDUTY_ROUTING_KEY is set → PagerDuty alerts auto-enabled
guard = AgentKavach(
    agent_name="my-agent",
    api_key="ak_prod_abc123def456",
    llm_key="sk-proj-abc123def456ghi789jkl012mno345",
    budget=Budget.daily(50),
)
```

**Option B — Direct parameter:**

```python
guard = AgentKavach(
    agent_name="my-agent",
    api_key="ak_prod_abc123def456",
    llm_key="sk-proj-abc123def456ghi789jkl012mno345",
    budget=Budget.daily(50),
    # Inline only for illustration; in real code, read the key from the environment.
    pagerduty_routing_key="a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
)
```

**Option C — Channels API (tiered alerting):**

```python
import os
from agentkavach import AgentKavach, Budget

PD_KEY = os.environ["AGENTKAVACH_PAGERDUTY_ROUTING_KEY"]
SLACK_URL = os.environ["AGENTKAVACH_SLACK_WEBHOOK_URL"]

guard = AgentKavach(
    agent_name="my-agent",
    api_key="ak_prod_abc123def456",
    llm_key="sk-proj-abc123def456ghi789jkl012mno345",
    budget=Budget.daily(50),
    channels=[
        AgentKavach.channel("slack",     threshold=0.70, webhook_url=SLACK_URL),  # early warning
        AgentKavach.channel("slack",     threshold=0.90, webhook_url=SLACK_URL),  # getting close
        AgentKavach.channel("pagerduty", threshold=0.90, routing_key=PD_KEY),     # page on-call
        AgentKavach.channel("pagerduty", threshold=1.0,  routing_key=PD_KEY),     # critical
        AgentKavach.channel("kill",      threshold=1.0),                          # hard stop
    ],
)
```

**Option D — YAML config:**

```yaml
# config.yaml
channels:
  pagerduty:
    type: pagerduty
    routing_key: ${AGENTKAVACH_PAGERDUTY_ROUTING_KEY}

  slack:
    type: slack
    webhook_url: ${AGENTKAVACH_SLACK_WEBHOOK_URL}

agents:
  research-bot:
    provider: openai
    budget: { type: daily, limit: 50 }
    alerts:
      - { threshold: 0.70, channels: [slack] }                 # Slack at 70%
      - { threshold: 0.90, channels: [slack, pagerduty] }      # Slack + PagerDuty at 90%
      - { threshold: 1.0,  channels: [slack, pagerduty, kill] } # everything at 100%
```

```python
from agentkavach import AgentKavach
clients = AgentKavach.from_yaml("config.yaml")
guard = clients["research-bot"]
```

### Custom PagerDuty template

Override the default payload. The template follows the [Events API v2 schema](https://developer.pagerduty.com/api-reference/a7d81b0e9200f-send-an-event-to-pager-duty). You never need to put `routing_key` in the template — AgentKavach injects it automatically from your config.

```python
import os

from agentkavach import AgentKavach

custom_template = {
    "event_action": "trigger",
    "payload": {
        "summary": "Budget Alert: {agent_name} at {pct}% of {budget_fmt} ({budget_type})",
        "severity": "{severity}",
        "source": "agentkavach",
        "component": "{agent_name}",
        "group": "budget-alerts",
        "class": "{budget_type}",
        "custom_details": {
            "agent": "{agent_name}",
            "type": "{budget_type}",
            "current": "{spent_fmt}",
            "limit": "{budget_fmt}",
            "remaining": "{remaining_fmt}",
            "period": "{period}",
            "resets_at": "{resets_at}",
            "dashboard": "{dashboard_url}",
        },
    },
}

guard = AgentKavach(
    ...,
    channels=[
        AgentKavach.channel("pagerduty", threshold=0.90,
            routing_key=os.environ["AGENTKAVACH_PAGERDUTY_ROUTING_KEY"],
            template=custom_template),
    ],
)
```

YAML custom template:

```yaml
channels:
  pagerduty:
    type: pagerduty
    routing_key: ${AGENTKAVACH_PAGERDUTY_ROUTING_KEY}
    template:
      event_action: trigger
      payload:
        summary: "Budget Alert: {agent_name} at {pct}% ({budget_type})"
        severity: "{severity}"
        source: agentkavach
        custom_details:
          agent: "{agent_name}"
          spent: "{spent_fmt}"
          limit: "{budget_fmt}"
          period: "{period}"
```
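
To show how `{placeholder}` fields and the injected `routing_key` could fit together, here is a sketch that renders a nested template with `str.format`. It illustrates the idea only and does not reproduce AgentKavach's renderer: `render` and `build_event` are hypothetical names.

```python
from typing import Any


def render(template: Any, values: dict) -> Any:
    """Recursively fill {placeholders} in every string of a nested template."""
    if isinstance(template, dict):
        return {key: render(item, values) for key, item in template.items()}
    if isinstance(template, list):
        return [render(item, values) for item in template]
    if isinstance(template, str):
        return template.format(**values)
    return template


def build_event(template: dict, values: dict, routing_key: str) -> dict:
    """Render the template, then add the routing key at the top level."""
    event = render(template, values)
    event["routing_key"] = routing_key
    return event


template = {
    "event_action": "trigger",
    "payload": {
        "summary": "Budget Alert: {agent_name} at {pct}% ({budget_type})",
        "severity": "{severity}",
        "source": "agentkavach",
    },
}
event = build_event(
    template,
    {"agent_name": "research-bot", "pct": 90, "budget_type": "Cost", "severity": "error"},
    routing_key="a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
)
```

With `str.format`, a literal brace in a template string would need to be doubled (`{{` or `}}`). A real renderer may handle that differently.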

### Severity mapping

| Threshold | PagerDuty Severity | Behavior |
|-----------|-------------------|----------|
| Below 90% | `warning` | Low-urgency incident — no phone call |
| 90% – 99% | `error` | High-urgency — phone calls + SMS per escalation policy |
| 100% | `critical` | High-urgency — immediate attention |
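
The table translates into a short function. This is a sketch of the mapping as documented above, with `pagerduty_severity` as a hypothetical name:

```python
def pagerduty_severity(threshold: float) -> str:
    """Map a budget threshold (fraction of the limit) to an Events API v2 severity."""
    if threshold >= 1.0:
        return "critical"
    if threshold >= 0.90:
        return "error"
    return "warning"


levels = [pagerduty_severity(t) for t in (0.70, 0.90, 0.99, 1.0)]
```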

### What the alert looks like

AgentKavach creates a PagerDuty incident with:

| Field | Example |
|-------|---------|
| **Summary** | AgentKavach: research-bot at 90% — $45.00/$50.00 (Cost) |
| **Severity** | error |
| **Source** | agentkavach |
| **Custom details** | Agent name, budget type, current spend, limit, period, remaining |

### Troubleshooting

| Problem | Fix |
|---------|-----|
| No incident in PagerDuty | Verify `AGENTKAVACH_PAGERDUTY_ROUTING_KEY` is set and alerts include `[pagerduty]` |
| `400 Bad Request` | Custom template missing required fields (`event_action`, `payload.summary`, `payload.severity`, `payload.source`) |
| `routing_key_invalid` | Key doesn't match any integration — copy it again from PagerDuty service → Integrations tab |
| Incident created but nobody notified | Check your escalation policy — ensure on-call schedules are populated and notification rules are set |
| Incidents resolve immediately | Auto-resolve enabled on the service — increase or disable the timeout in service settings |
| Duplicate incidents for same threshold | Expected after cooldown (default: 5 min). Increase cooldown or acknowledge the first incident |
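
The cooldown behavior in the last row can be pictured as a per-agent, per-threshold timer. The sketch below assumes that model. `Cooldown` and `should_send` are hypothetical names, and the 300-second default mirrors the 5-minute default mentioned above.

```python
import time


class Cooldown:
    """Suppress repeat alerts for the same (agent, threshold) within a time window."""

    def __init__(self, seconds: float = 300.0, clock=time.monotonic):
        self._seconds = seconds
        self._clock = clock  # injectable so the behavior can be tested without sleeping
        self._last_sent = {}

    def should_send(self, agent: str, threshold: float) -> bool:
        key = (agent, threshold)
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self._seconds:
            return False  # still cooling down
        self._last_sent[key] = now
        return True


now = [0.0]
cooldown = Cooldown(seconds=300.0, clock=lambda: now[0])
first = cooldown.should_send("research-bot", 0.90)   # sent
now[0] = 120.0
second = cooldown.should_send("research-bot", 0.90)  # suppressed, 2 minutes later
now[0] = 301.0
third = cooldown.should_send("research-bot", 0.90)   # sent again after the cooldown
```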

---

## Frontend Error Tracking (Optional)

The dashboard ships with [Sentry](https://sentry.io) wired into the
client, server, and edge runtimes (`@sentry/nextjs`). Initialisation is
**fully gated on environment variables** — if no DSN is configured, the
SDK is a no-op and no telemetry leaves the browser. Self-hosters and
local developers see zero behaviour change.

### Enable on Vercel

Set the following in your Vercel project (Settings → Environment
Variables). Apply to **Preview** and **Production**:

| Variable | Where | Example | Notes |
|----------|-------|---------|-------|
| `NEXT_PUBLIC_SENTRY_DSN` | Runtime (Vercel) | `https://abc@o123.ingest.sentry.io/456` | Public DSN from your Sentry project. Unset = Sentry disabled. |
| `NEXT_PUBLIC_SENTRY_TRACES_SAMPLE_RATE` | Runtime (Vercel) | `0.1` | Fraction of transactions sampled. Defaults to `0.1` (10%), which keeps Sentry costs under control. |
| `SENTRY_AUTH_TOKEN` | **Build only** (Vercel) | `sntrys_...` | Used during `next build` to upload source maps. Never expose to clients. |
| `SENTRY_ORG` | Build only | `agentkavach` | Sentry org slug for source-map uploads. |
| `SENTRY_PROJECT` | Build only | `dashboard` | Sentry project slug for source-map uploads. |

### What gets captured

- Uncaught browser exceptions, unhandled promise rejections, React
  render errors (via the Next.js error boundary).
- Server-side exceptions thrown inside route handlers, API routes, and
  server components.
- Edge runtime exceptions (middleware).

### Privacy / PII

- **Session replay is OFF by default** (`replaysSessionSampleRate=0`).
  We do not want to record customer screens. Enable explicitly via
  `NEXT_PUBLIC_SENTRY_REPLAYS_ON_ERROR_SAMPLE_RATE` only if you have
  reviewed the implications.
- Cookie values (`ak_session`, `ak_csrf`, etc.) are stripped from all
  events and breadcrumbs before send.
- Any field named `prompt`, `messages`, `completion`, `prompt_text`, or
  `response_text` is replaced with `"[scrubbed]"` recursively, so a
  customer prompt never leaves the browser even if a buggy component
  attaches it to an error context.

See `dashboard/lib/sentry-scrub.ts` and the Sentry config files
(`dashboard/sentry.{client,server,edge}.config.ts`) for the full rules.
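
The field-name rule is easy to picture as a recursive walk. The real implementation is the TypeScript file referenced above. The Python sketch below mirrors only the rule described in this section, and `scrub` is a hypothetical name.

```python
SENSITIVE_FIELDS = {"prompt", "messages", "completion", "prompt_text", "response_text"}


def scrub(value):
    """Return a copy of value with sensitive fields replaced at any nesting depth."""
    if isinstance(value, dict):
        return {
            key: "[scrubbed]" if key in SENSITIVE_FIELDS else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


event = {
    "message": "render failed",
    "extra": {"prompt": "customer text", "nested": [{"messages": ["hi"], "ok": 1}]},
}
clean = scrub(event)
```

Matching is by exact field name, so `message` (singular) passes through untouched while `messages` is replaced.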

---

## Documentation

Full documentation is available on the website:

| Topic | Link |
|-------|------|
| **Quickstart** | [/public/docs/quickstart](https://agentkavach.com/public/docs/quickstart) |
| **Providers** (OpenAI, Anthropic, Google, Mistral) | [/public/docs/providers](https://agentkavach.com/public/docs/providers) |
| **Budgets** (daily, monthly, total) | [/public/docs/budgets](https://agentkavach.com/public/docs/budgets) |
| **Shared Budgets** | [/public/docs/shared-budgets](https://agentkavach.com/public/docs/shared-budgets) |
| **Guardrails** (tokens, calls, runtime, loops) | [/public/docs/guardrails](https://agentkavach.com/public/docs/guardrails) |
| **Alerts & Channels** (Email, Slack, PagerDuty, Webhook) | [/public/docs/alerts](https://agentkavach.com/public/docs/alerts) |
| **YAML Configuration** | [/public/docs/yaml-config](https://agentkavach.com/public/docs/yaml-config) |
| **API Reference** | [/public/docs/api-reference](https://agentkavach.com/public/docs/api-reference) |
| **Error Codes** | [/public/docs/error-codes](https://agentkavach.com/public/docs/error-codes) |
| **Model Pricing** | [/public/docs/pricing-models](https://agentkavach.com/public/docs/pricing-models) |
| **Streaming** | [/public/docs/streaming](https://agentkavach.com/public/docs/streaming) |
| **Releases** | [/public/docs/changelog](https://agentkavach.com/public/docs/changelog) |

---

## Development Setup

### Prerequisites

| Requirement       | Version   |
|-------------------|-----------|
| Python            | 3.9+      |
| Node.js           | 18+       |
| npm               | 9+        |
| Docker (optional) | 24+       |

### Full Stack with Docker (Recommended)

```bash
cp .env.example .env    # Edit with your values
docker compose up --build
```

| Service     | URL                    |
|-------------|------------------------|
| Dashboard   | http://localhost:3000   |
| Backend API | http://localhost:8000   |
| TimescaleDB | localhost:5432          |
| Redis       | localhost:6379          |
| Kafka       | localhost:9092          |

### Without Docker (SQLite fallback)

```bash
# Terminal 1: Backend
pip install -e ".[dev,server]"
uvicorn server.app:create_app --factory --reload --port 8000

# Terminal 2: Dashboard
cd dashboard && npm install && npm run dev
```

---

## Architecture

```
agentkavach/             # Python SDK
  client.py               # AgentKavach class, channel(), YAML loader
  budget.py               # Budget types: daily, monthly, total, shared
  engine.py               # In-memory spend tracking and threshold checks
  alerts.py               # AlertRule, ChannelConfig, dispatcher
  channels/               # Alert channel handlers (slack, email, pagerduty, webhook)
  stream.py               # Streaming wrapper with GeneratorExit handling
  buffer.py               # Event buffer (disk, tmpdir, memory)
  sender.py               # OTel exporter
  pricing.py              # Model price tables
  exceptions.py           # BudgetExceededError, GuardrailError, etc.
  providers/              # Provider response parsers (openai, anthropic, google, mistral)

server/                  # Backend API (FastAPI)
  app.py                  # App factory and lifespan
  auth.py                 # Register, login, JWT
  keys.py                 # API key CRUD
  agents.py               # Agent list and detail
  dashboard.py            # Spend overview and stats
  billing.py              # Stripe checkout and webhooks
  ingest.py               # Event ingestion (Kafka or direct)
  runs.py                 # Run tracking endpoints
  models.py               # SQLAlchemy models
  database.py             # Engine and TimescaleDB init
  config.py               # Settings (pydantic-settings)

dashboard/               # Next.js frontend
  app/                    # Pages (landing, auth, dashboard, docs)
  components/             # UI components
  lib/api.ts              # API client

tests/                   # Backend pytest suite
docs/                    # Internal engineering docs
```

---

## Running Tests

```bash
# Backend
python3 -m pytest --tb=short -q

# Frontend (the subshell keeps your working directory at the repo root)
(cd dashboard && npm test)

# Lint and format (backend)
python3 -m ruff check . && python3 -m ruff format --check .

# Lint and format (frontend)
(cd dashboard && npx next lint && npx prettier --check "**/*.{ts,tsx,json,css}" --ignore-path .gitignore)
```

---

## Infrastructure & Services

All external services used by AgentKavach.

### Core Infrastructure

| # | Service | Provider | What it does | Dashboard |
|---|---------|----------|-------------|-----------|
| 1 | **Backend API** | [Render](https://render.com) | Hosts the FastAPI backend — event ingestion, auth, budget checks, alert worker. All API traffic from the SDK and dashboard hits this. | [dashboard.render.com](https://dashboard.render.com) |
| 2 | **Frontend** | [Vercel](https://vercel.com) | Hosts the Next.js dashboard. Serves the landing page, docs, auth pages, and the logged-in dashboard. Proxies `/api/*` requests to the Render backend via `next.config.ts` rewrites. | [vercel.com/dashboard](https://vercel.com/dashboard) |
| 3 | **Database** | [Timescale Cloud](https://console.cloud.timescale.com) | PostgreSQL with TimescaleDB extension. Stores all events in a hypertable (partitioned by time, compressed after 3 days, 30-day retention). Continuous aggregates roll up hourly and daily spend. Also stores users, orgs, budgets, alert rules, API keys, runs, and subscriptions. | [console.cloud.timescale.com](https://console.cloud.timescale.com) |
| 4 | **Redis** | [Upstash](https://console.upstash.com) | Serverless Redis. Used for spend accumulators (real-time budget tracking), rate limiting (daily + burst), spend velocity windows, dashboard query cache, and agent count tracking. All keys are prefixed and TTL'd. | [console.upstash.com](https://console.upstash.com) |
| 5 | **Kafka** | [Confluent Cloud](https://confluent.cloud) | Event streaming. The SDK sends usage events to the `agentkavach.events` topic (16 partitions, keyed by org_id). The backend's EventWriter consumes and batch-writes to TimescaleDB. Failed events go to `agentkavach.events.dlq`. Alert summaries go to `agentkavach.alerts`. | [confluent.cloud](https://confluent.cloud) |

### Alert Channels

| # | Service | Provider | What it does | Dashboard |
|---|---------|----------|-------------|-----------|
| 6 | **Slack Alerts** | [Slack](https://api.slack.com/apps) | Budget threshold alerts posted to a Slack channel via Incoming Webhooks. Sends Block Kit messages (header + spend details + dashboard link) with a plain-text fallback for legacy webhooks. Configured via `AGENTKAVACH_SLACK_WEBHOOK_URL` env var or YAML. | [api.slack.com/apps](https://api.slack.com/apps) |
| 7 | **PagerDuty Alerts** | [PagerDuty](https://app.pagerduty.com) | Budget threshold alerts that create PagerDuty incidents via Events API v2. Severity auto-maps from threshold (70% → warning, 90% → error, 100% → critical). Configured via `AGENTKAVACH_PAGERDUTY_ROUTING_KEY` env var or YAML. | [app.pagerduty.com](https://app.pagerduty.com) |
| 8 | **Transactional Email** | [Resend](https://resend.com) | Sends OTP verification emails (registration + forgot password), budget threshold alert emails, and contact-us form submissions. Uses Resend's API with the `agentkavach.com` domain. Configured via `AGENTKAVACH_RESEND_API_KEY`. | [resend.com/overview](https://resend.com/overview) |
| 9 | **Email Forwarding** | [ImprovMX](https://improvmx.com) | Forwards emails sent to `admin@agentkavach.com`, `support@agentkavach.com`, and `*@agentkavach.com` to `agentcostguard@gmail.com`. Required because Resend handles outbound email only — ImprovMX handles inbound (reply-to, support requests, etc.). | [improvmx.com](https://improvmx.com) |

### Other Services

| # | Service | Provider | What it does | Dashboard |
|---|---------|----------|-------------|-----------|
| 10 | **CI/CD** | [GitHub Actions](https://github.com/agentcostguard/agent-cost-guard/actions) | Runs lint, format, test, and security checks on every push. | [GitHub Actions](https://github.com/agentcostguard/agent-cost-guard/actions) |
| 11 | **Payments** | [Stripe](https://dashboard.stripe.com) | Subscription billing (Free/Pro/Max tiers). Handles checkout, portal, and webhooks. | [dashboard.stripe.com](https://dashboard.stripe.com) |
| 12 | **Package** | [PyPI](https://pypi.org) | SDK distribution (`pip install agentkavach`). | [pypi.org](https://pypi.org) |

### Kafka Topics

| Topic | Partitions | Retention | Purpose |
|-------|------------|-----------|---------|
| `agentkavach.events` | 16 | 7 days | Ingest events (keyed by org_id) |
| `agentkavach.events.dlq` | 3 | 7 days | Dead letter queue for failed events |
| `agentkavach.alerts` | 3 | 7 days | Budget threshold alert summaries |

### TimescaleDB Tables

| Table | Type | Notes |
|-------|------|-------|
| `events` | Hypertable | Partitioned by timestamp, compressed after 3 days, 30-day auto-retention |
| `hourly_spend` | Continuous aggregate | Hourly cost rollups per org/agent |
| `daily_spend` | Continuous aggregate | Daily cost rollups per org/agent |
| `organizations` | Regular | Org profiles, tiers |
| `budgets` | Regular | Budget configs per org/agent |
| `users` | Regular | User accounts |
| `api_keys` | Regular | SDK API keys (hashed, never stored raw) |
| `alert_rules` | Regular | Alert threshold configs |
| `alert_events` | Regular | Alert history (auto-pruned >30 days) |
| `runs` | Regular | LLM call run tracking |
| `subscriptions` | Regular | Stripe subscription state |

### Redis Keys

| Pattern | TTL | Purpose |
|---------|-----|---------|
| `spend:{org_id}:{agent}:{period}` | 31 days | Spend accumulators (batched HINCRBYFLOAT) |
| `rl:{org_id}:daily` | 24h | Daily rate limit counter |
| `rl:{org_id}:burst` | 60s | Burst rate limit counter |
| `velocity:{org_id}:{agent}:{window}` | 1h | Spend velocity windows |
| `dash:{org_id}:{endpoint}:{hash}` | 15–30s | Dashboard query cache |
| `agents:{org_id}` | 24h | Agent count sorted set |

### Cost Scaling Estimates

| Customers | Redis | Kafka | TimescaleDB | Render | Vercel | **Total** |
|-----------|-------|-------|-------------|--------|--------|-----------|
| 0 (dev) | $0 | $0 | $0 | $0 | $0 | **$0** |
| 100 | ~$5 | ~$10 | ~$30 | $7 | $0 | **~$52** |
| 1,000 | ~$15 | ~$25 | ~$50 | $25 | $0 | **~$115** |
| 10,000 | ~$30 | ~$70 | ~$100 | $85 | $20 | **~$305** |
| 100,000 | ~$50 | ~$150 | ~$200 | $175 | $20 | **~$595** |
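
The totals and the implied cost per customer can be checked with a few lines of arithmetic. The figures below are copied from the table; the per-customer number is a derived illustration, not an official metric.

```python
# customers -> (Redis, Kafka, TimescaleDB, Render, Vercel), in USD from the table
estimates = {
    100: (5, 10, 30, 7, 0),
    1_000: (15, 25, 50, 25, 0),
    10_000: (30, 70, 100, 85, 20),
    100_000: (50, 150, 200, 175, 20),
}

totals = {customers: sum(costs) for customers, costs in estimates.items()}
per_customer = {customers: totals[customers] / customers for customers in estimates}
```

The per-customer figure falls from roughly $0.52 at 100 customers to under one cent at 100,000, which is the main point of the table.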

### Local Docker Stack

The local `docker-compose.local.yml` mirrors Render Starter exactly (0.5 CPU, 512MB RAM, 1 worker):

| Service | Port | Purpose |
|---------|------|---------|
| Backend | :8000 | FastAPI + EventWriter + AlertWorker (single container) |
| Dashboard | :3000 | Next.js frontend |
| TimescaleDB | :5432 | PostgreSQL 16 + TimescaleDB |
| Redis | :6379 | Redis 7 Alpine |
| Kafka | :9092 | Confluent Kafka (KRaft mode) |

See [docs/infrastructure.md](docs/infrastructure.md) for detailed architecture.

---

## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `AGENTKAVACH_DATABASE_URL` | Backend | PostgreSQL connection string (defaults to SQLite) |
| `AGENTKAVACH_REDIS_URL` | Optional | Redis URL for spend accumulators, rate limiting, and the dashboard query cache |
| `AGENTKAVACH_KAFKA_BOOTSTRAP_SERVERS` | Optional | Kafka broker address |
| `AGENTKAVACH_JWT_SECRET` | Backend | JWT signing secret |
| `AGENTKAVACH_SLACK_WEBHOOK_URL` | Optional | Slack Incoming Webhook URL |
| `AGENTKAVACH_ALERT_EMAIL` | Optional | Alert recipient email |
| `AGENTKAVACH_PAGERDUTY_ROUTING_KEY` | Optional | PagerDuty routing key |
| `AGENTKAVACH_RESEND_API_KEY` | Optional | Resend API key for email alerts |
| `OPENAI_API_KEY` | SDK | OpenAI API key |
| `ANTHROPIC_API_KEY` | SDK | Anthropic API key |
| `GOOGLE_API_KEY` | SDK | Google AI API key |
| `MISTRAL_API_KEY` | SDK | Mistral API key |

---

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/v1/status` | Health check |
| POST | `/v1/auth/register` | Create account |
| POST | `/v1/auth/login` | Login, get JWT |
| GET | `/v1/agents` | List agents |
| GET | `/v1/agents/:name` | Agent detail |
| GET | `/v1/keys` | List API keys |
| POST | `/v1/keys` | Create API key |
| DELETE | `/v1/keys/:id` | Revoke API key |
| POST | `/v1/keys/:id/rotate` | Rotate API key |
| GET | `/v1/dashboard/overview` | Spend overview |
| GET | `/v1/dashboard/stats` | Organization stats |
| GET | `/v1/billing/plan` | Current plan |
| POST | `/v1/billing/checkout` | Upgrade plan |
| POST | `/v1/billing/portal` | Billing portal session |
| GET | `/v1/agents/:name/runs` | List agent runs |
| GET | `/v1/runs/:id` | Run detail |
| GET | `/v1/runs/:id/events` | Run event timeline |
| POST | `/v1/ingest` | Receive SDK events |
| GET | `/v1/pricing` | Model pricing (public) |

See [API Reference](https://agentkavach.com/public/docs/api-reference) for full request/response documentation.
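
For a quick local check, the health endpoint can be probed with the standard library alone. This sketch assumes the backend from the Docker setup is listening on `http://localhost:8000` and that a healthy service answers `GET /v1/status` with HTTP 200. The response body is not documented here, so it is ignored. `endpoint` and `is_healthy` are hypothetical helpers.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # local backend address from the Docker table


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join a base URL and an API path without doubling or dropping slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def is_healthy(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if GET /v1/status answers with HTTP 200 (assumed healthy)."""
    try:
        with urllib.request.urlopen(endpoint("/v1/status", base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers connection errors, timeouts, and HTTP error statuses
        return False


status_url = endpoint("/v1/status")
```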

---

## Internal Engineering Docs

| Document | Description |
|----------|-------------|
| [docs/infrastructure.md](docs/infrastructure.md) | Docker, Redis, Kafka, TimescaleDB |
| [docs/rate-limiting.md](docs/rate-limiting.md) | Rate limiter implementation |
| [docs/event-pipeline.md](docs/event-pipeline.md) | SDK → OTel → Kafka → DB flow |
| [docs/alert-system.md](docs/alert-system.md) | Alert channels, cooldowns, dedup |
| [docs/deployment.md](docs/deployment.md) | Production deployment guide |
| [docs/seed-data.md](docs/seed-data.md) | Seed script and test credentials |
| [docs/testing.md](docs/testing.md) | Test organization and fixtures |

---

## Contributing

1. Never push directly to `main`
2. Create a feature branch: `phase{N}/{descriptive-name}`
3. Each PR must be self-contained with tests
4. Run all checks before pushing:
   ```bash
   python3 -m pytest --tb=short -q
   python3 -m ruff check . && python3 -m ruff format --check .
   cd dashboard && npm test && npx next lint
   npx prettier --check "**/*.{ts,tsx,json,css}" --ignore-path .gitignore
   ```
5. Create PR via `gh pr create`, merge via `gh pr merge`

---

## Versioning

AgentKavach follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html)
(`MAJOR.MINOR.PATCH`). The current version is **0.1.0**.

- **Single source of truth.** The SDK version lives in
  [`pyproject.toml`](pyproject.toml) and is re-exported as
  `agentkavach.__version__`. Both must move together.
- **When to bump.** PATCH for backwards-compatible bug fixes, MINOR for
  backwards-compatible features, MAJOR for breaking API changes. While
  the project is pre-1.0, MINOR may include breaking changes; they are
  always called out in [`CHANGELOG.md`](CHANGELOG.md).
- **How to release.** Bump the version in `pyproject.toml`, move the
  `Unreleased` entries in `CHANGELOG.md` to the new version, tag the
  commit (`vX.Y.Z`), and publish a GitHub Release. The
  [`publish.yml`](.github/workflows/publish.yml) workflow builds the
  wheel, verifies `agentkavach.__version__`, runs the smoke tests on
  Python 3.9 / 3.11 / 3.12, publishes to TestPyPI, and finally to PyPI.
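
The bump rules above can be summarized in a few lines. This is a sketch for plain `MAJOR.MINOR.PATCH` strings only (no pre-release or build suffixes). `bump` is a hypothetical helper and is not part of the release workflow.

```python
def bump(version: str, part: str) -> str:
    """Return the next version after bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


next_patch = bump("0.1.0", "patch")
next_minor = bump("0.1.0", "minor")
next_major = bump("0.1.0", "major")
```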

See [`CHANGELOG.md`](CHANGELOG.md) for the full release history.

---

## Security

Found a vulnerability? Please email **security@agentkavach.com** instead
of opening a public issue. See [`SECURITY.md`](SECURITY.md) for our
disclosure policy and supported versions.

---

## License

MIT. See [LICENSE](LICENSE).
