Metadata-Version: 2.4
Name: banditdb-python
Version: 0.1.6
Summary: Official Python SDK for BanditDB
Author-email: Simeon Lukov <s.lukov@dynamicpricing.ai>
License-Expression: Apache-2.0
License-File: LICENSE
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Requires-Dist: mcp>=1.0.0
Requires-Dist: requests>=2.25.0
Requires-Dist: urllib3>=1.26.0
Provides-Extra: all
Requires-Dist: econml>=0.15.0; extra == 'all'
Requires-Dist: httpx>=0.27.0; extra == 'all'
Requires-Dist: numpy>=1.24.0; extra == 'all'
Requires-Dist: polars>=0.20.0; extra == 'all'
Requires-Dist: scikit-learn>=1.3.0; extra == 'all'
Requires-Dist: textual>=0.47.0; extra == 'all'
Provides-Extra: async
Requires-Dist: httpx>=0.27.0; extra == 'async'
Provides-Extra: causal
Requires-Dist: econml>=0.15.0; extra == 'causal'
Requires-Dist: numpy>=1.24.0; extra == 'causal'
Requires-Dist: polars>=0.20.0; extra == 'causal'
Requires-Dist: scikit-learn>=1.3.0; extra == 'causal'
Provides-Extra: dashboard
Requires-Dist: textual>=0.47.0; extra == 'dashboard'
Provides-Extra: dev
Requires-Dist: numpy>=1.24.0; extra == 'dev'
Requires-Dist: polars>=0.20.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: eval
Requires-Dist: numpy>=1.24.0; extra == 'eval'
Requires-Dist: polars>=0.20.0; extra == 'eval'
Description-Content-Type: text/markdown

# BanditDB Python SDK

The official Python client and Model Context Protocol (MCP) server for **BanditDB** — the ultra-fast, lock-free Contextual Bandit database written in Rust.

BanditDB hides the linear algebra of Reinforcement Learning (LinUCB, Thompson Sampling) behind a simple API. Use it to build real-time personalizers and dynamic A/B tests, and to give LLM agents mathematically rigorous persistent memory.

## Installation

```bash
pip install banditdb-python
```

Requires a running BanditDB Rust server (default: `http://localhost:8080`).

---

## 1. Standard SDK Usage

The client features automatic connection pooling, exponential backoff retries, and strict timeouts.

```python
from banditdb import Client, BanditDBError

# Connect to the BanditDB server.
# Pass api_key if BANDITDB_API_KEY is set on the server.
db = Client(
    url="http://localhost:8080",
    timeout=2.0,
    api_key="your-secret-key",   # omit if server runs without auth
)

try:
    # 1. Create a campaign (run once at startup)
    # algorithm defaults to "linucb"; use "thompson_sampling" for Bayesian exploration
    db.create_campaign(
        campaign_id="checkout_upsell",
        arms=["offer_discount", "offer_free_shipping"],
        feature_dim=3,
    )
    # or: db.create_campaign(..., algorithm="thompson_sampling")

    # 2. A user arrives — ask the database what to show them
    # Context: [is_mobile, cart_value_normalized, is_returning_user]
    arm_id, interaction_id = db.predict("checkout_upsell", [1.0, 0.8, 0.0])
    print(f"Showing: {arm_id}")  # e.g., "offer_free_shipping"

    # 3. The user clicked — send the reward
    db.reward(interaction_id, reward=1.0)

except BanditDBError as e:
    print(f"Database error: {e}")
```
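The context passed to `predict()` is a plain list of floats whose length must equal `feature_dim`. As a minimal sketch of how the three features in the example above could be encoded, the helper below maps raw values into that vector. The function name `build_context` and the `max_cart_value` cap of 500 are illustrative assumptions, not part of the SDK.

```python
def build_context(is_mobile, cart_value, is_returning_user, max_cart_value=500.0):
    """Encode raw request data as [is_mobile, cart_value_normalized, is_returning_user].

    max_cart_value is a hypothetical cap used to scale the cart value into [0, 1].
    """
    if max_cart_value <= 0:
        raise ValueError("max_cart_value must be positive")
    # Clamp so that outliers cannot push the feature outside [0, 1].
    cart_value_normalized = min(max(cart_value / max_cart_value, 0.0), 1.0)
    return [float(is_mobile), cart_value_normalized, float(is_returning_user)]


context = build_context(is_mobile=True, cart_value=400.0, is_returning_user=False)
print(context)  # [1.0, 0.8, 0.0]
```

The resulting list can be passed directly as the second argument to `db.predict("checkout_upsell", context)`.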

### All Client methods

**Health**

| Method | Description |
|--------|-------------|
| `health()` | Returns `True` if the server is reachable and the WAL writer is healthy. |
| `health_detail()` | Returns the full health dict including per-campaign `entropy` and `status` (`"ok"` / `"warning"` / `"critical"`). |

**Campaigns**

| Method | Description |
|--------|-------------|
| `create_campaign(campaign_id, arms, feature_dim, alpha=1.0, algorithm="linucb", metadata=None)` | Register a new campaign. `algorithm` accepts `"linucb"`, `"thompson_sampling"`, `NeuralLinUCBConfig`, or `ProgressiveConfig`. `metadata` is an arbitrary JSON dict (≤ 64 KB). |
| `list_campaigns()` | Returns a list of all campaigns (active and archived) with `alpha`, `arm_count`, and `algorithm`. |
| `campaign_info(campaign_id)` | Returns full per-arm state: `theta`, `theta_norm`, prediction and reward counters. Raises `APIError` (404) if not found. |
| `report(campaign_id)` | Business-level convergence report. `converged=True` means one arm has a statistically significant lead at 95% CI — safe to stop. `converged=False` means leading but CIs still overlap. `converged=None` means not enough data yet (< 30 rewards per arm). |
| `diagnostics(campaign_id)` | Operator diagnostics: per-arm theta norms, A_inv uncertainty bounds, entropy health (`selection_entropy`, `entropy_status`, `entropy_trend`, `likely_cause`, `suggested_action`), tournament traffic, and neural buffer size. |
| `archive_campaign(campaign_id)` | Soft-delete: pauses predictions/rewards but preserves all learned weights. Recoverable with `restore_campaign()`. |
| `restore_campaign(campaign_id)` | Restore an archived campaign to active status with all weights intact. |
| `delete_campaign(campaign_id)` | Permanently delete a campaign. Returns `False` if not found. |
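
The three states of `converged` described for `report()` are easy to mix up because `False` and `None` are both falsy in Python. The sketch below shows one way to branch on them explicitly. It assumes the report is a dict with a `converged` key; the helper name `describe_convergence` is hypothetical.

```python
def describe_convergence(report):
    """Translate the tri-state `converged` field into a short status string.

    Assumes `report` is a dict-like object exposing a "converged" key.
    """
    converged = report.get("converged")
    # Compare with `is` so that False and None are never conflated.
    if converged is True:
        return "converged: one arm leads at 95% CI, safe to stop"
    if converged is False:
        return "leading: an arm is ahead but confidence intervals still overlap"
    return "insufficient data: fewer than 30 rewards per arm"


print(describe_convergence({"converged": None}))
```

In practice the argument would be the value returned by `db.report("checkout_upsell")`.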

**Predict & Reward**

| Method | Description |
|--------|-------------|
| `predict(campaign_id, context)` | Returns `(arm_id, interaction_id)`. Pass `interaction_id` to `reward()` to close the loop. |
| `batch_predict(predictions)` | Predict for up to 100 campaign/context pairs in a single round-trip. Each item: `{"campaign_id": str, "context": List[float]}`. Returns list of `{arm_id, interaction_id}` or `{error}` per item. |
| `reward(interaction_id, reward)` | Record outcome. `reward` must be in `[0.0, 1.0]`. Raises `APIError` if the interaction has already been rewarded or has expired (default TTL: 24 h). |
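
Because `batch_predict()` accepts at most 100 pairs per call and reports failures per item, larger workloads need chunking and per-item error checks. The following is a minimal sketch of that pattern. It assumes results are dicts returned in the same order as the input; the helper name `batch_predict_chunked` is hypothetical.

```python
def batch_predict_chunked(client, items, chunk_size=100):
    """Run batch_predict over any number of items, at most 100 per round-trip.

    Returns (decisions, failures):
      decisions: list of (index, arm_id, interaction_id)
      failures:  list of (index, error)
    Assumes the server returns one result per item, in input order.
    """
    if not 1 <= chunk_size <= 100:
        raise ValueError("chunk_size must be between 1 and 100")
    decisions, failures = [], []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        results = client.batch_predict(chunk)
        for offset, result in enumerate(results):
            index = start + offset
            if "error" in result:
                failures.append((index, result["error"]))
            else:
                decisions.append((index, result["arm_id"], result["interaction_id"]))
    return decisions, failures
```

Each `interaction_id` in `decisions` can later be passed to `reward()`; entries in `failures` have no interaction to reward.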

**Data & Export**

| Method | Description |
|--------|-------------|
| `checkpoint()` | Flush WAL, snapshot models, write Parquet shards, run neural retrain + tournament eval, rotate WAL. Returns a summary string. |
| `export()` | List Parquet export shards grouped by campaign. Returns `{export_dir, shards}`. |

---

## 2. The AI "Hive Mind" (Model Context Protocol)

Standard LLM agents are stateless — if they route a task to the wrong model and fail, they repeat the same mistake tomorrow. BanditDB's built-in MCP server gives the entire agent swarm shared persistent memory.

### Starting the MCP server

```bash
# Set environment variables before starting
export BANDITDB_URL=http://localhost:8080
export BANDITDB_API_KEY=your-secret-key   # omit if server runs without auth

banditdb-mcp
```

### Connecting to Claude Desktop

Add to your Claude configuration file:

- Mac: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
  "mcpServers": {
    "banditdb": {
      "command": "banditdb-mcp",
      "args": [],
      "env": {
        "BANDITDB_URL": "http://localhost:8080",
        "BANDITDB_API_KEY": "your-secret-key"
      }
    }
  }
}
```

The agent swarm now has nine tools:

| Tool | What it does |
|------|--------------|
| `create_campaign` | Create a new decision campaign. Accepts `algorithm` (`"linucb"` or `"thompson_sampling"`) and `alpha`. Use Thompson Sampling for natural Bayesian exploration with no tuning needed. |
| `list_campaigns` | List all active campaigns (shows `algorithm` and `alpha`) — useful to check what exists before calling `get_intuition`. |
| `campaign_diagnostics` | Inspect per-arm learning state: `theta_norm`, prediction counts, reward rates, and entropy health. Use when a campaign doesn't seem to be learning or one arm is dominating. |
| `campaign_report` | Business-level convergence report. Tells you whether the campaign has statistically converged and which arm is winning with confidence intervals. |
| `get_intuition` | Ask BanditDB which arm to pick for a given context. Returns the arm and an `interaction_id` to save. |
| `batch_get_intuition` | Get decisions for multiple campaigns in a single round-trip. Pass a list of `{campaign_id, context}` dicts. |
| `record_outcome` | Report whether the chosen action succeeded (1.0) or failed (0.0). Updates the shared model. |
| `archive_campaign` | Soft-delete a campaign. Pauses predictions/rewards but preserves all learned weights. |
| `restore_campaign` | Restore an archived campaign to active status with all weights intact. |

Every decision made by any agent in the network improves the routing for all future agents.

---

## 3. Data Science & Offline Evaluation

BanditDB event-sources every prediction and reward to a Write-Ahead Log (WAL). Calling `checkpoint()` compiles completed prediction→reward pairs into Snappy-compressed Parquet files — one per campaign — for offline analysis with Polars or Pandas.

A prediction still reaches the Parquet export when its reward arrives hours later: BanditDB re-emits in-flight interactions at each checkpoint, so a delayed reward is captured in a later cycle.

```python
# Checkpoint: snapshot models, write Parquet, rotate the WAL.
# Call this on a schedule or after significant traffic.
summary = db.checkpoint()
print(summary)
# "Checkpoint written and WAL rotated: 2 campaigns, offset 4821 bytes,
#  150 interactions exported, 3 in-flight re-emitted"

# List which Parquet files are available
print(db.export())
# Returns {export_dir, shards}: the export directory plus the shards grouped by campaign

# Load directly from the mounted volume into Polars.
# Flat schema: interaction_id | arm_id | reward | predicted_at | rewarded_at | propensity | feature_0 | ...
import polars as pl
df = pl.read_parquet("/data/exports/llm_routing.parquet")
print(df.head())
print(df.columns)
```
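
The comment above suggests calling `checkpoint()` on a schedule. One way to do that without an external scheduler is a background loop driven by a `threading.Event`, sketched below. The helper name `run_periodic_checkpoints` and its callback parameters are assumptions for illustration; in real code the broad `except Exception` would be narrowed to `BanditDBError`.

```python
import threading


def run_periodic_checkpoints(client, interval_s, stop_event,
                             on_summary=print, on_error=print):
    """Call client.checkpoint() every interval_s seconds until stop_event is set.

    Returns the number of successful checkpoints.
    """
    completed = 0
    # Event.wait() returns False on timeout (time to checkpoint)
    # and True as soon as the event is set (time to stop).
    while not stop_event.wait(interval_s):
        try:
            on_summary(client.checkpoint())
            completed += 1
        except Exception as exc:  # narrow to BanditDBError in real code
            on_error(exc)
    return completed
```

A typical setup would start the loop in a daemon thread, for example `threading.Thread(target=run_periodic_checkpoints, args=(db, 300, stop), daemon=True).start()`, and call `stop.set()` on shutdown.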

### Offline Policy Evaluation (OPE)

The SDK ships three OPE estimators in `banditdb.eval`. They answer the question: *"what would my average reward have been under a different policy — without running a live experiment?"*

Install the eval dependencies:

```bash
pip install "banditdb-python[eval]"
```

| Estimator | Function | How it works | When to use |
|-----------|----------|-------------|-------------|
| **Replay** | `replay(df)` | Accepts each interaction with probability `(1/K) / propensity` (Li et al. 2010). Unbiased sample of the uniform random policy. | Sanity check baseline. Low coverage is expected — ~1/K of interactions are used. |
| **IPS / SNIPS** | `ips(df, clip=10.0)` | Uses every interaction with importance weight `(1/K) / propensity`. Self-normalised to reduce variance. Weight clipping (default 10×) controls the bias-variance tradeoff. | Primary estimator. Use when you have enough data but want full coverage. |
| **Doubly Robust** | `doubly_robust(df, clip=10.0)` | Fits a linear reward model, then applies an IPS correction on residuals. Consistent if either the reward model or the propensities are correct. | Best statistical efficiency. Use when comparing multiple policies or sweeping `alpha`. |

All three estimators:
- Accept a Polars or pandas DataFrame loaded from a BanditDB Parquet export
- Evaluate the **uniform random policy** as the target (the unbiased baseline to beat)
- Raise `ValueError` for Thompson Sampling campaigns (propensity column is null — TS does not log propensities)
- Return an `OPEResult` with `estimate`, `std_error`, `n_used`, `n_total`, and `method`

```python
import polars as pl
from banditdb.eval import replay, ips, doubly_robust

df = pl.read_parquet("/data/exports/llm_routing.parquet")

# How much reward would a uniform random policy have earned?
print(replay(df))
# OPEResult(method='replay', estimate=0.4821, std_error=0.0312, coverage=22.1% [33/149])

print(ips(df))
# OPEResult(method='ips', estimate=0.5103, std_error=0.0187, coverage=100.0% [149/149])

print(doubly_robust(df))
# OPEResult(method='doubly_robust', estimate=0.5219, std_error=0.0141, coverage=100.0% [149/149])

# Compare against the observed reward of the logging policy:
print("Observed (logging policy):", df["reward"].mean())
# If observed >> estimate, the campaign has learned something real — it outperforms random.
```
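
To make the IPS / SNIPS row of the table concrete, the arithmetic can be worked through in plain Python. This is an illustrative sketch of a self-normalised estimator for a uniform target policy with clipped weights, not the code inside `banditdb.eval`; the function name `snips_uniform` is hypothetical.

```python
def snips_uniform(rewards, propensities, n_arms, clip=10.0):
    """Self-normalised IPS estimate of the uniform random policy's mean reward.

    Each interaction gets weight (1/K) / propensity, capped at `clip`.
    """
    if not rewards or len(rewards) != len(propensities):
        raise ValueError("rewards and propensities must be non-empty and equal length")
    target_prob = 1.0 / n_arms  # uniform policy picks every arm with probability 1/K
    weights = [min(clip, target_prob / p) for p in propensities]
    # Dividing by the sum of weights (not by n) is what makes it self-normalised.
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


# Two arms: weights are [1, 1, 2, 2], so the estimate is (1 + 2) / 6.
print(snips_uniform([1.0, 0.0, 1.0, 0.0], [0.5, 0.5, 0.25, 0.25], n_arms=2))  # 0.5
```

Lowering `clip` caps the influence of rarely chosen arms, which reduces variance at the cost of some bias.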

**Practical use: sweep `alpha` offline before deploying.** Train a campaign on real traffic, checkpoint to Parquet, then replay different alpha values through `doubly_robust()` to find the best exploration level — no live experiment needed.

> **Note:** OPE requires the `propensity` column, which is only written for **LinUCB** campaigns. Thompson Sampling campaigns log `null` propensities because TS arm selection is stochastic and propensity scoring requires a deterministic logging policy.

---

## Choosing an Algorithm

BanditDB supports four algorithms, selected at campaign creation time.

| Algorithm | `algorithm` value | Exploration style | When to use |
|-----------|------------------|-------------------|-------------|
| **LinUCB** | `"linucb"` (default) | Deterministic UCB bonus: `θ·x + α√(x·A⁻¹·x)` | Predictable, tunable. Sweep `alpha` offline to calibrate. |
| **Linear Thompson Sampling** | `"thompson_sampling"` | Samples θ̃ ~ N(θ, α²·A⁻¹), scores by θ̃·x | Bayesian posterior — no alpha-sweep needed. Concurrent users automatically diversify choices. |
| **NeuralLinUCB** | `NeuralLinUCBConfig(...)` | Deep MLP embedding + LinUCB in embedding space | Non-linear reward functions. Retrains the MLP every N rewards. |
| **Progressive** | `ProgressiveConfig(...)` | Self-tuning tournament: runs base + challenger in parallel, shifts traffic to the winner | Zero-configuration model selection. Picks the best algorithm automatically. |

```python
from banditdb import Client, NeuralLinUCBConfig, ProgressiveConfig

db = Client("http://localhost:8080")

# LinUCB (default)
db.create_campaign("routing", ["fast", "cheap"], feature_dim=4, alpha=1.5)

# Thompson Sampling — natural Bayesian exploration, alpha=1.0 is ideal
db.create_campaign("routing_ts", ["fast", "cheap"], feature_dim=4,
                   algorithm="thompson_sampling")

# NeuralLinUCB — learns a deep embedding of the context, then applies LinUCB
cfg = NeuralLinUCBConfig(
    context_dim=4,     # must match feature_dim
    embed_dim=32,      # arm matrix dimension (default 32)
    hidden_dim=128,    # MLP hidden layer width (default 128)
    retrain_every=200, # retrain the MLP every N cumulative rewards
)
db.create_campaign("routing_neural", ["fast", "cheap"], feature_dim=4, algorithm=cfg)

# Progressive — runs LinUCB vs NeuralLinUCB, shifts traffic to whoever wins SNIPS checkpoints
cfg = ProgressiveConfig(
    base="linucb",
    challenger=NeuralLinUCBConfig(context_dim=4, embed_dim=32),
    min_obs=100,       # minimum buffer entries per arm before any traffic shift
    required_wins=3,   # consecutive checkpoint wins to earn one traffic step
    step_bps=1000,     # traffic delta per win run, in basis points (1000 = 10%)
)
db.create_campaign("routing_prog", ["fast", "cheap"], feature_dim=4, algorithm=cfg)
```

All four algorithms share the same `predict` → `reward` loop.
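
The LinUCB score from the table, `θ·x + α√(x·A⁻¹·x)`, can be evaluated by hand to see how `alpha` trades exploitation against exploration. The sketch below uses plain Python lists and made-up per-arm state; it illustrates the formula only and is not how the Rust server computes it. The names `linucb_score` and `pick_arm` are hypothetical.

```python
import math


def linucb_score(theta, a_inv, x, alpha=1.0):
    """Return theta·x + alpha * sqrt(x · A_inv · x) for one arm."""
    exploit = sum(t * xi for t, xi in zip(theta, x))
    a_inv_x = [sum(row[j] * x[j] for j in range(len(x))) for row in a_inv]
    variance = sum(xi * v for xi, v in zip(x, a_inv_x))
    # Guard against tiny negative values from floating-point rounding.
    return exploit + alpha * math.sqrt(max(variance, 0.0))


def pick_arm(arms, x, alpha=1.0):
    """Choose the arm with the highest LinUCB score."""
    return max(arms, key=lambda name: linucb_score(
        arms[name]["theta"], arms[name]["a_inv"], x, alpha))


# Made-up state: "fast" is well explored (small A_inv), "cheap" is barely explored.
arms = {
    "fast": {"theta": [0.5, 0.1], "a_inv": [[0.1, 0.0], [0.0, 0.1]]},
    "cheap": {"theta": [0.3, 0.1], "a_inv": [[1.0, 0.0], [0.0, 1.0]]},
}
x = [1.0, 0.0]
print(pick_arm(arms, x, alpha=0.0))  # fast  (pure exploitation)
print(pick_arm(arms, x, alpha=1.0))  # cheap (uncertainty bonus wins)
```

With `alpha=0.0` only the estimated reward counts; with `alpha=1.0` the larger uncertainty of the under-explored arm outweighs its lower estimate.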

---

## Error Handling

| Exception | When raised |
|-----------|-------------|
| `BanditDBError` | Base exception — catch this to handle all SDK errors. |
| `ConnectionError` | Server is offline or unreachable. |
| `TimeoutError` | Request exceeded the configured timeout. |
| `APIError` | Server returned an error (e.g., campaign not found, unauthorized). |
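
A common way to use this hierarchy is to fall back to a safe default arm when the server cannot be reached, so that the user-facing request never fails. The sketch below shows that pattern. The helper name `predict_or_fallback` is hypothetical, and `error_type` defaults to `Exception` only to keep the snippet self-contained; pass `BanditDBError` in real code.

```python
def predict_or_fallback(client, campaign_id, context, fallback_arm,
                        error_type=Exception):
    """Return (arm_id, interaction_id), or (fallback_arm, None) on an SDK error.

    A None interaction_id signals that there is nothing to reward later.
    """
    try:
        return client.predict(campaign_id, context)
    except error_type:
        return fallback_arm, None
```

Callers should skip `reward()` whenever the returned `interaction_id` is `None`, since the fallback decision was never recorded by the server.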

---

## License

Apache-2.0 — Copyright (C) 2026 Simeon Lukov and Dynamic Pricing Ltd.
See the [main repository](https://github.com/dynamicpricing-ai/banditdb) for details.
