Metadata-Version: 2.4
Name: rewardguard-premium
Version: 2.0.4
Summary: Auto-tuning reward system for production RL applications (Premium)
Author-email: RewardGuard Team <support@rewardguard.ai>
License: Proprietary
Project-URL: Homepage, https://rewardguard.ai/premium
Project-URL: Documentation, https://docs.rewardguard.ai/premium
Project-URL: Support, https://rewardguard.ai/support
Keywords: reinforcement-learning,AI,machine-learning,reward-hacking,AI-alignment,AI-safety,auto-tuning,production-RL,reward-balance
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: rewardguard>=1.0.0
Requires-Dist: numpy>=1.20.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: flake8>=6.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"

# RewardGuard Premium

**Automatic Reward Alignment for Production RL — with Statistical Detection and Auto-Correction**

RewardGuard Premium is the paid tier of [RewardGuard](https://rewardguard.ai). The package installs from PyPI at no cost; to activate it, sign in with your RewardGuard account.

---

## Installation

```bash
pip install rewardguard-premium
```

On first use you will be prompted to sign in:

```
RewardGuard Premium — Sign in to your account
  Visit https://rewardguard.ai to create an account

  Email: you@example.com
  Password: ••••••••
  Signed in successfully!
```

Your session is saved to `~/.rewardguard/session.json` and refreshed automatically. You only sign in once per machine.

**Requires an active RewardGuard Premium subscription.**
Subscribe at: https://rewardguard.ai/premium

---

## Sign-in CLI

```bash
rewardguard-premium login    # sign in (or switch accounts)
rewardguard-premium logout   # clear saved session
rewardguard-premium status   # show who you are signed in as
```

**CI / automated environments** — use env vars instead of the interactive prompt:

```bash
export REWARDGUARD_EMAIL=you@example.com
export REWARDGUARD_PASSWORD=yourpassword
```

---

## Quick Start

```python
from rewardguard_premium import AutoMonitor

monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    baseline_steps=300,   # warm-up before detection activates
    auto_correct=True,    # adjust weights automatically when flagged
)

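# `env`, `action`, `num_episodes` and `max_steps` are placeholders for your own setup.
# This loop assumes an env whose step() returns the task and safety rewards separately
# and that provides a set_reward_weights() method.
for episode in range(num_episodes):
    for step in range(max_steps):
        r_task, r_safety = env.step(action)

        # Returns None during warm-up, then an AlignmentSnapshot on every step
        snapshot = monitor.step({"task": r_task, "safety": r_safety})

        if snapshot is not None and snapshot.flag == "critical":
            # Apply auto-corrected weights back to the environment
            env.set_reward_weights(monitor.weights)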

monitor.print_report()
```

---

## What AutoMonitor Does

| Phase | What happens |
|---|---|
| **Warm-up** (`baseline_steps`) | Learns the baseline reward-component ratios for your environment. `step()` returns `None` during this phase. |
| **Detection** | Computes per-component z-scores against the learned baseline. |
| **Alignment score** | 0 (misaligned) → 1 (fully aligned), sigmoid-mapped from the max z-score. |
| **Flagging** | `ok` (score > 0.75) / `warning` (0.5 < score ≤ 0.75) / `critical` (score ≤ 0.5) |
| **Auto-correction** | Adjusts per-component weight multipliers when flagged (requires `auto_correct=True`). |
| **Drift velocity** | Linear regression slope over the last `drift_window` alignment scores; distinguishes sustained trends from one-off spikes. |
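
The detection, scoring, and flagging phases in the table above can be sketched in plain Python. This is a minimal illustration, not RewardGuard's implementation: it assumes the baseline is the mean and standard deviation of each component's reward share (as a fraction) during warm-up, and that the alignment score is a logistic function of the largest absolute z-score, centred so that it equals exactly 0.5 at `z_threshold`. The helper names and the `steepness` parameter are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Tuple

Baseline = Dict[str, Tuple[float, float]]


def learn_baseline(warmup_shares: List[Dict[str, float]]) -> Baseline:
    """Mean and population std of each component's reward share during warm-up."""
    baseline: Baseline = {}
    for name in warmup_shares[0]:
        values = [shares[name] for shares in warmup_shares]
        baseline[name] = (statistics.fmean(values), statistics.pstdev(values))
    return baseline


def z_scores(shares: Dict[str, float], baseline: Baseline, eps: float = 1e-8) -> Dict[str, float]:
    """Per-component deviation from the baseline, in baseline standard deviations."""
    return {name: (shares[name] - mu) / max(sigma, eps) for name, (mu, sigma) in baseline.items()}


def alignment_score(z: Dict[str, float], z_threshold: float = 2.5, steepness: float = 1.0) -> float:
    """Logistic map of the largest |z|: close to 1 near the baseline, exactly 0.5 at z_threshold."""
    x = steepness * (max(abs(v) for v in z.values()) - z_threshold)
    # Numerically stable form of 1 / (1 + e^x); avoids overflow for very large |z|
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def flag(score: float) -> str:
    """Thresholds from the table: ok > 0.75, warning > 0.5, critical <= 0.5."""
    if score > 0.75:
        return "ok"
    if score > 0.5:
        return "warning"
    return "critical"


baseline = learn_baseline([
    {"task": 0.70, "safety": 0.30},
    {"task": 0.72, "safety": 0.28},
    {"task": 0.68, "safety": 0.32},
])
z = z_scores({"task": 0.60, "safety": 0.40}, baseline)
score = alignment_score(z)
print(flag(score), round(score, 3))
```

With this mapping, a component sitting exactly at `z_threshold` standard deviations from its baseline lands on the `warning`/`critical` boundary, which ties the per-component threshold to the snapshot-level flag.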

---

## AlignmentSnapshot

Every call to `step()` after warm-up returns an `AlignmentSnapshot`:

```python
snapshot.alignment_score      # float 0–1
snapshot.flag                 # "ok" / "warning" / "critical"
snapshot.z_scores             # {"task": -0.4, "safety": +2.8}
snapshot.drift_velocity       # negative = worsening trend
snapshot.corrections_applied  # {"safety": 1.24}  — weights changed this step
snapshot.component_ratios     # {"task": 68.3, "safety": 31.7}
```
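
The `component_ratios` and `drift_velocity` fields could be derived along the following lines. This sketch assumes ratios are each component's share of the total absolute reward over the last `window` steps, expressed in percent, and that drift velocity is the ordinary least-squares slope of the last `drift_window` alignment scores, so a negative value means scores are falling. `RollingStats` is a hypothetical helper, not part of the package.

```python
from collections import deque
from typing import Deque, Dict


class RollingStats:
    """Hypothetical helper: rolling component ratios and drift velocity."""

    def __init__(self, window: int = 200, drift_window: int = 30):
        self.rewards: Deque[Dict[str, float]] = deque(maxlen=window)
        self.scores: Deque[float] = deque(maxlen=drift_window)

    def add_rewards(self, components: Dict[str, float]) -> None:
        self.rewards.append(components)

    def add_score(self, score: float) -> None:
        self.scores.append(score)

    def component_ratios(self) -> Dict[str, float]:
        """Percentage share of each component's absolute reward over the window."""
        totals: Dict[str, float] = {}
        for step in self.rewards:
            for name, value in step.items():
                totals[name] = totals.get(name, 0.0) + abs(value)
        grand_total = sum(totals.values())
        if grand_total == 0:
            return {name: 0.0 for name in totals}
        return {name: 100.0 * total / grand_total for name, total in totals.items()}

    def drift_velocity(self) -> float:
        """Least-squares slope of recent alignment scores, per snapshot."""
        n = len(self.scores)
        if n < 2:
            return 0.0
        x_mean = (n - 1) / 2
        y_mean = sum(self.scores) / n
        num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(self.scores))
        den = sum((i - x_mean) ** 2 for i in range(n))
        return num / den


stats = RollingStats(window=200, drift_window=30)
for r_task, r_safety in [(0.7, 0.3), (0.6, 0.4), (0.5, 0.5)]:
    stats.add_rewards({"task": r_task, "safety": r_safety})
for s in [0.92, 0.85, 0.71]:
    stats.add_score(s)
print(stats.component_ratios())
print(stats.drift_velocity())
```

A regression slope rather than the latest change between two snapshots is what lets a single noisy score move the value only slightly, while a steady decline shows up as a consistently negative slope.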

---

## Framework Integrations

### Weights & Biases

```python
from rewardguard_premium import AutoMonitor, make_wandb_callback
import wandb

wandb.init(project="my-rl-run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_wandb_callback()],
)
```

### TensorBoard

```python
from torch.utils.tensorboard import SummaryWriter
from rewardguard_premium import AutoMonitor, make_tensorboard_callback

writer = SummaryWriter("runs/my_run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_tensorboard_callback(writer)],
)
```

### Stable-Baselines3

```python
from stable_baselines3 import PPO
from rewardguard_premium import AutoMonitor, make_sb3_callback

monitor = AutoMonitor(expected={"task": 0.7, "safety": 0.3})
model = PPO("MlpPolicy", env)
model.learn(total_timesteps=500_000, callback=make_sb3_callback(monitor))
```

> Your environment must include `"reward_components"` in its `info` dict for the SB3 callback.
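
A minimal sketch of an environment wrapper that meets this requirement, assuming the Gymnasium-style `step()` that returns `(obs, reward, terminated, truncated, info)` (the API used by Stable-Baselines3 2.x) and assuming `reward_components` is a flat dict mapping component name to per-step reward. `RewardComponentsInfo`, `split_fn`, and `ToyEnv` are hypothetical. In a real project you would more likely subclass `gymnasium.Wrapper`, which also forwards the observation and action spaces SB3 needs; this plain class avoids that dependency so the example runs on its own.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical signature: (obs, action, info) -> {"task": ..., "safety": ...}
SplitFn = Callable[[Any, Any, Dict[str, Any]], Dict[str, float]]


class RewardComponentsInfo:
    """Forwards to an inner env and adds info["reward_components"] on every step."""

    def __init__(self, env: Any, split_fn: SplitFn):
        self.env = env
        self.split_fn = split_fn

    def reset(self, **kwargs: Any) -> Tuple[Any, Dict[str, Any]]:
        return self.env.reset(**kwargs)

    def step(self, action: Any) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
        obs, reward, terminated, truncated, info = self.env.step(action)
        info = dict(info)  # copy so the inner env's dict is not mutated
        info["reward_components"] = self.split_fn(obs, action, info)
        return obs, reward, terminated, truncated, info


class ToyEnv:
    """Toy inner env used only for illustration."""

    def reset(self, **kwargs: Any):
        return 0, {}

    def step(self, action: Any):
        return 0, 1.0, False, False, {"task_reward": 0.7, "safety_reward": 0.3}


env = RewardComponentsInfo(
    ToyEnv(),
    lambda obs, action, info: {"task": info["task_reward"], "safety": info["safety_reward"]},
)
_, _, _, _, info = env.step(action=0)
print(info["reward_components"])
```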

---

## Save / Load State

```python
# Save after training
monitor.save("run_42_state.json")

# Resume later
monitor = AutoMonitor.load("run_42_state.json")
```

---

## Export Data

```python
monitor.to_json("results.json")   # full state
monitor.to_csv("snapshots.csv")   # one row per detection-phase step
```

---

## AutoMonitor Parameters

| Parameter | Default | Description |
|---|---|---|
| `expected` | required | Target distribution, e.g. `{"task": 0.7, "safety": 0.3}` |
| `baseline_steps` | `300` | Warm-up steps before detection activates |
| `z_threshold` | `2.5` | Z-score at which a component is flagged |
| `auto_correct` | `True` | Automatically adjust weights when flagged |
| `correction_rate` | `0.2` | Fraction of required correction applied per step |
| `tolerance` | `5.0` | Percentage-point tolerance for the free-tier `check()` method |
| `window` | `200` | Rolling window size for ratio computation |
| `drift_window` | `30` | Snapshots used to compute drift velocity |
| `callbacks` | `[]` | List of callables invoked with each `AlignmentSnapshot` |
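
`correction_rate` is described above as the fraction of the required correction applied per step. One way to read that, sketched here under stated assumptions: for each flagged component, take a first-order target multiplier that scales its weight by expected share over observed share (ignoring how the other components' shares shift when one weight changes), then move only `correction_rate` of the way toward it. The function name and the `min_weight`/`max_weight` clamps are hypothetical, not documented parameters.

```python
from typing import Dict, Iterable


def apply_partial_correction(
    weights: Dict[str, float],
    expected: Dict[str, float],
    observed: Dict[str, float],
    flagged: Iterable[str],
    correction_rate: float = 0.2,
    min_weight: float = 0.1,   # hypothetical clamp
    max_weight: float = 10.0,  # hypothetical clamp
) -> Dict[str, float]:
    """Update `weights` in place for flagged components; return only the changed entries."""
    corrections: Dict[str, float] = {}
    for name in flagged:
        share = observed.get(name, 0.0)
        if share <= 0.0:
            continue  # no signal for this component, so no ratio to correct against
        # First-order target: scale the weight by expected / observed share
        target = weights[name] * expected[name] / share
        # Move only a fraction of the way toward the target on this step
        updated = weights[name] + correction_rate * (target - weights[name])
        corrections[name] = min(max(updated, min_weight), max_weight)
    weights.update(corrections)
    return corrections


weights = {"task": 1.0, "safety": 1.0}
changed = apply_partial_correction(
    weights,
    expected={"task": 0.7, "safety": 0.3},
    observed={"task": 0.5, "safety": 0.5},  # shares as fractions, not percentages
    flagged=["safety"],
)
print(changed, weights)
```

Applying only part of the correction each step trades speed for stability: a small `correction_rate` keeps a single noisy detection from swinging the weights, at the cost of taking more steps to close a genuine gap.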

---

## Free vs Premium

| Feature | Free (`rewardguard`) | Premium |
|---|---|---|
| Live in-loop monitoring | ✅ | ✅ |
| Log-file analysis | ✅ | ✅ |
| Imbalance detection | ✅ | ✅ |
| Weight recommendations | ✅ | ✅ |
| Baseline learning | ❌ | ✅ |
| Z-score detection | ❌ | ✅ |
| Alignment score (0–1) | ❌ | ✅ |
| Drift velocity | ❌ | ✅ |
| Auto-correction | ❌ | ✅ |
| WandB / TensorBoard / SB3 | ❌ | ✅ |
| Save / load state | ❌ | ✅ |

---

## License

Proprietary — requires an active RewardGuard Premium subscription.
© 2026 RewardGuard | https://rewardguard.ai
