Metadata-Version: 2.4
Name: watchllm
Version: 0.1.1
Summary: AI agent reliability testing platform
Home-page: https://github.com/watchllm/watchllm
Author: WatchLLM
Author-email: WatchLLM <hello@watchllm.dev>
License: MIT
Project-URL: Homepage, https://watchllm.dev
Project-URL: Repository, https://github.com/watchllm/watchllm
Keywords: ai,agents,testing,reliability,llm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Testing
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.28.0
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

<p align="center">
  <img src="https://raw.githubusercontent.com/WatchLLM/.github/refs/heads/main/edited%20(1).png" width="800" alt="WatchLLM Banner" />
</p>

<h3 align="center">
  AI agent reliability testing infrastructure. Stress test your agents against 8 attack categories—prompt injection, tool abuse, jailbreaks, data exfiltration, and more—before they hit production.
</h3>

<p align="center">
  <strong>Observability, evaluation, and adversarial testing platform for LLM agents.</strong><br>
  Ship reliable, secure, and debuggable agentic systems.
</p>

<p align="center">
  <a href="https://watchllm.dev">Website</a> • 
  <a href="https://watchllm.dev/docs">Documentation</a>
</p>

<div align="center">
  
  [![GitHub Stars](https://img.shields.io/github/stars/kaadipranav/watchllm.dev?style=flat-square&logo=github)](https://github.com/kaadipranav/watchllm.dev)
  [![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
  [![TypeScript](https://img.shields.io/badge/TypeScript-Ready-blue.svg)](https://www.typescriptlang.org/)
  [![Python](https://img.shields.io/badge/Python-Ready-blue.svg)](https://www.python.org/)

</div>

---

## Installation

Install the official Python SDK via pip:

```bash
pip install watchllm
```

## Authentication

WatchLLM requires an API key to securely log your simulations to your dashboard.

**Option A: Interactive CLI Login (Recommended for Local Dev)**
```bash
watchllm auth login
```
This will prompt you for your API key and securely save it to `~/.watchllm/config`.

**Option B: Environment Variables (Recommended for CI/CD)**
Set the following environment variable in your pipeline or `.env` file:
```bash
export WATCHLLM_API_KEY="your_api_key_here"
```
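
In CI it can help to fail fast when the key is missing, rather than partway through a run. A minimal sketch, assuming only the `WATCHLLM_API_KEY` variable named above; `require_api_key` is a hypothetical helper and not part of the SDK:

```python
import os


def require_api_key(env=None) -> str:
    """Return the WatchLLM API key from the environment, or raise if unset.

    Hypothetical helper for illustration; not part of the watchllm SDK.
    """
    # Allow a mapping to be injected so the check is easy to unit test.
    env = os.environ if env is None else env
    key = env.get("WATCHLLM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "WATCHLLM_API_KEY is not set; export it before running simulations."
        )
    return key
```

Calling `require_api_key()` at the start of a pipeline step surfaces a missing secret with a clear message.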

## Running Simulations via CLI

The fastest way to test an agent is using the WatchLLM CLI. 

```bash
# Run a specific attack category
watchllm simulate --agent my_module.my_agent --categories prompt_injection

# Run multiple categories simultaneously
watchllm simulate --agent my_module.my_agent --categories prompt_injection,tool_abuse,hallucination

# Run all 8 attack categories
watchllm simulate --agent my_module.my_agent --categories all
```

*Note: Replace `my_module.my_agent` with the Python import path to your agent function. Your agent function must accept a string and return a string.*
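
As a sketch of that contract, a file named `my_module.py` (matching the import path used in the commands above) could look like the following. The echo logic is a placeholder assumption standing in for a real model call:

```python
# my_module.py
def my_agent(user_input: str) -> str:
    """Toy agent: accepts a string and returns a string."""
    # Placeholder logic; a real agent would call its model and tools here.
    reply = f"You said: {user_input.strip()}"
    return reply
```

With this file on the import path, `--agent my_module.my_agent` refers to the `my_agent` function inside `my_module`.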

## Native Python Integration (CI/CD)

For automated testing inside your test suite or CI/CD pipelines, wrap your agent with the `@watchllm.test` decorator. This exposes your agent's signature to the WatchLLM runner.

```python
import watchllm

@watchllm.test(
    categories=["prompt_injection", "data_exfiltration"],
    threshold=0.3,   # Raises WatchLLMThresholdError if severity >= 0.3
)
def my_agent(user_input: str) -> str:
    # Your agent logic here; this placeholder echoes the input
    response = f"Echo: {user_input}"
    return response
```

If the measured severity reaches the configured `threshold`, the SDK raises `WatchLLMThresholdError`, which fails the pipeline with a non-zero exit code.
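
The threshold rule from the code comment above (fail when severity is at or above the threshold) can be illustrated without the SDK. This is a sketch of the comparison only, not WatchLLM's implementation; `ThresholdExceeded` and `check_severity` are hypothetical names:

```python
class ThresholdExceeded(Exception):
    """Hypothetical stand-in for the SDK's WatchLLMThresholdError."""


def check_severity(severity: float, threshold: float = 0.3) -> None:
    """Raise if severity is at or above the threshold (severity >= threshold)."""
    if severity >= threshold:
        raise ThresholdExceeded(
            f"severity {severity:.2f} >= threshold {threshold:.2f}"
        )


check_severity(0.10)      # below the threshold: returns silently
try:
    check_severity(0.90)  # at or above the threshold: raises
except ThresholdExceeded as exc:
    print(exc)
```

Note that the comparison is inclusive: a severity exactly equal to the threshold also fails.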

## Analyzing Results

Once a simulation completes, the CLI prints a summary of the final result:

```text
Simulation ID: sim_abc123
Status: completed
Severity: 0.90
Verdict: Critical vulnerability detected
```
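
To consume this summary in a script, the `Key: value` lines can be split into a dictionary. A minimal sketch, assuming the output keeps exactly the layout shown above; `parse_summary` is a hypothetical helper, and the SDK may well offer a structured format that this does not rely on:

```python
def parse_summary(text: str) -> dict:
    """Parse 'Key: value' lines into a dict, skipping lines without a colon."""
    result = {}
    for line in text.splitlines():
        # partition splits on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


sample = """Simulation ID: sim_abc123
Status: completed
Severity: 0.90
Verdict: Critical vulnerability detected"""

summary = parse_summary(sample)
severity = float(summary["Severity"])
```

A CI step could then compare `severity` against its own limit or archive `summary["Simulation ID"]` for later lookup.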

To view the complete execution graph, scrub through the node lineage, and replay specific forks, visit the dashboard:
[https://dashboard.watchllm.dev](https://dashboard.watchllm.dev)

## Command Reference

| Command | Description |
|---|---|
| `watchllm doctor` | Verifies your setup, API key validity, and agent reachability. |
| `watchllm simulate` | Launches a new attack simulation. |
| `watchllm replay` | Prints a text-based tree of the execution graph directly in your terminal. |
| `watchllm status` | Returns the current completion percentage and live severity scores of a running simulation. |

## Requirements
- Python 3.9 or higher
- The `requests` library (installed automatically)
