Metadata-Version: 2.4
Name: pguard-llm
Version: 0.1.3
Summary: Prompt version control and A/B testing for LLM applications
Author: JALLAD
License: MIT
Project-URL: Homepage, https://github.com/ES7/pguard-llm
Project-URL: Repository, https://github.com/ES7/pguard-llm
Project-URL: Issues, https://github.com/ES7/pguard-llm/issues
Keywords: llm,prompt,versioning,testing,openai,anthropic,gemini,ai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: click>=8.0
Requires-Dist: rich>=13.0
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20; extra == "anthropic"
Provides-Extra: gemini
Requires-Dist: google-genai>=0.5; extra == "gemini"
Provides-Extra: all
Requires-Dist: openai>=1.0; extra == "all"
Requires-Dist: anthropic>=0.20; extra == "all"
Requires-Dist: google-genai>=0.5; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: openai>=1.0; extra == "dev"
Requires-Dist: anthropic>=0.20; extra == "dev"
Requires-Dist: google-genai>=0.5; extra == "dev"
Dynamic: license-file

# pguard-llm

> Prompt version control and A/B testing for LLM applications.

```bash
pip install pguard-llm
```

## The Problem

You tweak a prompt. It gets better. You tweak it again. Now you don't remember what changed, which version was best, or how much each version costs to run.

`pguard-llm` fixes this: version your prompts, run them, and compare them.

## Quick Start

```python
from pguard import Prompt

p = Prompt("summarize")

# Save versions
p.save("v1", "Summarize this: {text}", description="Simple")
p.save("v2", "In 3 bullet points, summarize: {text}", description="Structured")

# Run against an LLM
result = p.run(
    "v1",
    provider="openai",
    model="gpt-4o",
    api_key="sk-...",
    input_vars={"text": "Your article text here..."}
)

print(result.output)
print(result.cost_usd)
print(result.latency_ms)

# Compare v1 vs v2 (metrics are averaged per version, so run v2 as well first)
comparison = p.compare("v1", "v2")
print(comparison.summary())
```
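
The `{text}` placeholder in the templates above looks like Python's `str.format` syntax. The sketch below shows how `input_vars` could map onto a template under that assumption. `render` is a hypothetical helper, not part of `pguard-llm`, and the library may fill templates differently.

```python
# Illustration only: assumes templates are filled the way str.format does it.
# render() is a hypothetical helper, not a pguard-llm function.

def render(template: str, input_vars: dict[str, str]) -> str:
    """Fill {name} placeholders in `template` with values from `input_vars`."""
    return template.format(**input_vars)


v1 = "Summarize this: {text}"
v2 = "In 3 bullet points, summarize: {text}"
article = {"text": "LLM prompts drift over time."}

print(render(v1, article))
print(render(v2, article))
```

With `str.format`, a placeholder that has no matching key raises `KeyError`, so a typo in `input_vars` fails before any request is sent.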

## Supported Providers

| Provider   | Install                      | Models                           |
|------------|------------------------------|----------------------------------|
| OpenAI     | `pip install openai`         | gpt-4o, gpt-4o-mini, ...         |
| Anthropic  | `pip install anthropic`      | claude-sonnet-4, claude-haiku-4  |
| Gemini     | `pip install google-genai`   | gemini-2.5-flash, gemini-1.5-pro |

## Install with Provider

```bash
pip install "pguard-llm[openai]"
pip install "pguard-llm[anthropic]"
pip install "pguard-llm[gemini]"
pip install "pguard-llm[all]"
```

## Storage Backends

```python
# File storage (default) — zero setup
p = Prompt("summarize", storage="file")

# SQLite — better for querying
p = Prompt("summarize", storage="sqlite")
```
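
To illustrate why SQLite is "better for querying", here is a minimal sketch using Python's built-in `sqlite3` module. The `runs` table, its columns, and the sample values are all hypothetical; the schema `pguard-llm` actually writes may differ.

```python
import sqlite3

# Hypothetical schema and made-up sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (prompt TEXT, version TEXT, latency_ms REAL, cost_usd REAL)"
)
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?, ?)",
    [
        ("summarize", "v1", 800.0, 0.002),
        ("summarize", "v1", 1000.0, 0.004),
        ("summarize", "v2", 1200.0, 0.005),
    ],
)

# One SQL statement aggregates run history per version.
rows = conn.execute(
    "SELECT version, AVG(latency_ms) FROM runs "
    "WHERE prompt = ? GROUP BY version ORDER BY version",
    ("summarize",),
).fetchall()
avg_latency = dict(rows)
print(avg_latency)
```

With file storage, the same question would mean loading every run record and aggregating in Python.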

## A/B Comparison

```python
comparison = p.compare("v1", "v2")
summary = comparison.summary()

# summary contains:
# - latency_ms: avg latency per version + winner
# - cost_usd: avg cost per version + winner
# - quality_score: avg quality per version + winner
# - tokens_avg: avg tokens per version
```
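
The "average per version + winner" idea can be sketched in plain Python. This is not the library's implementation: `Run` and `summarize_runs` are hypothetical names, the sample numbers are made up, and the sketch assumes lower latency and cost win while higher quality wins. `tokens_avg` is left out for brevity.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """Hypothetical run record; not the pguard-llm RunResult class."""
    latency_ms: float
    cost_usd: float
    quality_score: float


def summarize_runs(runs_a, runs_b, name_a="v1", name_b="v2"):
    """Average each metric per version and pick a winner (None on a tie)."""
    # (metric name, True if a lower average is better)
    metrics = [("latency_ms", True), ("cost_usd", True), ("quality_score", False)]
    summary = {}
    for metric, lower_is_better in metrics:
        avg_a = mean(getattr(r, metric) for r in runs_a)
        avg_b = mean(getattr(r, metric) for r in runs_b)
        if avg_a == avg_b:
            winner = None
        else:
            a_wins = (avg_a < avg_b) == lower_is_better
            winner = name_a if a_wins else name_b
        summary[metric] = {name_a: avg_a, name_b: avg_b, "winner": winner}
    return summary


v1_runs = [Run(800.0, 0.002, 0.7), Run(1000.0, 0.004, 0.9)]
v2_runs = [Run(1200.0, 0.005, 0.95)]

for metric, stats in summarize_runs(v1_runs, v2_runs).items():
    print(metric, stats)
```

In this made-up data, `v1` is faster and cheaper while `v2` scores higher on quality, which is the kind of trade-off a comparison summary is meant to surface.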

## CLI

```bash
pguard list                        # list all prompts
pguard versions summarize          # list versions
pguard show summarize v1           # show template
pguard runs summarize v1           # show run history
pguard compare summarize v1 v2     # compare versions
```

## RunResult

```python
result.output        # LLM response text
result.cost_usd      # Cost in USD
result.latency_ms    # Latency in milliseconds
result.tokens_in     # Input tokens
result.tokens_out    # Output tokens
result.quality_score # Quality score (0-1)
result.provider      # Provider used
result.model         # Model used
```
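
As a rough guide to how `cost_usd` relates to `tokens_in` and `tokens_out`, providers typically bill input and output tokens at separate rates. The sketch below assumes per-million-token pricing; `estimate_cost_usd` is a hypothetical helper and the prices are placeholders, not real rates or the library's own calculation.

```python
# Illustration only: placeholder prices, hypothetical helper.

def estimate_cost_usd(
    tokens_in: int,
    tokens_out: int,
    price_in_per_1m: float,
    price_out_per_1m: float,
) -> float:
    """Estimate the cost of one call from token counts and per-1M-token prices."""
    return (tokens_in * price_in_per_1m + tokens_out * price_out_per_1m) / 1_000_000


# 1,000 input tokens and 500 output tokens at placeholder rates.
cost = estimate_cost_usd(1_000, 500, price_in_per_1m=2.0, price_out_per_1m=8.0)
print(f"{cost:.4f} USD")
```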

## License

MIT
