Metadata-Version: 2.4
Name: flowprompt-ai
Version: 0.3.0
Summary: Type-safe prompt management with automatic optimization for LLMs
Project-URL: Homepage, https://github.com/yotambraun/flowprompt
Project-URL: Documentation, https://github.com/yotambraun/flowprompt/tree/main/docs
Project-URL: Repository, https://github.com/yotambraun/flowprompt
Project-URL: Changelog, https://github.com/yotambraun/flowprompt/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/yotambraun/flowprompt/issues
Project-URL: Source Code, https://github.com/yotambraun/flowprompt
Author-email: Yotam Braun <yotambarun93@gmail.com>
Maintainer-email: Yotam Braun <yotambarun93@gmail.com>
License: MIT
License-File: LICENSE
Keywords: a-b-testing,ai,anthropic,chatgpt,claude,gpt,langchain-alternative,litellm,llm,llm-framework,machine-learning,nlp,observability,openai,prompt,prompt-engineering,prompt-management,prompt-optimization,prompt-testing,prompt-versioning,pydantic,structured-output,type-safety
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Framework :: Pydantic :: 2
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: jinja2<4.0.0,>=3.0.0
Requires-Dist: litellm<2.0.0,>=1.0.0
Requires-Dist: pydantic<3.0.0,>=2.0.0
Requires-Dist: pyyaml<7.0.0,>=6.0.0
Requires-Dist: typing-extensions>=4.5.0; python_version < '3.11'
Provides-Extra: all
Requires-Dist: opencv-python>=4.8.0; extra == 'all'
Requires-Dist: opentelemetry-api>=1.20.0; extra == 'all'
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == 'all'
Requires-Dist: optuna>=3.5.0; extra == 'all'
Requires-Dist: pypdf>=3.15.0; extra == 'all'
Requires-Dist: typer>=0.9.0; extra == 'all'
Provides-Extra: cli
Requires-Dist: typer>=0.9.0; extra == 'cli'
Provides-Extra: dev
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pre-commit>=3.6.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.12.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: typer>=0.9.0; extra == 'dev'
Requires-Dist: types-requests; extra == 'dev'
Provides-Extra: multimodal
Requires-Dist: opencv-python>=4.8.0; extra == 'multimodal'
Requires-Dist: pypdf>=3.15.0; extra == 'multimodal'
Provides-Extra: optimization
Requires-Dist: optuna>=3.5.0; extra == 'optimization'
Provides-Extra: pytest
Requires-Dist: pytest>=7.0.0; extra == 'pytest'
Provides-Extra: tracing
Requires-Dist: opentelemetry-api>=1.20.0; extra == 'tracing'
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == 'tracing'
Description-Content-Type: text/markdown

# FlowPrompt

**Stop guessing which prompt works. Measure it.**

[![PyPI](https://img.shields.io/pypi/v/flowprompt-ai.svg)](https://pypi.org/project/flowprompt-ai/)
[![Downloads](https://static.pepy.tech/badge/flowprompt-ai)](https://pepy.tech/project/flowprompt-ai)
[![Downloads/Month](https://static.pepy.tech/badge/flowprompt-ai/month)](https://pepy.tech/project/flowprompt-ai)
[![Python](https://img.shields.io/pypi/pyversions/flowprompt-ai.svg)](https://pypi.org/project/flowprompt-ai/)
[![License](https://img.shields.io/pypi/l/flowprompt-ai.svg)](https://github.com/yotambraun/flowprompt/blob/main/LICENSE)
[![Tests](https://github.com/yotambraun/flowprompt/workflows/CI/badge.svg)](https://github.com/yotambraun/flowprompt/actions)
[![codecov](https://codecov.io/gh/yotambraun/flowprompt/graph/badge.svg?token=3IDNOYK3D3)](https://codecov.io/gh/yotambraun/flowprompt)

---

## 30-Second Quickstart

Define prompts as Python classes. No API key needed to preview messages:

```python
from flowprompt import Prompt
from pydantic import BaseModel

class ExtractUser(Prompt):
    system = "Extract user info from text."
    user = "Text: {text}"

    class Output(BaseModel):
        name: str
        age: int

# Preview messages -- works without an API key
print(ExtractUser(text="John is 25").to_messages())
# [{'role': 'system', 'content': 'Extract user info from text.'},
#  {'role': 'user', 'content': 'Text: John is 25'}]

# Run against any LLM
result = ExtractUser(text="John is 25").run(model="gpt-4o")
print(result.name)  # "John"
print(result.age)   # 25
```

---

## Compare Prompts in 5 Lines

The killer feature: find which prompt actually works better, with statistical significance.

```python
from flowprompt import Prompt, compare

class Concise(Prompt):
    system = "Be concise."
    user = "Summarize: {text}"

class Detailed(Prompt):
    system = "Be thorough and detailed."
    user = "Provide a comprehensive summary of: {text}"

result = compare(
    {"concise": Concise, "detailed": Detailed},
    inputs=[{"text": "Python is a programming language..."}, ...],
    expected=["Python is a versatile language", ...],
    model="gpt-4o-mini",
)
print(result)
# Comparison Results
# ========================================
#   concise: 90% accuracy, 245ms avg, 50 runs << WINNER
#   detailed: 72% accuracy, 410ms avg, 50 runs
#
#   p=0.0231 (SIGNIFICANT)
#   effect size: -20.00%
```

---

## Test Prompts in CI

FlowPrompt includes a pytest plugin (auto-discovered, zero config):

```python
# test_prompts.py
import pytest

@pytest.mark.prompt_test
def test_sentiment(fp_compare):
    result = fp_compare(
        {"v1": PromptV1, "v2": PromptV2},
        inputs=[{"text": "I love this!"}],
        expected=["positive"],
        model="gpt-4o-mini",
    )
    result.assert_significant()
    result.assert_winner("v1")
    result.assert_no_errors()
```

```bash
pip install flowprompt-ai[pytest]
pytest --no-slow-prompts  # skip expensive tests
```

---

## Installation

```bash
pip install flowprompt-ai
```

> **Note:** The package is installed as `flowprompt-ai` but imported as `flowprompt`.

**Optional extras:**

```bash
pip install flowprompt-ai[all]        # Everything
pip install flowprompt-ai[pytest]     # Pytest fixtures & markers
pip install flowprompt-ai[cli]        # CLI tools
pip install flowprompt-ai[tracing]    # OpenTelemetry support
pip install flowprompt-ai[multimodal] # Images, PDFs, audio, video
```

---

## A/B Testing

FlowPrompt is the only Python LLM framework with built-in A/B testing.

**Quick comparison** with `compare()`:

```python
from flowprompt import compare

result = compare(
    {"v1": PromptV1, "v2": PromptV2, "v3": PromptV3},
    inputs=test_data,
    model="gpt-4o-mini",
    confidence_level=0.95,
)

if result.winner:
    print(f"Winner: {result.winner} (p={result.statistical_result.p_value:.4f})")
```

**Full experiment control** when you need production traffic splitting, sticky user assignment, or multi-armed bandits:

```python
from flowprompt.testing import create_simple_experiment

config, runner = create_simple_experiment(
    name="prompt_comparison",
    control_prompt=PromptV1,
    treatment_prompts=[("v2", PromptV2)],
    min_samples=100,
)

runner.start_experiment(config.id)
variant = runner.get_variant(config.id, user_id="user123")
result = runner.run_prompt(config.id, variant.name, input_data={"text": "..."})

summary = runner.get_summary(config.id)
if summary.winner:
    print(f"Winner: {summary.winner.name}")
```

**Six allocation strategies:** Random, Round-Robin, Weighted, Epsilon-Greedy, UCB, Thompson Sampling.
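To make the bandit strategies concrete, here is a minimal sketch of Epsilon-Greedy allocation in plain Python. This illustrates the strategy itself, not FlowPrompt's internal implementation:

```python
import random

def epsilon_greedy(stats, epsilon=0.1, rng=random):
    """Pick a variant: explore with probability epsilon, else exploit the best.

    `stats` maps variant name -> (successes, trials). Illustrative sketch only,
    not FlowPrompt's internal code.
    """
    if rng.random() < epsilon:
        return rng.choice(list(stats))  # explore: any variant, uniformly
    # exploit: the variant with the highest observed success rate
    def rate(name):
        successes, trials = stats[name]
        return successes / trials if trials else 0.0
    return max(stats, key=rate)

stats = {"v1": (45, 50), "v2": (36, 50)}
rng = random.Random(0)
picks = [epsilon_greedy(stats, epsilon=0.1, rng=rng) for _ in range(1000)]
# Most picks go to the better-performing variant; ~10% explore.
print(picks.count("v1") / len(picks))
```

The other strategies swap out only the selection rule (e.g. UCB adds an exploration bonus; Thompson Sampling samples from each variant's posterior).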

**Four statistical tests:** Z-test, Chi-squared, Welch's t-test, Bayesian.
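The Z-test above can be sketched in a few lines. This is an illustration of the two-proportion statistic, not FlowPrompt's code; it uses the 90% vs 72% accuracies over 50 runs from the comparison example:

```python
import math

def two_proportion_z_test(s1, n1, s2, n2):
    """Two-sided two-proportion z-test for H0: equal success rates.

    Illustrative sketch only. Returns (z, p_value).
    """
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# 45/50 vs 36/50 correct answers
z, p = two_proportion_z_test(45, 50, 36, 50)
print(f"z={z:.2f}, p={p:.4f}")  # p < 0.05: the difference is significant
```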

---

## Structured Outputs

Define expected output as a Pydantic model. Parsing and validation are automatic.

```python
from flowprompt import Prompt
from pydantic import BaseModel, Field

class Sentiment(Prompt):
    system = "Analyze the sentiment of the given text."
    user = "Text: {text}"

    class Output(BaseModel):
        sentiment: str = Field(description="positive, negative, or neutral")
        confidence: float = Field(ge=0.0, le=1.0)

result = Sentiment(text="I love this!").run(model="gpt-4o")
print(result.sentiment)   # "positive"
print(result.confidence)  # 0.95
```

Models that support native JSON schema get guaranteed valid output. Others fall back to JSON mode with schema hints.
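The fallback path works roughly like this: serialize the output schema into the prompt, then parse and sanity-check the model's reply. A conceptual sketch (not FlowPrompt's implementation), with a hypothetical schema matching the `Output` model above and a stand-in LLM response:

```python
import json

# Hypothetical JSON schema mirroring the Output model above
schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
    },
    "required": ["sentiment", "confidence"],
}

# Schema-hint fallback: append the schema to the system prompt...
system = (
    "Analyze the sentiment of the given text. "
    "Respond ONLY with JSON matching this schema:\n" + json.dumps(schema)
)

# ...then parse and check the reply (stand-in response for illustration)
raw = '{"sentiment": "positive", "confidence": 0.95}'
data = json.loads(raw)
assert set(schema["required"]) <= set(data)
print(data["sentiment"], data["confidence"])
```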

---

## Multi-Provider Support

Switch between 100+ providers with a single parameter.

```python
result = prompt.run(model="gpt-4o")                              # OpenAI
result = prompt.run(model="anthropic/claude-3-5-sonnet-20241022") # Anthropic
result = prompt.run(model="gemini/gemini-2.0-flash-exp")          # Google
result = prompt.run(model="ollama/llama3")                        # Local
```

---

## More Features

| Feature | Example |
|---------|---------|
| **Caching** | `configure_cache(enabled=True, default_ttl=3600)` -- cut costs 50-90% |
| **Optimization** | DSPy-style auto-improvement with `flowprompt.optimize` |
| **Streaming** | `for chunk in prompt.stream(model="gpt-4o"): ...` |
| **Observability** | `get_tracer().get_summary()` -- costs, tokens, latency |
| **YAML prompts** | `load_prompt("prompts/my_prompt.yaml")` |
| **Multimodal** | Images, PDFs, audio via `flowprompt.multimodal` |
| **CLI** | `flowprompt optimize prompt.py examples.json` |
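The caching row above relies on a familiar pattern: key responses on the rendered prompt and model, and evict after a time-to-live. A minimal sketch of the idea (not `configure_cache` itself):

```python
import time

class TTLCache:
    """Toy TTL cache keyed on (prompt text, model). Illustration only."""

    def __init__(self, default_ttl=3600):
        self.default_ttl = default_ttl
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key, value, ttl=None):
        ttl = self.default_ttl if ttl is None else ttl
        self._store[key] = (value, time.monotonic() + ttl)

cache = TTLCache(default_ttl=3600)
key = ("Summarize: Python is...", "gpt-4o-mini")
if cache.get(key) is None:
    cache.set(key, "Python is a versatile language")  # first call pays for the LLM
print(cache.get(key))  # repeat calls within the TTL skip the API entirely
```

Skipping repeated identical calls is where the 50-90% cost reduction comes from on cache-friendly workloads.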

---

## Comparison

| Feature | FlowPrompt | LangChain | Instructor | DSPy |
|---------|:----------:|:---------:|:----------:|:----:|
| **A/B testing** | **Built-in** | No | No | No |
| Structured outputs | Yes | Partial | **Best-in-class** | Yes |
| Auto-optimization | Yes | No | No | **Best-in-class** |
| Multi-provider | Yes | Yes | Yes | Partial |
| Caching | Yes | Yes | Yes | Yes |
| Cost tracking | Yes | Partial | No | No |
| Streaming | Yes | Yes | Yes | Yes |
| Import time | <100ms | ~2s | <100ms | ~6s |

---

## Documentation

- **[Quick Start Guide](docs/quickstart.md)** -- Get started in 5 minutes
- **[A/B Testing Guide](docs/ab-testing.md)** -- Run experiments
- **[API Reference](docs/api.md)** -- Complete API documentation
- **[Optimization Guide](docs/optimization.md)** -- Improve prompts automatically
- **[Examples](examples/)** -- Runnable example scripts

---

## Contributing

Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

```bash
git clone https://github.com/yotambraun/flowprompt.git
cd flowprompt
uv venv && uv sync --all-extras
uv run pytest
```

---

## License

MIT License -- see [LICENSE](LICENSE) for details.

---

**Made with care by [Yotam Braun](https://github.com/yotambraun)**

[GitHub](https://github.com/yotambraun/flowprompt) | [PyPI](https://pypi.org/project/flowprompt-ai/) | [Issues](https://github.com/yotambraun/flowprompt/issues)
