Metadata-Version: 2.4
Name: llm-prompt-refiner
Version: 0.1.5
Summary: A lightweight Python library for optimizing and cleaning LLM inputs
Project-URL: Homepage, https://github.com/JacobHuang91/prompt-refiner
Project-URL: Repository, https://github.com/JacobHuang91/prompt-refiner
Project-URL: Issues, https://github.com/JacobHuang91/prompt-refiner/issues
Project-URL: Documentation, https://jacobhuang91.github.io/prompt-refiner/
Author-email: Xinghao Huang <jacobhuang91@gmail.com>
License: MIT
License-File: LICENSE
Keywords: ai,llm,optimization,prompt,tokens
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Provides-Extra: benchmark
Requires-Dist: matplotlib>=3.7.0; extra == 'benchmark'
Requires-Dist: openai>=1.0.0; extra == 'benchmark'
Requires-Dist: pandas>=2.0.0; extra == 'benchmark'
Requires-Dist: python-dotenv>=1.0.0; extra == 'benchmark'
Requires-Dist: scikit-learn>=1.3.0; extra == 'benchmark'
Requires-Dist: tqdm>=4.65.0; extra == 'benchmark'
Provides-Extra: dev
Requires-Dist: mike>=2.0.0; extra == 'dev'
Requires-Dist: mkdocs-autorefs>=0.5.0; extra == 'dev'
Requires-Dist: mkdocs-material>=9.5.0; extra == 'dev'
Requires-Dist: mkdocs>=1.5.0; extra == 'dev'
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: token
Requires-Dist: tiktoken>=0.7.0; extra == 'token'
Description-Content-Type: text/markdown

# Prompt Refiner

<div align="center">

[![PyPI version](https://img.shields.io/pypi/v/llm-prompt-refiner.svg)](https://pypi.org/project/llm-prompt-refiner/)
[![Python Versions](https://img.shields.io/pypi/pyversions/llm-prompt-refiner.svg)](https://pypi.org/project/llm-prompt-refiner/)
[![Downloads](https://img.shields.io/pypi/dm/llm-prompt-refiner.svg)](https://pypi.org/project/llm-prompt-refiner/)
[![GitHub Stars](https://img.shields.io/github/stars/JacobHuang91/prompt-refiner)](https://github.com/JacobHuang91/prompt-refiner)
[![CI Status](https://github.com/JacobHuang91/prompt-refiner/workflows/CI/badge.svg)](https://github.com/JacobHuang91/prompt-refiner/actions)
[![codecov](https://codecov.io/gh/JacobHuang91/prompt-refiner/branch/main/graph/badge.svg)](https://codecov.io/gh/JacobHuang91/prompt-refiner)
[![License](https://img.shields.io/github/license/JacobHuang91/prompt-refiner)](https://github.com/JacobHuang91/prompt-refiner/blob/main/LICENSE)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)
[![Documentation](https://img.shields.io/badge/docs-mkdocs-blue.svg)](https://jacobhuang91.github.io/prompt-refiner/)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Xinghao91/prompt-refiner)

</div>

> 🧹 **A lightweight Python library for optimizing and cleaning LLM inputs.**
> **Save 10-20% on API costs** by removing invisible tokens, stripping HTML, and redacting PII.

<div align="center">

⭐ **If you find this useful, please star us on GitHub!** ⭐

</div>

---

### 🎯 Perfect for:

**RAG Applications** • **Chatbots** • **Document Processing** • **Production LLM Apps** • **Cost Optimization**

---

## Why use Prompt Refiner?

**Two core problems solved:**

### 1. 🧹 Save 10-20% on API Costs - Clean & Optimize Prompts

Stop paying for invisible tokens and dirty data.

```python
from prompt_refiner import StripHTML, NormalizeWhitespace

# Before: "<div>  User    input\n\n\n  here  </div>" (150 tokens)
# After: "User input here" (85 tokens) → 43% savings
cleaned = (StripHTML() | NormalizeWhitespace()).run(dirty_input)
```

### 2. 🤖 Build Smart Chatbots - Manage Context Windows

Pack system prompts, RAG docs, and chat history into your token budget. Auto-clean HTML on-the-fly.

**For Chat APIs (OpenAI, Anthropic):**

```python
from prompt_refiner import MessagesPacker, ROLE_SYSTEM, ROLE_CONTEXT, ROLE_QUERY, ROLE_USER, StripHTML

packer = MessagesPacker(max_tokens=1000)
packer.add("You are helpful.", role=ROLE_SYSTEM)
packer.add("<p>RAG doc with HTML...</p>", role=ROLE_CONTEXT, refine_with=StripHTML())
packer.add("Old chat msg...", role=ROLE_USER)
packer.add("User question?", role=ROLE_QUERY)

messages = packer.pack()
# Output: List[Dict] ready for openai.chat.completions.create()
# [
#   {"role": "system", "content": "You are helpful."},
#   {"role": "user", "content": "User question?"},
#   {"role": "context", "content": "RAG doc with HTML..."}  # HTML cleaned!
# ]
# Note: "Old chat msg..." dropped to fit budget
```

**For Completion APIs (Llama, GPT-3):**

```python
from prompt_refiner import TextPacker, TextFormat, ROLE_SYSTEM, ROLE_CONTEXT, ROLE_QUERY, ROLE_USER, ROLE_ASSISTANT, StripHTML

packer = TextPacker(max_tokens=500, text_format=TextFormat.MARKDOWN)
packer.add("You are a QA assistant.", role=ROLE_SYSTEM)
packer.add("<div>RAG doc...</div>", role=ROLE_CONTEXT, refine_with=StripHTML())
packer.add("What is X?", role=ROLE_USER)
packer.add("X is a library.", role=ROLE_ASSISTANT)
packer.add("How to install?", role=ROLE_USER)
packer.add("Use pip install.", role=ROLE_ASSISTANT)
packer.add("What is this?", role=ROLE_QUERY)

prompt = packer.pack()
# Output: str ready for completion APIs
# ### INSTRUCTIONS:
# You are a QA assistant.
#
# ### CONTEXT:
# - RAG doc...
#
# ### CONVERSATION:
# - What is X?
# - X is a library.
#
# ### INPUT:
# What is this?
#
# Note: Last 2 history messages dropped to fit budget (auto-prioritized)
```

### ✨ Key Features

**Token Optimization:**
- **🧹 Clean Dirty Data** - Strip HTML, normalize whitespace, fix Unicode, redact PII
- **📉 Reduce Costs** - Save 10-20% on API costs by removing unnecessary tokens
- **📦 Pipe Syntax** - Compose operations like LEGO blocks: `StripHTML() | NormalizeWhitespace()`

**Context Management:**
- **🤖 Smart Packers** - MessagesPacker for chat APIs, TextPacker for completion APIs
- **🎯 Priority-Based** - Auto-prioritizes: system > query > context > history
- **✂️ Budget Control** - Fits content within token limits, drops low-priority items
- **🔄 JIT Cleaning** - Clean RAG docs on-the-fly with `refine_with=StripHTML()`

**Developer Experience:**
- **🪶 Zero Dependencies** - Lightweight core, optional tiktoken for precise counting
- **⚡ Blazing Fast** - < 0.5ms per 1k tokens overhead
- **🎯 Type Safe** - Full type hints for better IDE support
- **🚀 Production Ready** - Battle-tested with comprehensive test coverage

## Installation

```bash
# Basic installation (lightweight, zero dependencies)
pip install llm-prompt-refiner

# With precise token counting (optional, installs tiktoken)
pip install llm-prompt-refiner[token]
```

**Installation Modes:**
- **Default (Lightweight)**: Zero dependencies, uses character-based token estimation
- **Precise Mode**: Installs `tiktoken` for accurate token counting with no safety buffer. Pass a `model` parameter to CountTokens or MessagesPacker/TextPacker to enable.

## 🚀 Quick Start with Preset Strategies

**Don't want to manually configure operations?** Use our benchmark-tested preset strategies:

```python
from prompt_refiner.strategy import MinimalStrategy, AggressiveStrategy

# Minimal: 4.3% reduction, 98.7% quality
refiner = MinimalStrategy().create_refiner()
cleaned = refiner.run("<div>Your HTML content</div>")

# Aggressive: 15% reduction, 96.4% quality
refiner = AggressiveStrategy(max_tokens=150).create_refiner()
cleaned = refiner.run(long_context)
```

**Choose your strategy:**

| Strategy | Token Reduction | Quality | Use Case |
|----------|----------------|---------|----------|
| **Minimal** | 4.3% | 98.7% | Maximum quality, minimal risk |
| **Standard** | 4.8% | 98.4% | RAG contexts with duplicates |
| **Aggressive** | 15% | 96.4% | Cost optimization, long contexts |

**Usage patterns:**

```python
# Basic usage
from prompt_refiner.strategy import MinimalStrategy
refiner = MinimalStrategy().create_refiner()

# With custom parameters
from prompt_refiner.strategy import AggressiveStrategy
refiner = AggressiveStrategy(max_tokens=200).create_refiner()

# Extend with additional operations
refiner = MinimalStrategy().create_refiner()
refiner.pipe(RedactPII(redact_types={"email"}))
```

> 📖 **See full examples:** [examples/strategy/](examples/strategy/)

## 📊 Proven Effectiveness

We benchmarked Prompt Refiner on 30 real-world test cases (SQuAD + RAG scenarios) to measure token reduction and response quality:

<div align="center">

| Strategy | Token Reduction | Quality (Cosine) | Judge Approval | Overall Equivalent |
|----------|----------------|------------------|----------------|--------------------|
| **Minimal** | 4.3% | 0.987 | 86.7% | 86.7% |
| **Standard** | 4.8% | 0.984 | 90.0% | 86.7% |
| **Aggressive** | **15.0%** | 0.964 | 80.0% | 66.7% |

</div>

**Key Insights:**
- **Aggressive strategy achieves 3x more savings (15%) vs Minimal** while maintaining 96.4% quality
- Individual RAG tests showed **17-74% token savings** with aggressive strategy
- **Deduplicate** (Standard) shows minimal gains on typical RAG contexts
- **TruncateTokens** (Aggressive) provides the largest cost reduction for long contexts
- **Trade-off**: More aggressive = more savings but slightly lower judge approval

**Example: RAG with duplicates**
- Minimal (HTML + Whitespace): 17% reduction
- Standard (+ Deduplicate): 31% reduction
- **Aggressive (+ Truncate 150 tokens): 49% reduction** 🎉

<div align="center">

![Token Reduction vs Quality](benchmark/custom/results/benchmark_results.png)

</div>

> 💰 **Cost Savings:** At scale (1M tokens/month), 15% reduction saves **~$54/month** on GPT-4 input tokens.
>
> 📖 **See full benchmark:** [benchmark/custom/README.md](benchmark/custom/README.md)

## ⚡ Performance & Latency

**"What's the latency overhead?"** - Negligible. Prompt Refiner adds **< 0.5ms per 1k tokens** of overhead.

<div align="center">

| Strategy | @ 1k tokens | @ 10k tokens | @ 50k tokens | Overhead per 1k tokens |
|----------|------------|--------------|--------------|------------------------|
| **Minimal** (HTML + Whitespace) | 0.05ms | 0.48ms | 2.39ms | **0.05ms** |
| **Standard** (+ Deduplicate) | 0.26ms | 2.47ms | 12.27ms | **0.25ms** |
| **Aggressive** (+ Truncate) | 0.26ms | 2.46ms | 12.38ms | **0.25ms** |

</div>

**Key Insights:**
- ⚡ **Minimal strategy**: Only 0.05ms per 1k tokens (faster than a network packet)
- 🎯 **Standard strategy**: 0.25ms per 1k tokens - adds ~2.5ms to a 10k token prompt
- 📊 **Context**: Network + LLM TTFT is typically 600ms+, refining adds < 0.5% overhead
- 🚀 **Individual operations** (HTML, whitespace) are < 0.5ms per 1k tokens

**Real-world impact:**
```
10k token RAG context refining: ~2.5ms overhead
Network latency: ~100ms
LLM Processing (TTFT): ~500ms+
Total overhead: < 0.5% of request time
```

> 🔬 **Run yourself:** `python benchmark/latency/benchmark.py` (no API keys needed)

## 🎮 Interactive Demo

Try prompt-refiner in your browser - no installation required!

<div align="center">

[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/Xinghao91/prompt-refiner)

**[🚀 Launch Interactive Demo →](https://huggingface.co/spaces/Xinghao91/prompt-refiner)**

</div>

Play with different strategies, see real-time token savings, and find the perfect configuration for your use case. Features:

- 🎯 6 preset examples (e-commerce, support tickets, docs, RAG, etc.)
- ⚡ Quick strategy presets (Minimal, Standard, Aggressive)
- 💰 Real-time cost savings calculator
- 🔧 All 7 operations configurable
- 📊 Visual metrics dashboard

## 4 Core Modules

Prompt Refiner is organized into 4 specialized transformation modules:

### 1. **Cleaner** - Clean Dirty Data
- `StripHTML()` - Remove HTML tags, convert to Markdown
- `NormalizeWhitespace()` - Collapse excessive whitespace
- `FixUnicode()` - Remove zero-width spaces and problematic Unicode
- `JsonCleaner()` - Strip nulls/empties from JSON, minify (great for RAG APIs)

### 2. **Compressor** - Reduce Size
- `TruncateTokens()` - Smart truncation with sentence boundaries
  - Strategies: `"head"`, `"tail"`, `"middle_out"`
- `Deduplicate()` - Remove similar content (great for RAG)

### 3. **Scrubber** - Security & Privacy
- `RedactPII()` - Automatically redact emails, phones, IPs, credit cards, URLs, SSNs

### 4. **Packer** - Context Budget Management
- **`MessagesPacker`** - For chat APIs (OpenAI, Anthropic). Returns `List[Dict]`
- **`TextPacker`** - For completion APIs (Llama Base, GPT-3). Returns `str`
- **Semantic roles** - Use `ROLE_SYSTEM`, `ROLE_QUERY`, `ROLE_CONTEXT` (auto-prioritized)
- **JIT refinement** - Clean documents on-the-fly with `refine_with=StripHTML()`
- **Priority-based selection** - Automatically drops low-priority items when over budget

## Measurement & Analysis

Track and measure your optimization impact:

- **`CountTokens()`** - Calculate token savings and ROI for refiner pipelines
  - **Estimation mode** (default): Character-based approximation (1 token ≈ 4 chars)
  - **Precise mode** (with tiktoken): Exact token counts using OpenAI's tokenizer

## Complete Example

```python
from prompt_refiner import (
    # Core Modules
    StripHTML, NormalizeWhitespace, FixUnicode, JsonCleaner,  # Cleaner
    Deduplicate, TruncateTokens,  # Compressor
    RedactPII,  # Scrubber
    MessagesPacker, ROLE_SYSTEM, ROLE_QUERY, ROLE_CONTEXT,  # Packer
    # Measurement
    CountTokens,
)

# Example 1: Clean and optimize text
original_text = """<div>Your messy input here...</div>"""

counter = CountTokens(original_text=original_text)

# Build pipeline with all modules
pipeline = (
    StripHTML(to_markdown=True)
    | NormalizeWhitespace()
    | FixUnicode()
    | Deduplicate(similarity_threshold=0.85)
    | TruncateTokens(max_tokens=500, strategy="head")
    | RedactPII(redact_types={"email", "phone"})
)

result = pipeline.run(original_text)
counter.process(result)
print(counter.format_stats())

# Example 2: Pack messages for chat API with JIT refinement
packer = MessagesPacker(max_tokens=1000)
packer.add("You are helpful.", role=ROLE_SYSTEM)  # Auto: PRIORITY_SYSTEM (0)

# Clean RAG documents on-the-fly (single operation)
packer.add(
    "<div>RAG doc with HTML...</div>",
    role=ROLE_CONTEXT,  # Auto: PRIORITY_HIGH (20)
    refine_with=StripHTML()
)

# Chain multiple cleaning operations for dirty documents
packer.add(
    "<p>  Another   doc with  HTML   and   whitespace  </p>",
    role=ROLE_CONTEXT,  # Auto: PRIORITY_HIGH (20)
    refine_with=[StripHTML(), NormalizeWhitespace()]
)

packer.add("User question", role=ROLE_QUERY)  # Auto: PRIORITY_QUERY (10)

messages = packer.pack()  # Returns List[Dict]
# Use with: openai.chat.completions.create(messages=messages)
```

## Examples

Check out the [`examples/`](examples/) folder for detailed examples:

**Quick Start:**
- `strategy/` - Preset strategies (Minimal, Standard, Aggressive) for easy token optimization

**Core Modules:**
- `cleaner/` - HTML cleaning, JSON compression, whitespace normalization, Unicode fixing
- `compressor/` - Smart truncation, deduplication
- `scrubber/` - PII redaction
- `packer/` - Context budget management with priorities + real API integration (OpenAI)

**Measurement:**
- `analyzer/` - Token counting and cost savings analysis

## Development

This project uses `uv` for dependency management and `make` for common tasks.

```bash
# Install dependencies
make install

# Run tests
make test

# Format code
make format
```

## Star History

<div align="center">

[![Star History Chart](https://api.star-history.com/svg?repos=JacobHuang91/prompt-refiner&type=Date)](https://star-history.com/#JacobHuang91/prompt-refiner&Date)

</div>

## License

MIT