Metadata-Version: 2.4
Name: tokenoptim
Version: 0.1.0
Summary: Reduce LLM costs by 10-30% through tokenizer-aware prompt compression
Project-URL: Homepage, https://github.com/tokenoptim/tokenoptim
Project-URL: Documentation, https://github.com/tokenoptim/tokenoptim#readme
Author: TokenOptim Contributors
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.11
Requires-Dist: aiosqlite>=0.19.0
Requires-Dist: fastapi>=0.104.0
Requires-Dist: tiktoken>=0.5.0
Requires-Dist: uvicorn[standard]>=0.24.0
Provides-Extra: all
Requires-Dist: anthropic>=0.18.0; extra == 'all'
Requires-Dist: openai>=1.0.0; extra == 'all'
Requires-Dist: sentencepiece>=0.1.99; extra == 'all'
Requires-Dist: transformers>=4.36.0; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.18.0; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: httpx>=0.25.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Provides-Extra: local
Requires-Dist: sentencepiece>=0.1.99; extra == 'local'
Requires-Dist: transformers>=4.36.0; extra == 'local'
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == 'openai'
Description-Content-Type: text/markdown

<p align="center">
  <img src="docs/logo-banner.png" alt="TokenOptim" width="380">
</p>

<p align="center">
  Reduce LLM costs by 10-30% through tokenizer-aware prompt compression. Works with any LLM provider.
</p>

<p align="center">
  <a href="https://pypi.org/project/tokenoptim/"><img src="https://img.shields.io/pypi/v/tokenoptim?color=blue" alt="PyPI Version"></a>
  <a href="https://pypi.org/project/tokenoptim/"><img src="https://img.shields.io/pypi/pyversions/tokenoptim" alt="Python Versions"></a>
  <a href="https://github.com/lucamocerino/TokenOptim/blob/main/LICENSE"><img src="https://img.shields.io/github/license/lucamocerino/TokenOptim" alt="License"></a>
  <a href="https://github.com/lucamocerino/TokenOptim/stargazers"><img src="https://img.shields.io/github/stars/lucamocerino/TokenOptim" alt="GitHub Stars"></a>
  <a href="https://github.com/lucamocerino/TokenOptim/issues"><img src="https://img.shields.io/github/issues/lucamocerino/TokenOptim" alt="Issues"></a>
</p>

> If you find TokenOptim useful, consider giving it a star on GitHub — it helps others discover the project and motivates continued development.

## Table of Contents

- [Quick Start](#quick-start)
- [Compression Examples](#compression-examples)
- [How It Works](#how-it-works)
- [Dashboard](#dashboard)
- [Configuration](#configuration)
- [Advisor Utilities](#advisor-utilities)
- [Supported Models](#supported-models)
- [Installation Options](#installation-options)
- [Project Structure](#project-structure)
- [Development](#development)
- [Limitations](#limitations)
- [Contributing](#contributing)
- [License](#license)

## Quick Start

```bash
pip install tokenoptim
```

### Compress a prompt

```python
import tokenoptim

result = tokenoptim.optimize("your long prompt here", model="gpt-4")
print(result.text)           # compressed text
print(result.savings_pct)    # e.g. 28.0
print(result.cost_saved_usd) # e.g. 0.000630
```

### Use with any provider

```python
import tokenoptim
from openai import OpenAI

client = OpenAI()
result = tokenoptim.optimize("your long prompt here", model="gpt-4")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": result.text}],
)
```

Works the same way with Anthropic, DeepSeek, Mistral, Google, or any other provider:

```python
import tokenoptim
from anthropic import Anthropic

client = Anthropic()
result = tokenoptim.optimize("your long prompt here", model="claude-sonnet-4")

response = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=1024,
    messages=[{"role": "user", "content": result.text}],
)
```

### Compress chat messages

```python
import tokenoptim
from openai import OpenAI

client = OpenAI()

optimized = tokenoptim.optimize_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in detail."},
], model="gpt-4")

# Use optimized.messages directly in your API call
response = client.chat.completions.create(
    model="gpt-4",
    messages=optimized.messages,
)
print(f"Saved {optimized.tokens_saved} tokens ({optimized.savings_pct}%)")
```

### Track cumulative savings

```python
import tokenoptim

with tokenoptim.session(model="gpt-4") as s:
    s.optimize("first prompt")
    s.optimize("second prompt")
    s.optimize_messages([{"role": "user", "content": "third prompt"}])

print(f"Saved {s.total_tokens_saved} tokens across {s.call_count} calls")
print(f"Cost saved: ${s.total_cost_saved_usd:.4f}")
print(f"Avg savings: {s.avg_savings_pct}%")
```

### API Reference

| Function | Description |
|----------|-------------|
| `tokenoptim.optimize(text, model=, ...)` | Compress a string. Returns `CompressionResult` |
| `tokenoptim.optimize_messages(messages, model=, ...)` | Compress chat messages. Returns `MessagesResult` |
| `tokenoptim.session(model=, ...)` | Context manager tracking cumulative stats |
| `tokenoptim.suggest_cache_split(text, model=)` | Suggest prefix-caching split point. Returns `CacheSplitResult` |
| `tokenoptim.suggest_output_format(text)` | Detect verbose output patterns. Returns `list[OutputFormatSuggestion]` |
| `tokenoptim.compare_models(text, models=)` | Compare token counts and costs. Returns `list[ModelCostComparison]` |

Options accepted by `optimize()`, `optimize_messages()`, and `session()` (the advisor functions take only the arguments shown above):

| Parameter | Default | Description |
|-----------|---------|-------------|
| `model` | `"gpt-4"` | Target model (for tokenizer and cost calculation) |
| `enable_contractions` | `True` | Apply contractions (`do not` -> `don't`) |
| `enable_filler_removal` | `True` | Strip filler phrases |
| `enable_phrase_shortening` | `True` | Replace verbose phrases (`due to the fact that` -> `because`) |
| `enable_numeric_normalization` | `True` | Normalize numbers (`1,000` -> `1000`, `3.00` -> `3`) |
| `enable_separator_removal` | `True` | Remove separator lines (`---`, `===`) and boilerplate phrases |
| `enable_html_stripping` | `False` | Strip HTML/XML tags and unescape entities |
| `enable_code_comment_stripping` | `False` | Strip `# ...` and `// ...` end-of-line comments |
| `enable_json_minification` | `False` | Minify JSON blocks and inline objects |
| `enable_duplicate_removal` | `False` | Remove consecutive duplicate lines/paragraphs |
| `enable_abbreviations` | `False` | Replace common long words (`configuration` -> `config`) |
| `enable_markdown_stripping` | `False` | Strip markdown formatting (preserves code blocks) |
| `enable_semantic_dedup` | `False` | Remove near-duplicate sentences via TF-IDF similarity |
| `semantic_dedup_threshold` | `0.8` | Cosine similarity threshold for semantic dedup |
| `enable_indentation_compaction` | `False` | Reduce 4-space/tab indentation to 2-space |
| `enable_url_shortening` | `False` | Replace URLs with domain only (preserves code blocks) |
| `enable_article_trimming` | `False` | Remove redundant articles after prepositions/verbs |
| `enable_list_compaction` | `False` | Convert short bullet/numbered lists to comma-separated |
| `enable_xml_minification` | `False` | Minify XML in fenced blocks and inline |
| `enable_yaml_minification` | `False` | Minify YAML in fenced blocks (strip comments, reduce indent) |
| `track` | `True` | Log metrics to the dashboard database |
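
To make the `semantic_dedup_threshold` option concrete, here is a minimal sketch of TF-IDF cosine-similarity deduplication in plain Python. It illustrates the general technique only and is not TokenOptim's implementation: the function name `dedup_sentences`, the tokenization rule, and the smoothed IDF formula are all assumptions.

```python
import math
import re
from collections import Counter


def _tokens(sentence: str) -> list[str]:
    # Lowercased word tokens; punctuation is ignored (assumed tokenization).
    return re.findall(r"[a-z0-9']+", sentence.lower())


def _tfidf_vectors(sentences: list[str]) -> list[dict[str, float]]:
    docs = [Counter(_tokens(s)) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF so terms present in every sentence keep a non-zero weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]


def _cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dedup_sentences(sentences: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a sentence only if it is below `threshold` similarity to every kept one."""
    vectors = _tfidf_vectors(sentences)
    kept: list[int] = []
    for i, vec in enumerate(vectors):
        if all(_cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return [sentences[i] for i in kept]


print(dedup_sentences([
    "Always include error handling.",
    "Always include error handling!",
    "Return a list of results.",
]))
```

In this sketch a lower threshold removes more sentences, and a threshold above 1.0 removes none, since cosine similarity of these non-negative vectors never exceeds 1.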

### Lower-level Compressor class

For direct control without metrics tracking:

```python
from tokenoptim import Compressor

c = Compressor(model="gpt-4")
result = c.compress("your prompt text here")

print(f"Original:   {result.original_tokens} tokens")
print(f"Compressed: {result.compressed_tokens} tokens")
print(f"Saved:      {result.savings_pct}%")
print(f"Cost saved: ${result.cost_saved_usd:.6f}")
```

## Compression Examples

### System prompt with fillers and contractions

**Before** (38 tokens):
```
You are a helpful coding assistant. Please note that you should provide
concise and accurate code. It is important to mention that you should
not make up APIs. You should always include error handling.
```

**After** (29 tokens):
```
You're a helpful coding assistant. You should provide concise and accurate
code. You shouldn't make up APIs. You should always include error handling.
```

**Savings: 24%** — filler removal (`please note that`, `it is important to mention that`) and contractions (`You are` → `You're`, `should not` → `shouldn't`).

---

### Filler-heavy requirements prompt

**Before** (38 tokens):
```
It is important to note that I need a REST API. Please note that it should
handle authentication. It should be noted that rate limiting is required.
As previously mentioned we are using PostgreSQL.
```

**After** (21 tokens):
```
I need a REST API. It should handle authentication. Rate limiting is
required. We're using PostgreSQL.
```

**Savings: 45%** — four filler phrases stripped, plus contractions.

---

### Code-adjacent prompt

**Before** (24 tokens):
```
Write a Python function that does not raise an exception. It is important
to note that the function should return a list.
```

**After** (18 tokens):
```
Write a Python function that doesn't raise an exception. The function
should return a list.
```

**Savings: 25%** — contractions and filler removal apply; code structure is preserved.

---

### Unicode and whitespace cleanup

**Before** (35 tokens):
```
The model\u2019s predictions are   very   accurate.    We have not tested
the   edge cases yet,  but  we  should  not  skip  them.
```

**After** (24 tokens):
```
The model's predictions are very accurate. We've not tested the edge
cases yet, but we shouldn't skip them.
```

**Savings: 31%** — smart quotes normalized, extra whitespace collapsed, contractions applied.

---

### Chat messages (multi-message compression)

**Before** (61 tokens):
```python
messages = [
    {"role": "system", "content": "You are an expert Python developer. Please note that you should write clean code. You should not use global variables. You should always add type hints."},
    {"role": "user", "content": "It is important to note that I need a function to parse JSON. The function does not need to handle errors. It is worth noting that performance matters."},
]
```

**After** (47 tokens):
```python
messages = [
    {"role": "system", "content": "You're an expert Python developer. You should write clean code. You shouldn't use global variables. You should always add type hints."},
    {"role": "user", "content": "I need a function to parse JSON. The function doesn't need to handle errors. Performance matters."},
]
```

**Savings: 23%** — each message is compressed independently; fillers and contractions stack up across the conversation.
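
The percentages in this section follow directly from the before/after token counts. The sketch below reproduces that arithmetic; the helper name `savings_pct` mirrors the result attribute but is defined here for illustration, and rounding to whole percent is an assumption about how the figures were presented.

```python
def savings_pct(original_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed, relative to the original count."""
    if original_tokens <= 0:
        return 0.0
    return (original_tokens - compressed_tokens) / original_tokens * 100


# (before, after) token counts taken from the examples above.
examples = {
    "system prompt": (38, 29),
    "filler-heavy requirements": (38, 21),
    "code-adjacent": (24, 18),
    "unicode and whitespace": (35, 24),
    "chat messages": (61, 47),
}

for name, (before, after) in examples.items():
    print(f"{name}: {before} -> {after} tokens, {savings_pct(before, after):.0f}% saved")
```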

## How It Works

TokenOptim applies up to 24 compression strategies in the order below; strategies marked opt-in run only when enabled:

1. **Line ending normalization** — normalize `\r\n` and `\r` to `\n` (always on)
2. **Unicode normalization** — NFC normalize, replace smart quotes/dashes with ASCII
3. **Indentation compaction** — reduce 4-space/tab indentation to 2-space (opt-in)
4. **Whitespace normalization** — collapse multiple spaces, tabs, blank lines
5. **JSON minification** — minify fenced and inline JSON blocks (opt-in)
6. **XML minification** — minify fenced and inline XML (opt-in)
7. **YAML minification** — strip comments, reduce indent, remove blank lines in fenced YAML (opt-in)
8. **Redundant punctuation** — `!!!` → `!`, `????` → `?`
9. **HTML/XML stripping** — remove tags and unescape entities (opt-in)
10. **Markdown stripping** — remove formatting while preserving code blocks (opt-in)
11. **URL shortening** — replace full URLs with domain only, strip `www.` (opt-in)
12. **Filler removal** — strip phrases like "please note that", "basically", "it is important to mention that"
13. **Separator/boilerplate removal** — remove lines of `---`, `===`, etc. and phrases like "please find below"
14. **Duplicate line removal** — remove consecutive duplicate lines and paragraphs (opt-in)
15. **List compaction** — convert short bullet/numbered lists to comma-separated (opt-in)
16. **Verbose phrase shortening** — `due to the fact that` → `because`, `in order to` → `to`, `prior to` → `before`
17. **Abbreviations** — `configuration` → `config`, `documentation` → `docs`, `database` → `db` (opt-in)
18. **Article trimming** — remove redundant `the/a/an` after prepositions and common verbs (opt-in)
19. **Contractions** — `do not` → `don't`, `it is` → `it's` (configurable)
20. **Numeric normalization** — `1,000,000` → `1000000`, `3.00` → `3`, `007` → `7`
21. **Code comment stripping** — remove `# ...` and `// ...` end-of-line comments (opt-in)
22. **Semantic deduplication** — remove near-duplicate sentences using TF-IDF cosine similarity (opt-in)
23. **Trailing whitespace** — strip per-line trailing spaces
24. **Tokenizer-specific** — model-aware optimizations (e.g., `\n \n` → `\n\n` saves 2 tokens in tiktoken)

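The default strategies are designed to preserve meaning. Some opt-in strategies discard information by design (URL shortening keeps only the domain, code comment stripping removes comments), so enable them only when that content is not needed. Code and structured data pass through with minimal changes unless a matching opt-in strategy is enabled.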
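
The ordered-pipeline idea can be sketched with a few regex-based steps. This is an illustration under stated assumptions, not the library's code: the step functions, the tiny `FILLERS` and `CONTRACTIONS` tables, and the exact patterns are hypothetical, and only five of the strategies listed above are represented.

```python
import re

# Tiny illustrative tables; the real phrase lists are assumed to be much larger.
FILLERS = ["please note that", "it is important to note that"]
CONTRACTIONS = {"do not": "don't", "should not": "shouldn't"}


def collapse_whitespace(text: str) -> str:
    # Runs of spaces/tabs become a single space.
    return re.sub(r"[ \t]{2,}", " ", text)


def squeeze_punctuation(text: str) -> str:
    # "!!!" -> "!", "????" -> "?"
    return re.sub(r"([!?])\1+", r"\1", text)


def remove_fillers(text: str) -> str:
    for phrase in FILLERS:
        pattern = re.compile(rf"\b({re.escape(phrase)})\s+(\w)", re.IGNORECASE)
        # Carry the filler's capitalization over to the word that follows it.
        text = pattern.sub(
            lambda m: m.group(2).upper() if m.group(1)[0].isupper() else m.group(2),
            text,
        )
    return text


def apply_contractions(text: str) -> str:
    # Case-sensitive on purpose to keep the sketch short.
    for long_form, short_form in CONTRACTIONS.items():
        text = re.sub(rf"\b{re.escape(long_form)}\b", short_form, text)
    return text


def strip_thousands_separators(text: str) -> str:
    # Remove a comma that sits between a digit and a group of exactly three digits.
    return re.sub(r"(?<=\d),(?=\d{3}\b)", "", text)


PIPELINE = [
    collapse_whitespace,
    squeeze_punctuation,
    remove_fillers,
    apply_contractions,
    strip_thousands_separators,
]


def compress(text: str) -> str:
    for step in PIPELINE:
        text = step(text)
    return text


print(compress("Please note that you should  not skip tests!!! Budget: 1,000,000 tokens."))
# -> You shouldn't skip tests! Budget: 1000000 tokens.
```

Order matters in a pipeline like this: collapsing whitespace first lets the later phrase patterns match text that was originally broken up by extra spaces.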

## Dashboard

Launch the real-time savings dashboard:

```bash
tokenoptim dashboard
```

Open `http://localhost:8383` to see:

- Total tokens and cost saved
- Savings over time charts
- Per-model breakdown
- Recent requests log
- ROI calculator

![Dashboard overview](docs/screenshots/dashboard-overview.png)
![Requests and ROI calculator](docs/screenshots/dashboard-detail.png)

### Terminal stats

```bash
tokenoptim stats
```

## Configuration

```python
from tokenoptim import Compressor

# Disable contractions (for formal prompts)
c = Compressor(model="gpt-4", enable_contractions=False)

# Disable filler removal
c = Compressor(model="gpt-4", enable_filler_removal=False)

# Add custom filler phrases
c = Compressor(model="gpt-4", custom_fillers=["in my opinion", "to be honest"])

# Enable HTML stripping (opt-in — useful for web-scraped content)
c = Compressor(model="gpt-4", enable_html_stripping=True)

# Enable code comment stripping (opt-in — useful for code-heavy prompts)
c = Compressor(model="gpt-4", enable_code_comment_stripping=True)

# Disable verbose phrase shortening
c = Compressor(model="gpt-4", enable_phrase_shortening=False)

# Enable JSON minification (opt-in — useful for prompts with JSON data)
c = Compressor(model="gpt-4", enable_json_minification=True)

# Enable markdown stripping (opt-in — useful for web-scraped markdown)
c = Compressor(model="gpt-4", enable_markdown_stripping=True)

# Enable abbreviations (opt-in — replaces common long words)
c = Compressor(model="gpt-4", enable_abbreviations=True)

# Enable semantic deduplication (opt-in — removes near-duplicate sentences)
c = Compressor(model="gpt-4", enable_semantic_dedup=True, semantic_dedup_threshold=0.8)

# Enable indentation compaction (opt-in — reduces 4-space/tab to 2-space)
c = Compressor(model="gpt-4", enable_indentation_compaction=True)

# Enable URL shortening (opt-in — replaces URLs with domain only)
c = Compressor(model="gpt-4", enable_url_shortening=True)

# Enable article trimming (opt-in — removes redundant the/a/an)
c = Compressor(model="gpt-4", enable_article_trimming=True)

# Enable list compaction (opt-in — converts short lists to comma-separated)
c = Compressor(model="gpt-4", enable_list_compaction=True)

# Enable XML minification (opt-in — minifies XML in fenced blocks)
c = Compressor(model="gpt-4", enable_xml_minification=True)

# Enable YAML minification (opt-in — strips comments, reduces indent in YAML blocks)
c = Compressor(model="gpt-4", enable_yaml_minification=True)
```
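
As an illustration of what JSON minification saves, the sketch below removes insignificant whitespace with the standard library's `json` module. The helper `minify_json` is hypothetical and is not TokenOptim's implementation; detecting fenced and inline JSON inside a larger prompt is left out.

```python
import json


def minify_json(text: str) -> str:
    """Return `text` re-serialized without optional whitespace, or unchanged if it is not JSON."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return text
    # Compact separators drop the spaces after "," and ":".
    return json.dumps(parsed, separators=(",", ":"), ensure_ascii=False)


pretty = """{
  "name": "Ada",
  "tags": ["a", "b"]
}"""
print(minify_json(pretty))
# -> {"name":"Ada","tags":["a","b"]}
```

Round-tripping through a parser normalizes the data as well as the whitespace (for example, duplicate keys collapse to the last value), which is worth keeping in mind when the exact original text matters.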

### DeepSeek

```python
import tokenoptim
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="your-key")
result = tokenoptim.optimize("your prompt here", model="deepseek-chat")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": result.text}],
)
```

## Advisor Utilities

TokenOptim includes advisory functions that help you optimize LLM costs beyond compression:

### Suggest cache-friendly splits

```python
import tokenoptim

result = tokenoptim.suggest_cache_split("""
You are a helpful assistant specialized in Python.
Always provide working code examples.
Answer the user's question about: {topic}
""")

print(f"Static prefix: {result.static_tokens} tokens")
print(f"Dynamic suffix: {result.dynamic_tokens} tokens")
print(f"Cache savings estimate: {result.cache_savings_estimate:.0%}")
```
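
Provider-side prompt caching generally matches on an exact prefix, so a prompt caches best when everything that changes per request comes last. A minimal sketch of one possible heuristic is shown below: split the template at the first line containing a `{placeholder}`. The function `split_at_first_placeholder` and the placeholder pattern are assumptions for illustration, not how `suggest_cache_split` is implemented.

```python
import re

# Matches simple str.format-style fields such as {topic} or {user_name}.
PLACEHOLDER = re.compile(r"\{[A-Za-z_]\w*\}")


def split_at_first_placeholder(template: str) -> tuple[str, str]:
    """Return (static_prefix, dynamic_suffix), splitting on line boundaries."""
    lines = template.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if PLACEHOLDER.search(line):
            return "".join(lines[:i]), "".join(lines[i:])
    # No placeholder found: the whole template is static.
    return template, ""


template = (
    "You are a helpful assistant specialized in Python.\n"
    "Always provide working code examples.\n"
    "Answer the user's question about: {topic}\n"
)
static, dynamic = split_at_first_placeholder(template)
print(repr(static))
print(repr(dynamic))
```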

### Suggest concise output formats

```python
import tokenoptim

suggestions = tokenoptim.suggest_output_format(
    "Please explain in detail how neural networks work and provide a detailed analysis."
)
for s in suggestions:
    print(f"Pattern: '{s.current_pattern}' → {s.suggestion} (saves ~{s.estimated_savings_pct}%)")
```

### Compare model costs

```python
import tokenoptim

comparisons = tokenoptim.compare_models(
    "Your prompt text here",
    models=["gpt-4", "gpt-4o", "gpt-3.5-turbo", "claude-3-5-sonnet", "deepseek-chat"],
)
for c in comparisons:
    print(f"{c.model:20s} {c.tokens:5d} tokens  ${c.cost_per_call:.6f}/call  ({c.provider})")
```
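
The per-call cost in a comparison like this comes down to token count times the price per token. The sketch below shows that arithmetic with a hypothetical `estimate_costs` helper; the model names and prices are made-up placeholders, not real pricing and not TokenOptim's pricing table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostEstimate:
    model: str
    tokens: int
    cost_per_call: float  # USD


def estimate_costs(
    token_counts: dict[str, int],
    usd_per_million_tokens: dict[str, float],
) -> list[CostEstimate]:
    """Compute input cost per call for each model, cheapest first."""
    estimates = [
        CostEstimate(model, tokens, tokens * usd_per_million_tokens[model] / 1_000_000)
        for model, tokens in token_counts.items()
    ]
    return sorted(estimates, key=lambda e: e.cost_per_call)


# Placeholder inputs: the same prompt tokenizes differently per model.
token_counts = {"model-a": 1000, "model-b": 1200}
prices = {"model-a": 10.0, "model-b": 2.0}  # hypothetical USD per 1M input tokens

for e in estimate_costs(token_counts, prices):
    print(f"{e.model:10s} {e.tokens:5d} tokens  ${e.cost_per_call:.6f}/call")
```

A model that needs more tokens for the same text can still be cheaper per call, which is why comparing token counts alone is not enough.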

## Supported Models

| Provider | Models | Tokenizer |
|----------|--------|-----------|
| OpenAI | gpt-5.2, gpt-5.2-pro, gpt-5.1, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4, gpt-4o, gpt-3.5-turbo, o3, o4-mini, o1 | tiktoken |
| Anthropic | claude-3-opus/sonnet/haiku, claude-3.5-*, claude-opus-4, claude-sonnet-4 | tiktoken (approx) |
| DeepSeek | deepseek-chat, deepseek-reasoner, deepseek-v3, deepseek-r1 | transformers / tiktoken fallback |
| Mistral | mistral-large, mistral-small, codestral, mixtral | tiktoken (approx) |
| Google | gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash | tiktoken (approx) |
| Meta | llama-3, llama-2 | tiktoken (approx) |
| Qwen | qwen-max, qwen-plus, qwen-turbo | tiktoken (approx) |
| Local | Any HuggingFace model | transformers / fallback |

## Installation Options

```bash
# Core (includes tiktoken for token counting)
pip install tokenoptim

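# With local model support (HuggingFace transformers, sentencepiece)
pip install "tokenoptim[local]"

# With a provider SDK
pip install "tokenoptim[openai]"
pip install "tokenoptim[anthropic]"

# Local model support plus both provider SDKs
pip install "tokenoptim[all]"

# Development
pip install "tokenoptim[dev]"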
```

## Project Structure

```
tokenoptim/
├── src/tokenoptim/
│   ├── compressor.py          # Core compression engine
│   ├── tokenizers.py          # Tokenizer registry & pricing
│   ├── metrics/               # Usage tracking (SQLite)
│   │   ├── collector.py
│   │   ├── models.py
│   │   └── db.py
│   ├── server/                # FastAPI dashboard backend
│   │   ├── app.py
│   │   └── routes.py
│   ├── api.py                 # Public Python API (optimize, session)
│   └── advisor.py             # Advisory utilities (cache split, model compare)
├── dashboard/                 # React dashboard (Vite + TailwindCSS)
└── tests/                     # pytest suite (198 tests)
```

## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run dashboard dev server (frontend)
cd dashboard && npm install && npm run dev

# Run API server (backend)
tokenoptim dashboard
```

## Limitations

- **English only** — contractions, filler removal, and verbose phrase shortening are designed for English text. Unicode normalization, whitespace cleanup, numeric normalization, and HTML/code comment stripping work with any language.
- **No semantic compression** — TokenOptim applies rule-based transformations only. It does not paraphrase, summarize, or use ML models; the opt-in semantic deduplication relies on TF-IDF cosine similarity, not a learned model.
- **Tokenizer approximation** — for providers without a public tokenizer (Anthropic, Mistral, Google, Meta, Qwen), token counts are approximated using tiktoken's cl100k_base encoding.

## Contributing

Found a bug or have an idea? [Open an issue](https://github.com/lucamocerino/TokenOptim/issues) or submit a PR. If TokenOptim saved you tokens (and money), a star goes a long way!

### Adding a new compression strategy

Want to contribute a new strategy? Here's how — it only takes 3 steps:

**1. Add your strategy to `src/tokenoptim/compressor.py`:**

```python
# Module-level data (if needed)
_MY_REPLACEMENTS: dict[str, str] = {
    "long phrase": "short",
}

# In the Compressor class:

# Add a constructor param (opt-in strategies default to False)
def __init__(self, ..., enable_my_strategy: bool = False):
    self.enable_my_strategy = enable_my_strategy

# Add a method
@staticmethod
def _apply_my_strategy(text: str) -> str:
    for old, new in _MY_REPLACEMENTS.items():
        text = text.replace(old, new)
    return text

# Wire it into the compress() pipeline at the right position
if self.enable_my_strategy:
    compressed = self._apply_my_strategy(compressed)
```

**2. Propagate the param through `src/tokenoptim/api.py`:**

Add `enable_my_strategy: bool = False` to `optimize()`, `optimize_messages()`, the `Session` dataclass, and `session()` — then pass it to the `Compressor` constructor in each.

**3. Add tests in `tests/test_compressor.py`:**

```python
from tokenoptim import Compressor


class TestMyStrategy:
    def test_basic(self):
        c = Compressor(model="gpt-4", enable_my_strategy=True)
        result = c.compress("some long phrase here")
        assert "short" in result.text

    # `compressor` is assumed to be a shared fixture that builds a default Compressor
    def test_disabled_by_default(self, compressor):
        result = compressor.compress("some long phrase here")
        assert "long phrase" in result.text
```

Run `PYTHONPATH=src python3 -m pytest tests/ -v` and make sure everything passes.

## License

MIT
