Metadata-Version: 2.4
Name: llmoptimize
Version: 3.2.2
Summary: Reduce LLM costs by 90% - AI recommendations with NO API keys needed!
Home-page: https://github.com/hackrudra1234/llmoptimize
Author: LLMOptimize Team
Author-email: hackrudra@gmail.com
License: Proprietary
Project-URL: Homepage, https://aioptimize.up.railway.app
Project-URL: Source, https://github.com/hackrudra1234/llmoptimize
Project-URL: Bug Reports, https://github.com/hackrudra1234/llmoptimize/issues
Keywords: ai,llm,cost,optimization,tracking,openai,anthropic,claude,gpt,groq,ml,recommendations,no-api-key,cost-reduction,savings
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: System :: Monitoring
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: license.txt
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: server
Requires-Dist: fastapi>=0.104.1; extra == "server"
Requires-Dist: uvicorn[standard]>=0.24.0; extra == "server"
Requires-Dist: sqlalchemy>=2.0.23; extra == "server"
Requires-Dist: psycopg2-binary>=2.9.9; extra == "server"
Requires-Dist: pydantic>=2.0.0; extra == "server"
Provides-Extra: full
Requires-Dist: anthropic>=0.3.0; extra == "full"
Requires-Dist: groq>=0.4.0; extra == "full"
Requires-Dist: langchain>=0.1.0; extra == "full"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-python
Dynamic: summary

# LLMOptimize

> **Cut your AI API costs by 40–97% — automatically.**
> One import. Zero prompt changes. No infrastructure to run.

```bash
pip install llmoptimize
```

```python
import llmoptimize   # done — every AI call is now tracked and optimised
```

---

## Table of Contents

- [What It Does](#what-it-does)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Auto-Tracking](#auto-tracking-zero-code-changes)
- [The `@track_cost` Decorator](#the-track_cost-decorator)
- [Configuration](#configuration)
- [Guardrails](#guardrails)
- [CLI Audit Tool](#cli-audit-tool)
- [Dashboard & Reports](#dashboard--reports)
- [Supported Providers & Models](#supported-providers--models)
- [Privacy](#privacy)
- [Plans & Pricing](#plans--pricing)
- [FAQ](#faq)

---

## What It Does

LLMOptimize monitors every AI API call your application makes and tells you when a cheaper model would do the same job just as well.

```
Your Code  →  LLMOptimize SDK  →  Your AI Provider (OpenAI / Anthropic / ...)
                    │
                    ▼
            Recommendation Engine
            (hosted — nothing to run)
                    │
                    ▼
         "Use gpt-3.5-turbo instead.
          95% cheaper. Minimal quality
          impact. 90% confident."
```

The recommendation engine has three layers that run in order:

1. **Instant heuristics** — task-type detection using your prompt shape and keywords
2. **ML model** — trained on aggregated acceptance signals from all users (gets smarter over time)
3. **Pattern database** — crowd-sourced patterns from millions of real API calls

Everything runs on our servers. You install the SDK, we handle the rest.
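
The first layer is easy to picture. Here is a toy sketch of keyword-based task-type detection (the function, categories, and keyword lists below are invented for illustration; the real engine's rules are not public):

```python
# Illustrative only: a toy version of layer 1's task-type detection.
TASK_KEYWORDS = {
    "classification": ["classify", "categorize", "label", "spam or not"],
    "summarization": ["summarize", "tl;dr", "key points"],
    "translation": ["translate", "in french", "in spanish"],
}

def detect_task_type(prompt):
    """Guess the task type from keywords in the prompt text."""
    text = prompt.lower()
    for task, keywords in TASK_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return task
    return "other"   # no keyword hit: fall through to the later layers

detect_task_type("Classify this email as spam or not.")   # "classification"
```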

---

## Quick Start

### Step 1 — Install

```bash
pip install llmoptimize
```

### Step 2 — Import before your AI library

```python
import llmoptimize          # one line — patches OpenAI, Anthropic, Groq automatically

import openai
client = openai.OpenAI()

# Your existing code — completely unchanged
response = client.chat.completions.create(
    model    = "gpt-4",
    messages = [{"role": "user", "content": "Classify this email as spam or not."}]
)
```

### Step 3 — See your savings

```python
llmoptimize.report()
```

```
╔════════════════════════════════════════════════════════════════════╗
║                        SMART RECOMMENDATION                        ║
╚════════════════════════════════════════════════════════════════════╝

🟢 Confidence: 90%

📊 You used:   gpt-4          →  $0.012400
✨ Switch to:  gpt-3.5-turbo  →  $0.000620

💰 YOU SAVE:   $0.011780  (95%)
📈 Quality impact: MINIMAL

💬 Why: Classification task — cheaper models maintain 95%+ accuracy
```

That's it. No server to run, no dashboard to set up, no config files.

---

## Installation

### Requirements

- Python 3.8 or higher
- At least one AI SDK: `openai`, `anthropic`, `groq`, `google-generativeai`, `mistralai`, or `cohere`

### Install

```bash
pip install llmoptimize
```

### Optional environment variables

| Variable | Default | What it does |
|---|---|---|
| `AIOPTIMIZE_SERVER_URL` | Managed cloud | Point to a dedicated instance (enterprise plans) |
| `AIOPTIMIZE_TIMEOUT` | `3` seconds | Max wait for a recommendation before proceeding |
| `AIOPTIMIZE_SHARE_DATA` | `true` | Set to `false` to opt out of anonymised metadata sharing |

---

## Auto-Tracking (Zero Code Changes)

`import llmoptimize` silently wraps every AI library you have installed. Your existing code, your existing response objects, your existing error handling — all untouched.

**Supported libraries:**

| Provider | Library | Chat | Embeddings | Async |
|---|---|---|---|---|
| OpenAI | `openai` | ✅ | ✅ | ✅ |
| Anthropic | `anthropic` | ✅ | — | ✅ |
| Groq | `groq` | ✅ | — | ✅ |
| Google | `google-generativeai` | ✅ | — | ✅ |
| Mistral | `mistralai` | ✅ | — | ✅ |
| Cohere | `cohere` | ✅ | ✅ | — |

**Guarantees:**
- Your API response is returned exactly as the provider sends it — nothing is modified
- If LLMOptimize encounters any internal error, it fails silently and your call goes through normally
- No added latency on the critical path — tracking and recommendations happen asynchronously
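
The wrap-and-swallow pattern behind these guarantees can be sketched in a few lines. This is a generic illustration with a dummy client, not LLMOptimize's actual internals:

```python
import functools

class DummyClient:
    """Stand-in for a provider SDK client (illustration only)."""
    def create(self, model, messages):
        return {"model": model, "usage": {"total_tokens": 42}}

def patch_create(client, on_usage):
    original = client.create

    @functools.wraps(original)
    def wrapped(*args, **kwargs):
        response = original(*args, **kwargs)   # the provider call is untouched
        try:
            on_usage(response["usage"])        # tracking runs after the response exists
        except Exception:
            pass                               # tracker errors never reach your code
        return response

    client.create = wrapped
    return client

records = []
client = patch_create(DummyClient(), records.append)
resp = client.create(model="gpt-4", messages=[])
# resp is exactly what the client returned; records now holds the usage dict
```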

---

## The `@track_cost` Decorator

For more control, wrap specific functions directly.

### Basic tracking

```python
from llmoptimize import track_cost

@track_cost(model="gpt-4")
def classify_ticket(text: str):
    return client.chat.completions.create(
        model    = "gpt-4",
        messages = [{"role": "user", "content": text}]
    )
```

### Show recommendations before the call

```python
@track_cost(model="gpt-4", smart_suggestions=True)
def analyze_document(text: str):
    ...
```

### Auto-switch when confident

When `auto_optimize=True`, the SDK switches to the cheaper model automatically whenever confidence is 90% or higher, with no human in the loop:

```python
@track_cost(
    model             = "gpt-4",
    smart_suggestions = True,
    auto_optimize     = True,
)
def batch_classify(items: list):
    ...

# Console output:
# ✨ Auto-optimized: gpt-4 → gpt-3.5-turbo
#    Savings: $0.0114 (92%)  |  Confidence: 94%
```

### Full decorator options

```python
@track_cost(
    model              = "gpt-4",        # the model your code calls
    smart_suggestions  = False,          # show cheaper alternative before the call
    auto_optimize      = False,          # auto-switch at >= 90% confidence
    config             = None,           # AIOptimizeConfig for better recommendations
    enable_guardrails  = False,          # PII scanning + budget enforcement
    daily_budget       = None,           # float — block calls if daily spend exceeds this
    monthly_budget     = None,           # float — block calls if monthly spend exceeds this
)
```

Works identically on `async def` functions with no extra setup.

---

## Configuration

`AIOptimizeConfig` gives the recommendation engine context about your use case, which improves suggestion accuracy — especially for industry-specific quality tradeoffs.

```python
from llmoptimize import track_cost, AIOptimizeConfig

config = AIOptimizeConfig(
    user_id      = "your-company-id",    # anonymised before it leaves your machine
    industry     = "healthcare",         # tunes quality vs cost tradeoffs
    company_size = "startup",
    use_case     = "summarization",
    share_data   = True,                 # helps the model improve for everyone
)

@track_cost(model="gpt-4", smart_suggestions=True, config=config)
def my_function(prompt: str):
    ...
```

### Config options

| Field | Options | Effect |
|---|---|---|
| `industry` | `saas` `ecommerce` `healthcare` `finance` `legal` `education` `marketing` `engineering` `media` `other` | Adjusts quality sensitivity thresholds |
| `company_size` | `solo` `startup` `mid` `enterprise` | Influences cost vs reliability weighting |
| `use_case` | `customer_support` `rag` `content` `coding` `analytics` `translation` `summarization` `classification` `automation` `chatbot` `other` | Directly informs task-type detection |
| `share_data` | `True` / `False` | Whether to contribute anonymised usage to the shared ML model |

> `share_prompts` is always `False` regardless of what you pass. Prompt text never leaves your machine. See [Privacy](#privacy).

---

## Guardrails

### Security scanning

Enable guardrails to automatically scan every prompt for sensitive data before it reaches any AI provider.

```python
@track_cost(model="gpt-4", enable_guardrails=True)
def process_user_input(text: str):
    ...
```

**What gets detected:**

| Data type | Action |
|---|---|
| API keys (OpenAI, Anthropic, AWS, etc.) | 🔴 Call blocked |
| Private / cryptographic keys | 🔴 Call blocked |
| Credit card numbers | 🔴 Call blocked |
| Social Security Numbers | 🔴 Call blocked |
| Email addresses | 🟠 Warning shown |
| Phone numbers | 🟠 Warning shown |

When a critical issue is found, the call never reaches your AI provider. A detailed report explains exactly what was detected and where.
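
A minimal picture of this kind of scanning, with toy regex patterns (the pattern names and rules here are ours; real scanners are far more thorough):

```python
import re

# Toy patterns for illustration only.
CRITICAL_PATTERNS = {
    "api_key": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_prompt(prompt):
    """Return the names of all critical patterns found in the prompt."""
    return [name for name, pattern in CRITICAL_PATTERNS.items()
            if pattern.search(prompt)]

issues = scan_prompt("My SSN is 123-45-6789")
# issues == ["ssn"]
```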

### Budget enforcement

```python
@track_cost(
    model             = "gpt-4",
    enable_guardrails = True,
    daily_budget      = 10.00,
    monthly_budget    = 150.00,
)
def my_function(prompt: str):
    ...
```

When a call would push you over budget, it is blocked before it is made:

```
❌ BLOCKED: Would exceed daily budget of $10.00
   Spent today:   $9.94
   Remaining:     $0.06
   Estimated cost of this call: $0.18
```
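
The gate itself reduces to one comparison. A sketch of the decision, using the numbers from the example above (the function name is ours):

```python
def within_budget(spent_today, estimated_cost, daily_budget):
    """Allow the call only if it would not push today's spend past the budget."""
    return spent_today + estimated_cost <= daily_budget

assert within_budget(5.00, 0.18, 10.00) is True    # plenty of headroom
assert within_budget(9.94, 0.18, 10.00) is False   # $10.12 > $10.00: blocked
```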

### Runaway loop protection

If more than 100 calls are detected within any 5-minute window, further calls are blocked automatically and you're alerted. This catches bugs such as infinite retry loops and runaway agents before they cause a surprise bill.
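
The 100-calls-in-5-minutes rule amounts to a sliding-window counter. A self-contained sketch (thresholds taken from the text above; the implementation is ours, with an injectable clock so it can be exercised without waiting):

```python
import time
from collections import deque

class RunawayGuard:
    """Block further calls once max_calls land inside a window (seconds)."""

    def __init__(self, max_calls=100, window=300.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock            # injectable so tests need not wait 5 minutes
        self.calls = deque()

    def allow(self):
        now = self.clock()
        # Forget timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] > self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False              # runaway detected: block the call
        self.calls.append(now)
        return True
```

With a small `max_calls` and a fake clock you can see the behaviour: the call that exceeds the limit inside the window is refused, and calls are allowed again once the window moves on.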

---

## CLI Audit Tool

Scan any Python file to find AI cost optimisation opportunities without executing it.

```bash
llmoptimize audit mycode.py
```

```
╔════════════════════════════════════════════════════════════════════╗
║                    🤖 AI CODE AUDIT REPORT                       ║
╚════════════════════════════════════════════════════════════════════╝

📄 File: mycode.py

📊 ANALYSIS SUMMARY
────────────────────────────────────────────────────────────────────
Total API Calls:         7
Issues Found:            4
Models Used:             gpt-4, claude-3-opus-20240229

Est. Monthly Cost:       $342.00  (at 1,000 runs/month)
POTENTIAL SAVINGS:       $298.00  (87%)

🔍 DETAILED RECOMMENDATIONS

🔴 ISSUE #1: Line 42
   You're using:     claude-3-opus-20240229
   For:              Classifying support ticket urgency

   ✨ SWITCH TO:     claude-3-haiku-20240307
   Saves:            95%  |  Quality impact: MINIMAL  |  Confidence: 90%
```

### CLI commands

```bash
# Audit a file (AI-powered analysis, no API key needed)
llmoptimize audit myfile.py

# Rule-based only — completely free, no network call
llmoptimize audit myfile.py --no-ai

# Force fresh analysis (skip cache)
llmoptimize audit myfile.py --force

# One-line summary
llmoptimize audit myfile.py --quiet

# Cache management
llmoptimize stats
llmoptimize clear-cache
```

---

## Dashboard & Reports

### In-code report

```python
import llmoptimize

# ... your application code ...

llmoptimize.report()
```

Prints a full session breakdown:

```
════════════════════════════════════════════════════════════════════
📊 SESSION SUMMARY
════════════════════════════════════════════════════════════════════
Total Calls:      284
Total Cost:       $4.2180
Total Tokens:     421,800
Avg Cost/Call:    $0.014852
Duration:         0:18:42

MODEL BREAKDOWN:
────────────────────────────────────────────────────────────────────
gpt-4:
  Calls:   212     Cost: $3.8960     Tokens: 318,000
gpt-3.5-turbo:
  Calls:   72      Cost: $0.3220     Tokens: 103,800
════════════════════════════════════════════════════════════════════
```

### Manual tracking

For providers not auto-patched, or for tracking custom inference:

```python
llmoptimize.track(
    model             = "gpt-4",
    prompt_tokens     = 400,
    completion_tokens = 120,
    provider          = "openai",
)
```
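
For manually tracked calls, cost is just token counts times per-token rates. A sketch with illustrative numbers (real prices live on the server and change over time; these figures are examples, not current list prices):

```python
# Illustrative per-million-token prices only.
PRICES = {
    "gpt-4": {"prompt": 30.00, "completion": 60.00},   # $/1M tokens (example)
}

def call_cost(model, prompt_tokens, completion_tokens):
    price = PRICES[model]
    return (prompt_tokens * price["prompt"]
            + completion_tokens * price["completion"]) / 1_000_000

cost = call_cost("gpt-4", 400, 120)   # the token counts from the track() example
```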

---

## Supported Providers & Models

LLMOptimize has pricing data for 50+ models across all major providers. A selection:

**OpenAI**
`gpt-4o` · `gpt-4o-mini` · `gpt-4-turbo` · `gpt-4` · `gpt-3.5-turbo` · `o1` · `o1-mini` · `text-embedding-3-small` · `text-embedding-3-large`

**Anthropic**
`claude-3-5-sonnet-20241022` · `claude-3-5-haiku-20241022` · `claude-3-opus-20240229` · `claude-3-sonnet-20240229` · `claude-3-haiku-20240307`

**Groq**
`llama-3.3-70b-versatile` · `llama-3.1-70b-versatile` · `llama-3.1-8b-instant` · `gemma2-9b-it` · `mixtral-8x7b-32768`

**Google**
`gemini-1.5-pro` · `gemini-1.5-flash` · `gemini-1.0-pro`

**Mistral**
`mistral-large-latest` · `mistral-small-latest` · `open-mixtral-8x7b`

**Cohere**
`command-r-plus` · `command-r` · `command-light`

Pricing data is kept up to date on the server — the SDK always uses the latest figures without requiring an update.

---

## Privacy

LLMOptimize is built privacy-first. Here is exactly what goes where:

| Data | Stored locally | Sent to server |
|---|---|---|
| Prompt text | ❌ Never | ❌ Never |
| Prompt category (e.g. `"classification"`) | ✅ Yes | ✅ Yes |
| Token counts | ✅ Yes | ✅ Yes |
| Cost figures | ✅ Yes | ✅ Yes |
| Model names | ✅ Yes | ✅ Yes |
| Your `user_id` | SHA-256 hashed | First 16 chars of hash only |
| API keys | Detected in prompts, blocked | ❌ Never |

**The guarantee:** `share_prompts` is always `False`. The code enforces this — it cannot be overridden. Your prompt text is classified locally on your machine and only the resulting label (e.g. `"summarization"`) is ever transmitted.

**To opt out of all data sharing entirely:**

```bash
export AIOPTIMIZE_SHARE_DATA=false
```

Or in code:

```python
config = AIOptimizeConfig(user_id="me", share_data=False)
```

---

## Plans & Pricing

| | Free | Pro | Enterprise |
|---|---|---|---|
| Tracked calls / month | 10,000 | Unlimited | Unlimited |
| Recommendation engine | Heuristic | Heuristic + ML | Heuristic + ML + Custom |
| Code audit | 5 files / month | Unlimited | Unlimited |
| Guardrails | ✅ | ✅ | ✅ |
| Auto-optimize | ✅ | ✅ | ✅ |
| Dedicated server instance | ❌ | ❌ | ✅ |
| SSO / SAML | ❌ | ❌ | ✅ |
| SLA | — | 99.9% | 99.99% |
| Support | Community | Email | Dedicated Slack |
| Price | Free | $49 / month | Contact us |

[Get started free →](https://aioptimize.dev/signup) · [View full pricing →](https://aioptimize.dev/pricing)

---

## FAQ

**Does it work with streaming responses?**
Yes. The SDK intercepts the completed response after streaming finishes and records usage from the final usage block. Your streaming code is unaffected.

**Does it add latency to my API calls?**
No. Tracking and recommendation calls happen after your response is returned — they never sit on your critical path.

**What if the recommendation server is unreachable?**
The SDK falls back to local heuristics instantly and your API call proceeds normally. There is no scenario where an LLMOptimize failure blocks your application.

**Does `auto_optimize` change my prompt or my response?**
No. It only changes the `model` parameter on the API call. The prompt you wrote and the response you receive are identical — just generated by a cheaper model.

**Can I use this with a self-hosted or fine-tuned model?**
Use `llmoptimize.track()` to manually record calls to any endpoint. Recommendations won't be available for unknown models, but cost tracking will work.

**Is there a usage cap on the free tier?**
10,000 tracked calls per month. The SDK continues to work above this limit — recommendations are paused until the next billing cycle.

**Does it support LangChain / LlamaIndex?**
Yes. Both frameworks use the underlying OpenAI / Anthropic SDKs, which are patched automatically on import.

**Can I audit files in CI?**

```yaml
# .github/workflows/cost-check.yml
- name: AI cost audit
  run: llmoptimize audit src/ --quiet
```

The CLI exits with code `1` if critical issues are found, making it easy to fail a pipeline.

---

## Support

- **Docs:** [docs.aioptimize.dev](https://docs.aioptimize.dev)
- **Email:** support@aioptimize.dev
- **Status:** [status.aioptimize.dev](https://status.aioptimize.dev)

---

*LLMOptimize — spend less, build more.*
