Metadata-Version: 2.4
Name: compresr
Version: 2.6.4
Summary: Python SDK for Compresr - Intelligent prompt compression service
Author-email: Compresr Team <founders@compresr.ai>
License-Expression: Apache-2.0
Project-URL: Homepage, https://compresr.ai
Project-URL: Documentation, https://compresr.ai/docs
Project-URL: Repository, https://github.com/Compresr-ai/Compresr-SDK
Project-URL: Issues, https://github.com/Compresr-ai/Compresr-SDK/issues
Keywords: llm,compression,ai,openai,gpt,tokens,cost-optimization
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.27.0
Requires-Dist: pydantic>=2.10.0
Requires-Dist: langchain>=1.0
Requires-Dist: langchain-core>=0.3
Requires-Dist: langchain-anthropic>=0.3
Requires-Dist: langchain-openai>=0.3
Requires-Dist: langchain-google-genai>=4.0
Requires-Dist: langchain-tavily>=0.1
Requires-Dist: langchain-community>=0.3
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-timeout>=2.1.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: python-dotenv>=1.0.0; extra == "dev"
Provides-Extra: langgraph
Requires-Dist: langgraph>=0.2; extra == "langgraph"
Provides-Extra: llamaindex
Requires-Dist: llama-index-core>=0.11; extra == "llamaindex"
Provides-Extra: litellm
Requires-Dist: litellm[proxy]>=1.79; extra == "litellm"
Provides-Extra: all
Requires-Dist: langgraph>=0.2; extra == "all"
Requires-Dist: llama-index-core>=0.11; extra == "all"
Requires-Dist: litellm[proxy]>=1.79; extra == "all"
Provides-Extra: langchain
Provides-Extra: agents
Provides-Extra: agents-anthropic
Provides-Extra: agents-openai
Provides-Extra: agents-gemini
Provides-Extra: agents-tavily
Provides-Extra: agents-brave
Provides-Extra: agents-all
Dynamic: license-file

# Compresr Python SDK

Query-aware LLM context compression — reduce LLM API costs by 30-70%.

## Install

```bash
pip install compresr
```

Get an API key at [compresr.ai](https://compresr.ai) → Dashboard → API Keys.

## Quick start

```python
from compresr import CompressionClient

client = CompressionClient(api_key="cmp_your_api_key")

result = client.compress(
    context="Long passage to compress...",
    query="What is the main conclusion?",
    target_compression_ratio=0.5,
)

print(f"Original:   {result.data.original_tokens} tokens")
print(f"Compressed: {result.data.compressed_tokens} tokens")
print(f"Saved:      {result.data.tokens_saved} tokens")
print(result.data.compressed_context)
```

The default model is `latte_v1` (query-aware). To use another model your
account has access to, pass `compression_model_name="..."`; the backend
validates the name.

## Batch

Compress up to 100 contexts in one call. Pass a single query (applied to all)
or a list of one query per context:

```python
batch = client.compress_batch(
    contexts=["Doc 1...", "Doc 2...", "Doc 3..."],
    queries="What is self-attention?",
    target_compression_ratio=0.5,
)

print(f"Total saved: {batch.data.total_tokens_saved} tokens")
```
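Because of the 100-context limit, longer lists have to be split across calls. The following is a minimal sketch that assumes only the `compress_batch` call shape and the `total_tokens_saved` field shown above. `compress_in_chunks` is a hypothetical helper and is not part of the SDK. It expands a single query into one query per context so that each slice stays aligned, and then it sums the savings.

```python
from typing import Any, List, Sequence, Union

MAX_BATCH = 100  # per-call limit stated above


def compress_in_chunks(
    client: Any,
    contexts: Sequence[str],
    queries: Union[str, Sequence[str]],
    chunk_size: int = MAX_BATCH,
    **kwargs: Any,
) -> int:
    """Hypothetical helper: compress any number of contexts, return total tokens saved."""
    if not 1 <= chunk_size <= MAX_BATCH:
        raise ValueError(f"chunk_size must be between 1 and {MAX_BATCH}")
    # Normalize to one query per context so every slice stays aligned.
    if isinstance(queries, str):
        per_context: List[str] = [queries] * len(contexts)
    else:
        per_context = list(queries)
        if len(per_context) != len(contexts):
            raise ValueError("need exactly one query per context")

    total_saved = 0
    for start in range(0, len(contexts), chunk_size):
        batch = client.compress_batch(
            contexts=list(contexts[start:start + chunk_size]),
            queries=per_context[start:start + chunk_size],
            **kwargs,
        )
        total_saved += batch.data.total_tokens_saved
    return total_saved
```

For example, `compress_in_chunks(client, docs, "What is self-attention?", target_compression_ratio=0.5)` makes three calls for 250 documents.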

## Async + streaming

```python
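import asyncio

async def main():
    # `await` needs a coroutine; notebooks also allow top-level await
    return await client.compress_async(context="...", query="...")

result = asyncio.run(main())

# compress_stream yields compressed chunks as they arrive
for chunk in client.compress_stream(context="...", query="..."):
    print(chunk.content, end="")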
```
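`compress_async` is most useful when you need to compress several contexts concurrently. The following is a minimal sketch that assumes `compress_async` accepts the same keyword arguments as `compress`. `compress_many` and its `max_concurrency` cap are hypothetical. A semaphore limits the number of requests in flight, and results come back in input order.

```python
import asyncio
from typing import Any, List, Sequence


async def compress_many(
    client: Any,
    contexts: Sequence[str],
    query: str,
    max_concurrency: int = 8,
    **kwargs: Any,
) -> List[Any]:
    """Hypothetical helper: compress contexts concurrently, results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(context: str) -> Any:
        # At most `max_concurrency` requests run at the same time.
        async with semaphore:
            return await client.compress_async(context=context, query=query, **kwargs)

    # gather() returns results in the same order as the input contexts.
    return list(await asyncio.gather(*(one(c) for c in contexts)))
```

From synchronous code, call it as `asyncio.run(compress_many(client, docs, "What is self-attention?"))`. In a notebook, `await` it directly. Without the semaphore, `asyncio.gather` alone would send every request at once, which makes rate limits more likely.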

## LLM-agnostic agent client

One `CompressionClient`, three provider-shape facades, one engine. Construct
the client with `llm=` and you get an agent surface where **every tool output
is compressed automatically** before the LLM sees it.

```python
import os
from compresr import CompressionClient, WebSearchTool

client = CompressionClient(
    api_key=os.environ["COMPRESR_API_KEY"],
    llm="anthropic",                        # or "openai", "google_genai"
    llm_api_key=os.environ["ANTHROPIC_API_KEY"],
    compression={"target_compression_ratio": 0.5, "min_tokens": 300},
)
```

Use `llm="anthropic:claude-haiku-4-5"` to set a default model. A `model=` passed
at the call site always takes precedence.
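The precedence rule can be written as a small resolution function. This is an illustrative sketch of the documented behavior, not the SDK's code. `resolve_model` is a hypothetical name, and the sketch assumes the `llm=` string is a provider name optionally followed by `:model`. It raises an error when neither `llm=` nor the call site supplies a model.

```python
from typing import Optional, Tuple


def resolve_model(llm: str, call_site_model: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical: return (provider, model), letting the call-site model win."""
    provider, _, default_model = llm.partition(":")
    model = call_site_model or default_model  # call-site value takes precedence
    if not model:
        raise ValueError(f"no model: pass model=... or use '{provider}:<model>'")
    return provider, model
```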

Three equivalent surfaces sit on the same client — the model lives at
the call site, just like Anthropic's and OpenAI's own SDKs:

```python
# Anthropic shape
client.messages.create(model="claude-haiku-4-5", max_tokens=512,
                       messages=[...], tools=[...])

# OpenAI shape
client.chat.completions.create(model="gpt-4o-mini", messages=[...], tools=[...])

# Native — returns a NormalizedResult
client.run(prompt="...", model="claude-haiku-4-5", tools=[...], max_tokens=512)
```

Behind all three sits LangChain 1.0's `create_agent` + `CompresrToolMiddleware`.
Tool outputs above `min_tokens` flow through `client.compress(...)` first.
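Conceptually, the middleware applies a size check before it compresses anything. The following is a minimal sketch of that check. It assumes a caller-supplied `compress` callable and uses a rough whitespace word count as the token estimate. The real middleware counts tokens its own way, and `maybe_compress` is a hypothetical name.

```python
from typing import Callable


def maybe_compress(
    tool_output: str,
    query: str,
    compress: Callable[[str, str], str],
    min_tokens: int = 300,
) -> str:
    """Hypothetical check: compress only tool outputs above min_tokens."""
    approx_tokens = len(tool_output.split())  # rough stand-in for a real tokenizer
    if approx_tokens <= min_tokens:
        return tool_output  # short outputs reach the LLM unchanged
    return compress(tool_output, query)
```

With the real client, `compress` could be `lambda text, q: client.compress(context=text, query=q).data.compressed_context`.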

### Built-in web search

```python
search = WebSearchTool.tavily(
    api_key=os.environ["TAVILY_API_KEY"],
    max_results=5,
    allowed_domains=["nytimes.com", "reuters.com"],   # optional
)
# Brave: WebSearchTool.brave(api_key=..., max_results=5)
```

### Bring your own tool

Any `@tool`-decorated function works — its string output is compressed for you:

```python
from langchain_core.tools import tool

# Hypothetical knowledge base; replace with your own data source.
INTERNAL_KB = {
    "refunds": "...full refund policy text...",
}

@tool
def kb_lookup(topic: str) -> str:
    """Look up the internal policy on the given topic."""
    return INTERNAL_KB.get(topic, "Not found.")

client.messages.create(model="claude-haiku-4-5", max_tokens=256,
                       messages=[{"role": "user", "content": "Refund policy?"}],
                       tools=[kb_lookup])
```

Switch providers with one line: `llm="openai"` instead of
`llm="anthropic"` (then pass the model at the call site). Tools and
code are unchanged.

### Per-call LLM knobs

Pass `temperature`, `top_p`, `max_tokens`, `stop_sequences`,
`presence_penalty`, `frequency_penalty`, `seed`, etc. to any facade — they're
forwarded to the underlying chat model via `.bind(...)` per call, so the
cached chat model is never mutated:

```python
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    temperature=0.2,
    top_p=0.9,
    messages=[...],
)
```

Gemini's `max_output_tokens` is aliased automatically when targeting
`llm="google_genai:..."`.
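The two rules above can be sketched in plain Python. Per-call settings are layered over defaults without mutating them, and Gemini's `max_output_tokens` alias is applied. `build_call_kwargs` and `PROVIDER_ALIASES` are hypothetical and stand in for the SDK's internal `.bind(...)` step. The function always returns a new dict, so stored defaults never change.

```python
from typing import Any, Dict, Mapping

# Hypothetical alias table: Gemini names its output cap max_output_tokens.
PROVIDER_ALIASES = {"google_genai": {"max_tokens": "max_output_tokens"}}


def build_call_kwargs(
    provider: str, defaults: Mapping[str, Any], **per_call: Any
) -> Dict[str, Any]:
    """Hypothetical: merge per-call settings over defaults without mutating either."""
    aliases = PROVIDER_ALIASES.get(provider, {})
    # New dict, so the cached defaults stay untouched.
    merged = {aliases.get(k, k): v for k, v in defaults.items()}
    for key, value in per_call.items():
        merged[aliases.get(key, key)] = value  # per-call values win
    return merged
```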

**Why not provider-native server search?** Anthropic's `web_search_20250305`,
OpenAI's `web_search_preview`, and Gemini's `google_search` run server-side
and return encrypted/opaque content that Compresr cannot read or compress.
Use Tavily or Brave so the result is plaintext we can compress.

## Compression options

| Param | Purpose |
|---|---|
| `query` | Question the LLM is trying to answer — drives `latte_v1` compression |
| `target_compression_ratio` | `0-1` strength (e.g. `0.5` = remove 50%) or `>1` for an Nx factor (e.g. `4` = 4x smaller). Backend max: 200 |
| `coarse` | `True` for paragraph-level (default, faster), `False` for token-level (fine-grained) |
| `heuristic_chunking` | Structure-preserving chunking |
| `disable_placeholders` | Disable placeholder tokens in output |
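The sketch below shows the two `target_compression_ratio` conventions side by side by estimating the output size each one requests. It approximates the table's wording and is not the backend's algorithm, so actual output sizes will vary. `approx_target_tokens` is a hypothetical name. The table does not say which convention applies to exactly `1`, so the sketch rejects that value.

```python
def approx_target_tokens(original_tokens: int, ratio: float) -> int:
    """Hypothetical estimate of the compressed size a target_compression_ratio asks for."""
    if 0 < ratio < 1:
        # Strength: fraction of tokens to remove (0.5 removes half).
        kept = original_tokens * (1 - ratio)
    elif 1 < ratio <= 200:
        # Factor: shrink N times (4 keeps about a quarter).
        kept = original_tokens / ratio
    else:
        raise ValueError("ratio must be in (0, 1) or (1, 200]")
    return round(kept)
```

Note that `0.75` and `4` request roughly the same output size. They are two ways of writing the same target.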

## Error handling

```python
from compresr.exceptions import (
    CompresrError,
    AuthenticationError,
    RateLimitError,
    ValidationError,
)

try:
    result = client.compress(context="...", query="...")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded")
except ValidationError as e:
    print(f"Invalid request: {e}")
except CompresrError as e:
    print(f"API error: {e}")
```
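Of these errors, only `RateLimitError` is worth retrying. An invalid key or a malformed request will fail the same way on every attempt. The following is a minimal retry sketch with exponential backoff. `compress_with_retry`, its parameters, and the delay values are hypothetical choices. The caller passes in the exception types to retry, which keeps the helper independent of the SDK.

```python
import random
import time
from typing import Any, Tuple, Type


def compress_with_retry(
    client: Any,
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    **kwargs: Any,
) -> Any:
    """Hypothetical: call client.compress, backing off on retryable errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return client.compress(**kwargs)
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff (1x, 2x, 4x base_delay) plus up to 10% jitter.
            delay = base_delay * 2 ** attempt
            time.sleep(delay + random.uniform(0, delay / 10))
```

Call it as `compress_with_retry(client, (RateLimitError,), context="...", query="...")`. `AuthenticationError` and `ValidationError` are intentionally excluded from `retry_on`.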

## Framework integrations

The agents layer ships in the base install — `pip install compresr` is enough to get `CompressionClient`, all three provider chat models (Anthropic / OpenAI / Gemini), and both web search tools (Tavily + Brave).

Optional integrations beyond the agents layer, each installed through its own extra:

| Extra | Pulls in |
|---|---|
| `compresr[langgraph]` | `langgraph` (LangGraph checkpoint serializer, store, handoff tool) |
| `compresr[llamaindex]` | `llama-index-core` (node postprocessor, memory block, tool wrapper) |
| `compresr[litellm]` | `litellm[proxy]` (LiteLLM proxy guardrail) |
| `compresr[all]` | all three above |

```bash
pip install "compresr[langgraph]"
```

Old `compresr[agents]` / `compresr[agents-anthropic]` / `compresr[agents-all]` / `compresr[langchain]` install commands still resolve (no-op extras kept for back-compat) — everything they used to pull in is now in the base install.

### LangChain — middleware + tool wrapper + retriever

```python
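import os

from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from compresr.integrations.langchain import (
    CompresrToolMiddleware,
    wrap_tool_with_compression,  # tool wrapper (not used below)
    CompresrExtractor,           # retriever-side component (not used below)
)

model = init_chat_model("anthropic:claude-haiku-4-5")

@tool
def web_search(query: str) -> str:
    """Search the web for the query (placeholder body)."""
    return "...long search results..."

# Compress each tool result, using the tool's `query` argument as the compression query.
agent = create_agent(
    model=model,
    tools=[web_search],
    middleware=[CompresrToolMiddleware(
        api_key=os.environ["COMPRESR_API_KEY"],
        query_arg="query",
    )],
)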
```

### LangGraph — compression as a graph node

```python
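import os
from typing import TypedDict

from langgraph.graph import StateGraph
from compresr.integrations.langgraph import make_compresr_node

class State(TypedDict):
    user_question: str
    retrieved_text: str

graph = StateGraph(State)
graph.add_node("compress", make_compresr_node(
    api_key=os.environ["COMPRESR_API_KEY"],
    context_key="retrieved_text",   # state key holding the text to compress
    query_key="user_question",      # state key holding the query
))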
```

### LlamaIndex — node postprocessor for RAG

```python
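import os

from llama_index.core import Document, VectorStoreIndex
from compresr.integrations.llamaindex import CompresrNodePostprocessor

# Small example index (uses LlamaIndex's default embedding model).
index = VectorStoreIndex.from_documents([Document(text="...")])

# Retrieved nodes are compressed before they reach the LLM.
query_engine = index.as_query_engine(
    node_postprocessors=[CompresrNodePostprocessor(
        api_key=os.environ["COMPRESR_API_KEY"],
    )],
)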
```

### Unified query API

Every integration that accepts a query exposes the same three knobs:

| Param | Purpose |
|---|---|
| `query` | Static query — same for every call |
| `query_extractor` | Callable that derives the query from the call context |
| `query_arg` / `query_key` | Name of the tool arg / state key to use as the query |

Priority: `query` > `query_extractor` > `query_arg`/`query_key` > smart-pick
from common arg keys (`query`, `question`, `search_query`, ...) > last human
message in history.
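The priority order can be written as a single resolution function. This is an illustrative sketch, not the SDK's implementation, and `resolve_query` is a hypothetical name. The sketch makes several assumptions:
- `query_extractor` receives the tool arguments as a dict.
- History is a list of `(role, text)` pairs, with the role `"human"` marking user messages.
- Each step falls through to the next when it yields nothing.

It only picks from the three argument keys named above, since the SDK's own list is longer.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

COMMON_QUERY_KEYS = ("query", "question", "search_query")  # subset of the SDK's list


def resolve_query(
    args: Dict[str, Any],
    history: List[Tuple[str, str]],
    query: Optional[str] = None,
    query_extractor: Optional[Callable[[Dict[str, Any]], Optional[str]]] = None,
    query_arg: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical: pick the compression query using the documented priority."""
    if query:  # 1. static query
        return query
    if query_extractor:  # 2. callable
        extracted = query_extractor(args)
        if extracted:
            return extracted
    if query_arg and args.get(query_arg):  # 3. named tool arg / state key
        return str(args[query_arg])
    for key in COMMON_QUERY_KEYS:  # 4. pick from common arg names
        if args.get(key):
            return str(args[key])
    for role, text in reversed(history):  # 5. last human message
        if role == "human":
            return text
    return None
```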

### Tutorials

Runnable Jupyter notebooks under `tutorial/`:

- `01_quickstart.ipynb` — core `CompressionClient`.
- `02_langchain.ipynb` — middleware + tool wrapper + retriever.
- `03_langgraph.ipynb` — compression node in a 3-node graph.
- `04_llamaindex.ipynb` — node postprocessor + tool wrapper.
- `05_compresr_agents.ipynb` — agent client (Anthropic/OpenAI/native shapes) with auto-compressed tool output.

## Requirements

- Python 3.9+
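- `httpx >= 0.27.0`
- `pydantic >= 2.10.0`
- `langchain >= 1.0` and `langchain-core`, plus `langchain-anthropic`,
  `langchain-openai`, `langchain-google-genai`, `langchain-tavily`, and
  `langchain-community` (all part of the base install)
- Optional: `langgraph>=0.2`, `llama-index-core>=0.11`, `litellm[proxy]>=1.79`
  (install the matching extra)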

## License

Apache 2.0 — see [LICENSE](LICENSE).

## Support

- Docs: [compresr.ai/docs](https://compresr.ai/docs/overview)
- Issues: [GitHub](https://github.com/Compresr-ai/Compresr-SDK/issues)
- Email: support@compresr.ai
