Metadata-Version: 2.4
Name: agent-ctx-compress
Version: 0.2.0
Summary: Smart context compression for LLM agents — preserve critical info, reduce tokens 40-80%
Author-email: Xbyteid <kucingwhite911@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/xbyteid/context-compressor
Project-URL: Repository, https://github.com/xbyteid/context-compressor
Project-URL: Issues, https://github.com/xbyteid/context-compressor/issues
Project-URL: Documentation, https://github.com/xbyteid/context-compressor#readme
Keywords: llm,agent,context,compression,tokens,ai,streaming,async
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: server
Requires-Dist: fastapi>=0.100; extra == "server"
Requires-Dist: uvicorn>=0.20; extra == "server"
Provides-Extra: tiktoken
Requires-Dist: tiktoken>=0.5.0; extra == "tiktoken"
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: groq
Requires-Dist: groq>=0.4.0; extra == "groq"
Provides-Extra: all
Requires-Dist: fastapi>=0.100; extra == "all"
Requires-Dist: uvicorn>=0.20; extra == "all"
Requires-Dist: tiktoken>=0.5.0; extra == "all"
Requires-Dist: openai>=1.0.0; extra == "all"
Requires-Dist: groq>=0.4.0; extra == "all"
Dynamic: license-file

# Agent Context Compressor 🗜️

Smart context compression for LLM agents. Preserves critical information (decisions, errors, preferences, code) while reducing token usage by 40-80%.

## Problem

LLM context windows are finite and expensive. Much of a typical agent conversation is filler: greetings, acknowledgments, repeated tool outputs, verbose explanations. But naive truncation also drops critical info like decisions, errors, and user preferences.

## Solution

Context Compressor scores every message by importance, then strategically drops low-value content while **guaranteeing** critical information is preserved.

```
Before: 603 tokens, 14 messages
After:  410 tokens,  5 messages (32% compressed, 98.7% confidence)
```
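The "32% compressed" figure matches the share of input tokens removed: (603 - 410) / 603 ≈ 0.32. Assuming that is how the percentage is defined, it can be reproduced with this small helper (`tokens_removed_ratio` is a hypothetical name, not part of the package):

```python
def tokens_removed_ratio(before: int, after: int) -> float:
    """Fraction of input tokens removed by compression (assumed definition)."""
    if before <= 0:
        raise ValueError("before must be a positive token count")
    return (before - after) / before


ratio = tokens_removed_ratio(603, 410)
print(f"{ratio:.0%}")  # prints "32%"
```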

## Features

- 🧠 **Smart scoring** — classifies messages as decisions, errors, code, preferences, noise
- 🔒 **Critical preservation** — decisions, errors, preferences NEVER dropped
- 🔄 **Deduplication** — removes near-duplicate messages (Jaccard similarity)
- 📝 **Summarization** — long messages compressed instead of dropped
- 📊 **Confidence tracking** — a confidence score estimating how much information was preserved
- 🛠️ **Multiple interfaces** — Python library, CLI, REST API
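Jaccard similarity compares the word sets of two messages as |A ∩ B| / |A ∪ B|, so 1.0 means identical vocabulary and 0.0 means no overlap. Below is a minimal sketch of near-duplicate removal built on it. It is not the package's implementation: the word-level tokenization, the 0.8 threshold, and the helper names are assumptions for illustration.

```python
import re


def word_set(text: str) -> set[str]:
    # Lowercased word tokens; punctuation is ignored.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # |A ∩ B| / |A ∪ B|; two empty sets count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe_messages(messages: list[dict], threshold: float = 0.8) -> list[dict]:
    """Keep the first message of each near-duplicate group (hypothetical helper)."""
    kept: list[dict] = []
    kept_sets: list[set[str]] = []
    for msg in messages:
        words = word_set(msg["content"])
        if any(jaccard(words, seen) >= threshold for seen in kept_sets):
            continue  # too similar to an earlier message: drop it
        kept.append(msg)
        kept_sets.append(words)
    return kept


msgs = [
    {"role": "tool", "content": "Build OK: 12 tests passed"},
    {"role": "tool", "content": "build ok: 12 tests passed!"},
    {"role": "user", "content": "deploy now"},
]
print(len(dedupe_messages(msgs)))  # prints 2
```

Comparing word sets rather than raw strings lets case and punctuation differences (like the two build outputs above) collapse into one message.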

## Installation

```bash
pip install agent-ctx-compress
```

With optional dependencies:

```bash
# For API server
pip install "agent-ctx-compress[server]"

# For accurate token counting
pip install "agent-ctx-compress[tiktoken]"

# For LLM summarization
pip install "agent-ctx-compress[openai]"

# Groq client
pip install "agent-ctx-compress[groq]"

# Everything
pip install "agent-ctx-compress[all]"
```
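A common pattern for an optional tokenizer dependency like `tiktoken` is to count exactly when it is installed and fall back to an estimate otherwise. A sketch of that pattern, assuming the `cl100k_base` encoding and a rough 4-characters-per-token heuristic (neither is confirmed as what this package does; `count_tokens` is a hypothetical name):

```python
import math

try:
    import tiktoken

    _ENCODING = tiktoken.get_encoding("cl100k_base")  # assumed encoding
except Exception:  # tiktoken not installed, or encoding data unavailable
    _ENCODING = None


def count_tokens(text: str) -> int:
    """Exact count via tiktoken when available, else a rough estimate."""
    if _ENCODING is not None:
        return len(_ENCODING.encode(text))
    # Heuristic fallback: roughly 4 characters per token for English text.
    return math.ceil(len(text) / 4)
```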

## Quick Start

```python
from context_compressor import compress

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "yo"},
    {"role": "assistant", "content": "Hey! How can I help?"},
    {"role": "user", "content": "deploy the app to production"},
    {"role": "assistant", "content": "Deployed! Status: ✅ running"},
    {"role": "user", "content": "thanks"},
    {"role": "assistant", "content": "👍"},
]

result = compress(messages, target_ratio=0.3)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
print(f"Confidence: {result.confidence:.0%}")
print(result.compressed)
```

## CLI

```bash
# Install from PyPI
pip install agent-ctx-compress

# Compress from stdin
cat conversation.json | ctxcompress --ratio 0.3

# Compress with stats
ctxcompress --input chat.json --ratio 0.2 --format full

# Stats only
ctxcompress --input chat.json --stats-only

# JSON output
ctxcompress --input chat.json --json
```
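The CLI reads a conversation file, but its exact format is not documented here. Assuming it uses the same list of `{"role", "content"}` objects as the Python API, one way to produce `chat.json` is:

```python
import json
from pathlib import Path

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "deploy the app to production"},
    {"role": "assistant", "content": "Deployed! Status: running"},
]

# Assumed format: a top-level JSON array of message objects.
path = Path("chat.json")
path.write_text(json.dumps(messages, indent=2, ensure_ascii=False), encoding="utf-8")
```

The file can then be passed to `ctxcompress --input chat.json --stats-only`.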

## API Server

```bash
# Install with server deps
pip install "agent-ctx-compress[server]"

# Start server
ctxcompress-server
# → http://localhost:8000

# Compress via API
curl -X POST http://localhost:8000/compress \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "hello"},
      {"role": "assistant", "content": "Hi!"},
      {"role": "user", "content": "deploy app"},
      {"role": "assistant", "content": "Deployed successfully"}
    ],
    "target_ratio": 0.3
  }'
```
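The same request can be sent from Python with only the standard library. This is a sketch: it assumes the server is at the default `http://localhost:8000` shown above and returns JSON, but since the response fields are not documented here, the decoded body is returned as-is. `build_compress_request` and `compress_remote` are hypothetical helpers.

```python
import json
import urllib.request


def build_compress_request(
    messages: list[dict], target_ratio: float, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST /compress request with a JSON body (hypothetical helper)."""
    body = json.dumps({"messages": messages, "target_ratio": target_ratio}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/compress",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compress_remote(messages: list[dict], target_ratio: float = 0.3) -> object:
    """Send the request and decode the JSON response (needs a running server)."""
    req = build_compress_request(messages, target_ratio)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect or test without a live server.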

## How It Works

```
Input Messages
     │
     ▼
┌─────────────┐
│   Scorer    │ Classify: decision/error/code/preference/noise
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Deduplicate │ Remove near-duplicate messages
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Merge Tool │ Combine consecutive tool results
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Priority   │ Drop lowest-scored until target ratio is reached
│  Drop       │ Never drop CRITICAL messages
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Summarize   │ Compress long messages instead of dropping
└──────┬──────┘
       │
       ▼
Compressed Context + Metadata
```
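The five stages above can be sketched in plain Python. This is a toy illustration, not the package's code: the keyword rules in `score`, the whitespace token count, exact-match deduplication (instead of Jaccard), the 40-word truncation standing in for summarization, and the `keep_ratio` parameter (fraction of input tokens to keep) are all assumptions, and `compress_sketch` is a hypothetical name.

```python
from dataclasses import dataclass

# Hypothetical importance levels (higher = more important).
NOISE, LOW, MEDIUM, HIGH, CRITICAL = 0, 1, 2, 3, 4
NOISE_WORDS = {"yo", "ok", "thanks", "👍"}
CODE_FENCE = "`" * 3  # Markdown code fence marker


@dataclass
class Scored:
    msg: dict
    score: int


def tokens(msg: dict) -> int:
    # Crude whitespace count; a real tokenizer would be more accurate.
    return len(msg["content"].split())


def score(msg: dict) -> int:
    # 1. Toy keyword rules standing in for a real classifier.
    text = msg["content"].lower()
    if msg["role"] == "system" or any(k in text for k in ("error", "prefer", "let's use")):
        return CRITICAL
    if text.strip(" !.") in NOISE_WORDS:
        return NOISE
    if msg["role"] in ("user", "tool") or CODE_FENCE in text:
        return HIGH
    return MEDIUM if len(text.split()) > 20 else LOW


def dedupe(items: list[Scored]) -> list[Scored]:
    # 2. Exact-match dedup on (role, normalized content).
    seen: set[tuple[str, str]] = set()
    kept = []
    for item in items:
        key = (item.msg["role"], item.msg["content"].strip().lower())
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept


def merge_tool_runs(items: list[Scored]) -> list[Scored]:
    # 3. Fold consecutive tool messages into one, keeping the higher score.
    merged: list[Scored] = []
    for item in items:
        if merged and merged[-1].msg["role"] == "tool" and item.msg["role"] == "tool":
            prev = merged[-1]
            content = prev.msg["content"] + "\n" + item.msg["content"]
            merged[-1] = Scored({"role": "tool", "content": content}, max(prev.score, item.score))
        else:
            merged.append(item)
    return merged


def summarize(msg: dict, max_words: int = 40) -> dict:
    # 5. Truncation as a stand-in for real (e.g. LLM) summarization.
    words = msg["content"].split()
    if len(words) <= max_words:
        return msg
    return {**msg, "content": " ".join(words[:max_words]) + " …"}


def compress_sketch(messages: list[dict], keep_ratio: float = 0.6) -> list[dict]:
    items = merge_tool_runs(dedupe([Scored(m, score(m)) for m in messages]))
    budget = keep_ratio * sum(tokens(m) for m in messages)
    total = sum(tokens(it.msg) for it in items)
    # 4. Drop lowest score first (oldest first on ties); stop at CRITICAL.
    dropped: set[int] = set()
    for idx in sorted(range(len(items)), key=lambda i: (items[i].score, i)):
        if total <= budget or items[idx].score == CRITICAL:
            break
        dropped.add(idx)
        total -= tokens(items[idx].msg)
    return [summarize(it.msg) for i, it in enumerate(items) if i not in dropped]
```

Sorting by `(score, index)` drops the oldest message first among equal scores, and because CRITICAL is the highest level, the loop can stop as soon as it reaches one.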

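## Scoring Categories

In the Droppable column: ❌ never dropped, ⚠️ dropped only if removing lower-importance messages does not reach the target ratio, ✅ freely droppable.
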
| Category | Importance | Droppable | Examples |
|----------|-----------|-----------|----------|
| system | CRITICAL | ❌ | System prompts |
| decision | CRITICAL | ❌ | "Let's use approach A" |
| error | CRITICAL | ❌ | Error messages, fixes |
| preference | CRITICAL | ❌ | "I prefer dark mode" |
| code | HIGH | ⚠️ | Code blocks, scripts |
| tool_result | HIGH | ⚠️ | API responses, outputs |
| user_query | HIGH | ⚠️ | User questions |
| structured_response | MEDIUM | ✅ | Lists, explanations |
| explanation | MEDIUM | ✅ | Long explanations |
| brief_response | LOW | ✅ | Short replies |
| noise | NOISE | ✅ | "yo", "sip" (Indonesian for "OK"), "👍" |
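The table maps directly onto a lookup structure. A sketch, assuming importance levels are ordered NOISE < LOW < MEDIUM < HIGH < CRITICAL and that lower levels are dropped first; `Importance`, `CATEGORY_IMPORTANCE`, `can_drop`, and `drop_order` are hypothetical names, not the package's API.

```python
from enum import IntEnum


class Importance(IntEnum):
    NOISE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Category -> importance, transcribed from the table above.
CATEGORY_IMPORTANCE = {
    "system": Importance.CRITICAL,
    "decision": Importance.CRITICAL,
    "error": Importance.CRITICAL,
    "preference": Importance.CRITICAL,
    "code": Importance.HIGH,
    "tool_result": Importance.HIGH,
    "user_query": Importance.HIGH,
    "structured_response": Importance.MEDIUM,
    "explanation": Importance.MEDIUM,
    "brief_response": Importance.LOW,
    "noise": Importance.NOISE,
}


def can_drop(category: str) -> bool:
    """CRITICAL categories are never dropped; all others may be."""
    return CATEGORY_IMPORTANCE[category] < Importance.CRITICAL


def drop_order(categories: list[str]) -> list[str]:
    """Droppable categories sorted from first-to-drop to last."""
    droppable = [c for c in categories if can_drop(c)]
    return sorted(droppable, key=lambda c: CATEGORY_IMPORTANCE[c])
```

Using an `IntEnum` keeps the levels comparable with `<`, so the "never drop CRITICAL" rule is a single comparison.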

## Use Cases

1. **Agent context management** — Keep conversations within token limits
2. **Cost optimization** — Reduce API costs by 40-80%
3. **Session handoff** — Compress before switching models
4. **Memory systems** — Store compressed conversation summaries
5. **Multi-agent** — Share compressed context between agents

## License

MIT
