Metadata-Version: 2.4
Name: chatcram
Version: 0.1.0
Summary: Token-aware chat-history compaction — summarize old turns, keep system + recent. Zero dependencies.
Project-URL: Homepage, https://github.com/Waelr1985/chatcram
Project-URL: Repository, https://github.com/Waelr1985/chatcram
Project-URL: Issues, https://github.com/Waelr1985/chatcram/issues
Project-URL: Changelog, https://github.com/Waelr1985/chatcram/blob/main/CHANGELOG.md
Author-email: Waelr1985 <waelr1985@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: agents,anthropic,chat-history,context-window,conversation,llm,memory,openai,summarization,token-budget,tokens
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Provides-Extra: tiktoken
Requires-Dist: tiktoken>=0.5; extra == 'tiktoken'
Description-Content-Type: text/markdown

# chatcram

[![PyPI version](https://img.shields.io/pypi/v/chatcram.svg)](https://pypi.org/project/chatcram/)
[![Python versions](https://img.shields.io/pypi/pyversions/chatcram.svg)](https://pypi.org/project/chatcram/)
[![CI](https://github.com/Waelr1985/chatcram/actions/workflows/ci.yml/badge.svg)](https://github.com/Waelr1985/chatcram/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Keep a long chat history within a token budget** — by summarizing the old
middle and keeping the system prompt + recent turns verbatim. Tiny,
zero-dependency, framework-agnostic. Bring your own summarizer.

As a conversation grows, you eventually blow past the context window. Dropping
old turns loses information; keeping everything is impossible. `chatcram`
collapses the older middle into a single summary while preserving what matters
most — the system prompt and the most recent turns.

```python
from chatcram import Compactor

# `summarize` is any callable you provide — usually an LLM call
compactor = Compactor(budget=4000, summarize=my_llm_summarizer, keep_recent=1500)

result = compactor.compact(messages)   # list of {"role", "content"} dicts

for m in result.messages:
    print(m["role"], "->", m["content"][:60])

print(result.summarized)    # True if the middle was collapsed
print(result.used_tokens)   # tokens in the compacted history
```

What you get back:

- **System messages** — always kept, verbatim, at the front.
- **A single summary message** — the older middle, collapsed via your summarizer.
- **Recent turns** — the latest `keep_recent` tokens, kept verbatim.

## Why

- **Zero dependencies.** Pure Python. A fast characters-per-token heuristic by
  default; plug in `tiktoken` or any tokenizer for exact counts.
- **Bring your own summarizer.** Any `str -> str` callable (an LLM call, a local
  model, anything). No provider lock-in, no hidden API calls.
- **Framework-agnostic.** Works on plain message dicts — not tied to LangChain
  or LlamaIndex.
- **Composes with [contextcram](https://github.com/Waelr1985/contextcram).**
  Compact the history, then pack it into a full prompt budget.

## Installation

```bash
pip install chatcram
# optional: exact token counts via tiktoken
pip install "chatcram[tiktoken]"
```

## How it works

```python
from chatcram import Compactor

def summarize(transcript: str) -> str:
    # call your LLM here; return a short summary string
    return my_client.complete(f"Summarize this conversation:\n{transcript}")

compactor = Compactor(
    budget=4000,          # if the history exceeds this, compact it
    summarize=summarize,
    keep_recent=1500,     # tokens of the most recent turns to keep verbatim
)

result = compactor.compact(messages)
messages = result.messages   # ready to send to the model
```

If the history is already under `budget`, it's returned unchanged
(`summarized=False`). The most recent turn is always kept, even if it alone
exceeds `keep_recent`.

## Pairs with contextcram

```python
from chatcram import Compactor
from contextcram import Packer

history = Compactor(budget=3000, summarize=summarize).compact(messages).messages

ctx = (
    Packer(model="gpt-4o", reserve=1500)
    .add(SYSTEM_PROMPT, priority="required")
    .add([f"{m['role']}: {m['content']}" for m in history], priority="high", strategy="trim")
    .add(retrieved_docs, priority="medium", strategy="drop")
    .fit()
)
```

## Alternatives

Summarizing old turns isn't new, but it's almost always bundled into a framework
or a heavyweight memory platform. `chatcram` is the standalone, dependency-free
building block:

| Library | Approach | When to prefer it over `chatcram` |
| ------- | -------- | --------------------------------- |
| [LangChain `ConversationSummaryBufferMemory`](https://python.langchain.com/docs/modules/memory/types/summary/) | Summary + buffer memory, inside LangChain | You're already all-in on LangChain |
| [mem0](https://github.com/mem0ai/mem0) / Zep | Hosted "memory layer" with fact extraction + embeddings | You want long-term, retrieval-based memory |
| [tokentrim](https://pypi.org/project/tokentrim) | Drops messages to fit a token limit | You only need to drop, not summarize |

**Choose `chatcram` when** you want a tiny, framework-agnostic helper that
summarizes the old middle of a conversation, with your own summarizer and no
dependencies.

## Development

```bash
git clone https://github.com/Waelr1985/chatcram.git
cd chatcram
uv sync
uv run pytest
uv run ruff check .
uv run mypy
```

## License

MIT
