Metadata-Version: 2.4
Name: chatfit
Version: 0.4.0
Summary: Trim conversation history to fit an LLM token budget.
Author: Anandita Singh
License: MIT
Project-URL: Homepage, https://github.com/ananditasinghh/chatfit
Project-URL: Issues, https://github.com/ananditasinghh/chatfit/issues
Keywords: llm,chat,tokens,context-window,rag,openai,anthropic
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: tiktoken
Requires-Dist: tiktoken>=0.5; extra == "tiktoken"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: tiktoken>=0.5; extra == "dev"
Dynamic: license-file

# chatfit

**Trim conversation history to fit an LLM token budget — without forgetting.**

When a chat with an LLM gets long, you eventually blow past the model's context
window and the API errors out. `chatfit` trims the conversation down to a token
budget you choose. It keeps the system prompt and the most recent turns, and
**condenses the older turns into a single summary** so the model retains the
gist of earlier context instead of forgetting it.

> `contextfit` packs your RAG chunks. **`chatfit` packs your chat history.**

- 🧠 **Remembers, doesn't just delete** — old turns become a summary
- 🪶 **Tiny & dependency-free** — pure Python, `tiktoken` optional
- 📌 **Pins your system prompt** so it's never dropped
- ✅ **Always fits** — even an oversized summary is truncated to the budget
- 📊 **Tells you what happened** — tokens before/after, messages dropped

## Install

```bash
pip install chatfit               # pure-Python word-count estimate
pip install "chatfit[tiktoken]"   # accurate token counts
```
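The two install options differ in how tokens are counted. As a rough illustration of that trade-off (not chatfit's actual internals), a counter can prefer `tiktoken` when it is installed and fall back to a word-based estimate otherwise. The names `count_tokens` and `estimate_tokens`, the `cl100k_base` fallback encoding, and the 4-tokens-per-3-words ratio are all assumptions made for this sketch.

```python
def estimate_tokens(text: str) -> int:
    # Hypothetical pure-Python estimate: roughly 4 tokens per 3 words, rounded.
    return (len(text.split()) * 4 + 2) // 3


def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Count tokens with tiktoken if possible, else fall back to an estimate."""
    try:
        import tiktoken

        try:
            enc = tiktoken.encoding_for_model(model)
        except KeyError:  # tiktoken does not know this model name
            enc = tiktoken.get_encoding("cl100k_base")  # assumed fallback
        # disallowed_special=() lets text like "<|endoftext|>" be counted, not rejected
        return len(enc.encode(text, disallowed_special=()))
    except Exception:  # tiktoken missing, or its encoding files cannot be loaded
        return estimate_tokens(text)
```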

## Quick start

```python
from chatfit import fit

messages = [
    {"role": "system",    "content": "You are a helpful assistant."},
    {"role": "user",      "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    # ... 50 more turns ...
]

result = fit(messages, max_tokens=4000)

send_to_llm(result.messages)     # your own API call; guaranteed to fit in 4000 tokens
print(result)                    # what got trimmed and why
```

## How it works

1. If the conversation already fits the budget → returned unchanged.
2. Otherwise: keep the system prompt + the newest turns that fit.
3. The older turns are condensed into one `[Summary of earlier conversation]`
   message so their gist is preserved.
4. The result is **guaranteed** to fit `max_tokens`.
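The four steps above can be sketched in plain Python. This is a hypothetical re-implementation for illustration, not chatfit's source: `fit_sketch` and `count` are made-up names, tokens are approximated as one per word plus one per message, and the summary text comes from a caller-supplied function. It walks backwards from the newest turn, keeps turns while they fit, and truncates the summary to whatever budget remains.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def count(msg: Message) -> int:
    # Hypothetical token estimate: one per word, plus one for the role.
    return len(msg["content"].split()) + 1


def fit_sketch(
    messages: List[Message],
    max_tokens: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    # Step 1: if everything already fits, return it unchanged.
    if sum(count(m) for m in messages) <= max_tokens:
        return list(messages)

    # Pin system messages; only the other turns are candidates for trimming.
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count(m) for m in system)

    # Step 2: walk backwards from the newest turn, keeping turns while they fit.
    kept: List[Message] = []
    for m in reversed(turns):
        if count(m) > budget:
            break
        kept.append(m)
        budget -= count(m)
    kept.reverse()
    dropped = turns[: len(turns) - len(kept)]

    # Step 3: condense the dropped turns into one summary message, truncated
    # to the budget that is left (one token is reserved for its role).
    summary: List[Message] = []
    room = budget - 1
    if dropped and room > 0:
        words = ("[Summary of earlier conversation] " + summarize(dropped)).split()
        # Assumption: the summary is sent with role "system".
        summary = [{"role": "system", "content": " ".join(words[:room])}]

    # Step 4: the total now fits, provided the pinned system messages do.
    return system + summary + kept
```

Stopping at the first turn that does not fit (rather than skipping it and continuing) keeps the retained window a contiguous run of the most recent turns, so the model never sees a gap in the recent dialogue.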

## Bring your own summarizer

`chatfit` never calls an LLM itself. By default it uses a no-LLM summarizer that
lists the topics the user raised. For real AI summaries, pass your own:

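```python
import openai                # requires the `openai` package (v1+) and OPENAI_API_KEY
from chatfit import fit

def my_summarizer(dropped_messages):
    # chatfit passes in the list of message dicts it is about to drop
    text = "\n".join(m["content"] for m in dropped_messages)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize:\n{text}"}],
    )
    return response.choices[0].message.content   # the summary string

result = fit(messages, max_tokens=4000, summarizer=my_summarizer)
```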
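For comparison, a no-LLM summarizer only has to accept the list of dropped message dicts and return a string. The sketch below is in the spirit of the built-in default (it lists what the user raised), but `topic_summarizer` and its `max_words` parameter are hypothetical; the default's exact output format is not documented here.

```python
from typing import Dict, List


def topic_summarizer(dropped: List[Dict[str, str]], max_words: int = 8) -> str:
    """Hypothetical no-LLM summarizer: list the opening words of each user turn."""
    topics = []
    for m in dropped:
        if m.get("role") != "user":
            continue  # only the user's turns are treated as "topics"
        words = m.get("content", "").split()
        if not words:
            continue
        snippet = " ".join(words[:max_words])
        if len(words) > max_words:
            snippet += "..."  # mark that the turn was cut short
        topics.append(snippet)
    if not topics:
        return "No user turns in the earlier conversation."
    return "The user earlier asked about: " + "; ".join(topics)
```

It plugs in the same way: `fit(messages, max_tokens=4000, summarizer=topic_summarizer)`.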

## `ChatMemory` — rolling memory for ongoing chats

`fit()` is one-shot. For a live conversation, use `ChatMemory`: add turns as they
happen (e.g. `add_user()` / `add_assistant()`), and it keeps recent turns verbatim
while *incrementally* folding older ones into a single rolling summary. This is far
cheaper than re-summarizing from scratch every turn, and the rendered history always
stays within budget.

```python
import openai
from chatfit import ChatMemory

# my_summarizer is the LLM-backed callable defined above
mem = ChatMemory(max_tokens=2000, summarizer=my_summarizer)
mem.set_system("You are a helpful assistant.")

mem.add_user("Hi!")
mem.add_assistant("Hello! How can I help?")
# ... many turns later ...

messages = mem.render()   # always fits 2000 tokens; oldest turns summarized
response = openai.chat.completions.create(model="gpt-4", messages=messages)
```

The summary stays bounded because folding is hierarchical: each fold re-summarizes
the previous summary together with the newly dropped turn, so it never grows without limit.
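A minimal sketch of that incremental fold, assuming the same one-token-per-word estimate and a summarizer that takes a list of message dicts. `RollingMemory`, its `summary_cap` parameter, and sending the summary with role `"system"` are all hypothetical choices, not `ChatMemory`'s implementation. Whenever the history exceeds the budget, the oldest verbatim turn is popped and re-summarized together with the previous summary, and the result is capped so it stays bounded.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def count(msg: Message) -> int:
    # Hypothetical token estimate: one per word, plus one for the role.
    return len(msg["content"].split()) + 1


class RollingMemory:
    """Hypothetical rolling memory: recent turns verbatim, older ones folded."""

    def __init__(self, max_tokens: int,
                 summarizer: Callable[[List[Message]], str],
                 system: Optional[str] = None,
                 summary_cap: Optional[int] = None):
        self.max_tokens = max_tokens
        self.summarizer = summarizer
        self.system = {"role": "system", "content": system} if system else None
        # Word cap keeps the summary bounded (assumption: a quarter of the budget).
        self.summary_cap = summary_cap if summary_cap is not None else max_tokens // 4
        self.summary: Optional[Message] = None
        self.turns: List[Message] = []

    def render(self) -> List[Message]:
        head = [m for m in (self.system, self.summary) if m is not None]
        return head + list(self.turns)

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        # Fold the oldest verbatim turn into the summary until we fit again.
        # This assumes the system prompt, the capped summary, and the newest
        # turn fit together; a single oversized turn is never split here.
        while (sum(count(m) for m in self.render()) > self.max_tokens
               and len(self.turns) > 1):
            oldest = self.turns.pop(0)
            previous = [self.summary] if self.summary else []
            # Hierarchical step: re-summarize old summary + newly dropped turn.
            words = self.summarizer(previous + [oldest]).split()[: self.summary_cap]
            # Assumption: the rolling summary is sent with role "system".
            self.summary = {"role": "system", "content": " ".join(words)}
```

Each fold makes one summarizer call over a bounded input (one capped summary plus one turn), which is what keeps the per-turn cost flat as the chat grows.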

## The `fit()` function

```python
fit(
    messages,            # list of {"role": ..., "content": ...} dicts
    max_tokens,          # the budget the result must fit within
    pin_system=True,     # never drop system messages
    model="gpt-4",       # used for token counting
    summarizer=None,     # your callable; defaults to a built-in no-LLM one
)
```

Returns a `TrimResult`:

| Attribute | Meaning |
|---|---|
| `.messages` | the trimmed conversation |
| `.tokens_before` / `.tokens_after` | token counts before/after |
| `.tokens_saved` | tokens removed |
| `.dropped_count` / `.kept_count` | original messages dropped / messages kept |
| `.fits` | is it within budget? |
| `.was_trimmed` | did anything get dropped? |
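The derived attributes follow from the raw counts. The hypothetical dataclass below shows one plausible way they relate; the name `TrimResultSketch`, the `max_tokens` field, the `<=` comparison in `fits`, and the `__str__` format are assumptions, not chatfit's `TrimResult`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrimResultSketch:
    tokens_before: int
    tokens_after: int
    dropped_count: int   # original messages removed
    kept_count: int      # messages kept
    max_tokens: int      # hypothetical: the budget passed to fit()
    messages: List[Dict[str, str]] = field(default_factory=list)

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after

    @property
    def fits(self) -> bool:
        return self.tokens_after <= self.max_tokens

    @property
    def was_trimmed(self) -> bool:
        return self.dropped_count > 0

    def __str__(self) -> str:
        # Hypothetical one-line report, similar in purpose to print(result).
        return (f"{self.tokens_before} -> {self.tokens_after} tokens "
                f"({self.tokens_saved} saved); dropped {self.dropped_count}, "
                f"kept {self.kept_count}")
```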

## Run the demo & tests

```bash
pip install -e ".[dev]"
python examples/demo.py
python examples/try_it.py
pytest
```

## Roadmap

- `keep_relevant` — keep the most *relevant* old turns, not just the newest
  (powered by the relevance engine from its sister library, `contextfit`)
- semantic de-duplication of repeated turns
- auto-detect a model's context window

## License

MIT
