Metadata-Version: 2.4
Name: llm-id-compressor
Version: 0.1.0
Summary: Bidirectional UUID<->numeric ID mapping to shrink LLM prompt token counts for large item lists.
License: MIT
Keywords: llm,tokens,prompt-engineering,uuid
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Dynamic: license-file

# llm-id-compressor

[![PyPI version](https://img.shields.io/pypi/v/llm-id-compressor.svg)](https://pypi.org/project/llm-id-compressor/)
[![CI](https://github.com/RoyAbra27/llm-id-compressor/actions/workflows/ci.yml/badge.svg)](https://github.com/RoyAbra27/llm-id-compressor/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

Shrinks LLM prompt/response token counts when you're sending a list of
UUID-keyed items (transcript lines, database rows, log entries) to a model.
Swaps each UUID for a short numeric ID before the request, and restores the
original UUIDs from the model's response afterward.

## Install

```bash
pip install llm-id-compressor
```

## Usage

```python
from llm_id_compressor import create_id_mapping, replace_uuids_with_nums, restore_uuids_in_response

items = [{"id": "550e8400-e29b-41d4-a716-446655440000", "text": "..."}, ...]

mapping = create_id_mapping(items)
compact_items = replace_uuids_with_nums(items, mapping["uuid_to_num"])

# send compact_items to the LLM, get back a list of {"id": "<num>", ...} lines
# (call_llm is a placeholder for your own client code, not part of this package)
response_lines = call_llm(compact_items)

restored = restore_uuids_in_response(response_lines, mapping["num_to_uuid"])
```

`replace_uuids_with_nums` reads only the `id` and `text` keys of each item:
any other fields are dropped, and an item missing `text` raises a
`KeyError`. `restore_uuids_in_response` has no such restriction: it preserves
every field on each response line and rewrites only `id`.
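
To make the field handling concrete, here is a minimal standalone sketch of
the compacting step. It is not the package's source: the `*_sketch` function
names, the 1-based numbering, and the string-typed numeric ids are
assumptions made for illustration.

```python
from typing import Dict, List


def build_mapping_sketch(items: List[dict]) -> Dict[str, Dict[str, str]]:
    """Hypothetical stand-in for create_id_mapping (numbering scheme assumed)."""
    uuid_to_num: Dict[str, str] = {}
    for item in items:
        # setdefault keeps the first number assigned to a repeated UUID.
        uuid_to_num.setdefault(item["id"], str(len(uuid_to_num) + 1))
    num_to_uuid = {num: uuid for uuid, num in uuid_to_num.items()}
    return {"uuid_to_num": uuid_to_num, "num_to_uuid": num_to_uuid}


def compact_items_sketch(items: List[dict], uuid_to_num: Dict[str, str]) -> List[dict]:
    """Hypothetical stand-in for replace_uuids_with_nums."""
    # Only "id" and "text" are copied, so extra fields are dropped;
    # item["text"] raises KeyError when the key is absent.
    return [{"id": uuid_to_num[item["id"]], "text": item["text"]} for item in items]


items = [
    {"id": "550e8400-e29b-41d4-a716-446655440000", "text": "hello", "speaker": "A"},
    {"id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "text": "world"},
]
mapping = build_mapping_sketch(items)
compact = compact_items_sketch(items, mapping["uuid_to_num"])
# compact == [{"id": "1", "text": "hello"}, {"id": "2", "text": "world"}]
```

The `speaker` field does not survive compaction, so keep the original
`items` around if you need those fields after the round trip.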

`restore_uuids_in_response` silently drops any line whose numeric id wasn't
in the original mapping (a hallucinated id) instead of raising. Otherwise,
for callers that insert results by a global primary key, one bad line would
abort the whole batch.
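
The drop-on-unknown-id behavior can be sketched as follows. This is an
illustrative reimplementation, not the package's code: `restore_sketch` is a
hypothetical name, and coercing the id with `str()` before lookup is an
assumption about how numeric ids are keyed.

```python
from typing import Dict, List


def restore_sketch(response_lines: List[dict], num_to_uuid: Dict[str, str]) -> List[dict]:
    """Hypothetical stand-in for restore_uuids_in_response."""
    restored = []
    for line in response_lines:
        # Assumption: mapping keys are strings, so coerce the id before lookup.
        uuid = num_to_uuid.get(str(line.get("id")))
        if uuid is None:
            continue  # id not in the mapping (hallucinated): skip, don't raise
        # Keep every field on the line and rewrite only "id".
        restored.append({**line, "id": uuid})
    return restored


num_to_uuid = {
    "1": "550e8400-e29b-41d4-a716-446655440000",
    "2": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
}
response_lines = [
    {"id": "1", "label": "greeting"},
    {"id": "7", "label": "made up"},  # never issued, so it is dropped
    {"id": "2", "label": "noun"},
]
restored = restore_sketch(response_lines, num_to_uuid)
# restored has two lines; the one with id "7" is gone
```

Because dropped lines leave no trace, comparing `len(restored)` with
`len(response_lines)` is one way to notice when the model invented ids.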

## Why this exists

Sending a list of UUID-keyed items to an LLM burns tokens twice over: once
in the prompt, once in the response, on 36-character identifiers the model
never actually reasons about. To our knowledge, no other package on PyPI
does this specific swap. It's a five-minute fix once you know to look for
it, but easy to miss.

## Contributing

Issues and PRs welcome - see [CONTRIBUTING.md](CONTRIBUTING.md).

## License

MIT
