Metadata-Version: 2.4
Name: llm-message-hash-py
Version: 0.1.0
Summary: Stable canonical sha256 hash of LLM request/message structures. Recursive key-sorted JSON canonicalization with per-provider presets that drop noise fields. For cache keys and idempotency. Zero runtime deps.
Project-URL: Homepage, https://github.com/MukundaKatta/llm-message-hash-py
Project-URL: Issues, https://github.com/MukundaKatta/llm-message-hash-py/issues
Project-URL: Source, https://github.com/MukundaKatta/llm-message-hash-py
Project-URL: Rust sibling, https://crates.io/crates/llm-message-hash
Author-email: Mukunda Rao Katta <mukunda.vjcs6@gmail.com>
License: MIT
License-File: LICENSE
Keywords: ai,anthropic,bedrock,cache,canonical-json,gemini,hash,idempotency,llm,openai,sha256
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# llm-message-hash-py

[![PyPI](https://img.shields.io/pypi/v/llm-message-hash-py.svg)](https://pypi.org/project/llm-message-hash-py/)
[![Python](https://img.shields.io/pypi/pyversions/llm-message-hash-py.svg)](https://pypi.org/project/llm-message-hash-py/)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

**Stable canonical sha256 hash of LLM request/message structures.**

Two semantically identical Anthropic requests can produce different
`sha256(json.dumps(req))` results: `json.dumps` emits keys in dict
insertion order, which is not part of the value, and fields like
`cache_control` change the bytes without changing what gets sent to the
model. This library walks the value tree, sorts dict keys recursively,
drops a configurable set of fields, and hashes the canonical bytes with
SHA-256.
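
The steps in that paragraph can be sketched with the standard library
alone. This is an illustration of the technique, not this package's
implementation: `sketch_canonical` and `sketch_hash` are hypothetical
names, and the compact separators and UTF-8 encoding are assumptions, so
the digests below are not expected to match `hash_request`.

```python
import hashlib
import json


def sketch_canonical(value, drop_keys=frozenset()):
    """Return a copy of `value` with dict keys sorted and dropped keys removed."""
    if isinstance(value, dict):
        # Visit keys in sorted order; skip any key named in drop_keys.
        return {
            key: sketch_canonical(value[key], drop_keys)
            for key in sorted(value)
            if key not in drop_keys
        }
    if isinstance(value, list):
        # Lists keep their order; only their elements are normalized.
        return [sketch_canonical(item, drop_keys) for item in value]
    return value


def sketch_hash(value, drop_keys=frozenset()):
    """Hex SHA-256 of the compact JSON form (assumed encoding: UTF-8)."""
    canonical = json.dumps(
        sketch_canonical(value, drop_keys),
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"model": "claude", "messages": [{"role": "user", "content": "hi"}]}
b = {"messages": [{"content": "hi", "role": "user"}], "model": "claude"}
assert sketch_hash(a) == sketch_hash(b)
```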

Useful for prompt-cache lookups, idempotency keys, and dedupe.
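
As a rough sketch of the cache-lookup case: the hash becomes the key of a
response store, so a reordered but otherwise identical request reuses the
stored response. Everything here is hypothetical scaffolding;
`request_key` is a standard-library stand-in for `hash_request`, and
`ResponseCache` and `fake_model` are invented for the example.

```python
import hashlib
import json


def request_key(request):
    # Stand-in for hash_request: key-sorted compact JSON, then SHA-256.
    payload = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory cache keyed by the request hash."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def get_or_call(self, request, call):
        key = request_key(request)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        response = call(request)
        self._store[key] = response
        return response


calls = []


def fake_model(request):
    # Placeholder for a real provider call; records each invocation.
    calls.append(request)
    return {"text": "hello"}


cache = ResponseCache()
first = cache.get_or_call({"model": "claude", "prompt": "hi"}, fake_model)
second = cache.get_or_call({"prompt": "hi", "model": "claude"}, fake_model)
```

The second call differs only in key order, so it is served from the cache
and `fake_model` runs once.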

Sibling to the Rust crate
[`llm-message-hash`](https://crates.io/crates/llm-message-hash).

## Install

```bash
pip install llm-message-hash-py
```

## Use

Default (no fields dropped):

```python
from llm_message_hash import hash_request

a = {"model": "claude", "messages": [{"role": "user", "content": "hi"}]}
b = {"messages": [{"content": "hi", "role": "user"}], "model": "claude"}

assert hash_request(a) == hash_request(b)
```

Per-provider preset (drops `cache_control`, response-only fields, etc.):

```python
from llm_message_hash import HashOpts, hash_request

with_cc = {
    "messages": [{
        "role": "user",
        "content": [{"type": "text", "text": "hi", "cache_control": {"type": "ephemeral"}}],
    }],
}
without_cc = {
    "messages": [{
        "role": "user",
        "content": [{"type": "text", "text": "hi"}],
    }],
}

h1 = hash_request(with_cc, HashOpts.for_anthropic())
h2 = hash_request(without_cc, HashOpts.for_anthropic())
assert h1 == h2
```

You can also get the canonical JSON string directly:

```python
from llm_message_hash import canonical_json

s = canonical_json({"b": 1, "a": 2})
assert s == '{"a":2,"b":1}'
```

## Presets

Each preset drops the response-side metadata that varies per call plus
provider-specific request fields that do not change semantics:

| Preset | Drops |
| --- | --- |
| `HashOpts.for_anthropic()` | `cache_control`, `id`, `usage`, `stop_reason`, `stop_sequence` |
| `HashOpts.for_openai()` | `created`, `id`, `object`, `system_fingerprint`, `usage`, `finish_reason` |
| `HashOpts.for_bedrock()` | `cache_control`, `usage`, `stopReason`, `metrics` |
| `HashOpts.for_gemini()` | `usageMetadata`, `safetyRatings`, `finishReason` |

Extend any preset:

```python
opts = HashOpts.for_anthropic()
opts.drop_keys.add("metadata")
```

Or build your own:

```python
opts = HashOpts(drop_keys={"trace_id", "request_id"})
```

## Drop key behavior

`drop_keys` matches exact key names at any depth: a key named in
`drop_keys` is removed from every dict it appears in, no matter how
deeply nested. List order is preserved, because a list's order is
structurally significant. String values are case sensitive: `"hi"` and
`"Hi"` hash differently. Numbers are compared by their JSON
representation: `42` and `42.0` serialize to different strings and so
hash differently.
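
These rules can be checked against the standard library's `json` module,
which is assumed here to behave like the library's serializer for these
cases. `strip_keys` is a hypothetical helper that mimics the any-depth
drop rule; it is not part of this package.

```python
import json


def strip_keys(value, drop_keys):
    """Remove every key named in drop_keys, at any nesting depth."""
    if isinstance(value, dict):
        return {
            key: strip_keys(item, drop_keys)
            for key, item in value.items()
            if key not in drop_keys
        }
    if isinstance(value, list):
        return [strip_keys(item, drop_keys) for item in value]
    return value


nested = {
    "id": "top",
    "choices": [{"id": "inner", "message": {"content": "hi", "id": "deep"}}],
}
stripped = strip_keys(nested, {"id"})
assert stripped == {"choices": [{"message": {"content": "hi"}}]}

# Integers and floats serialize to different text, so they hash differently.
assert json.dumps(42) == "42"
assert json.dumps(42.0) == "42.0"

# List order and string case both change the serialized form.
assert json.dumps([1, 2]) != json.dumps([2, 1])
assert json.dumps("hi") != json.dumps("Hi")
```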

## What it does NOT do

- No tokenization. The hash is over structure, not token count.
- No semantic equivalence beyond key-order normalization and the drop
  list.
- No streaming. Pass a complete Python object.
- No HTTP. Does not talk to any LLM provider.

## License

MIT
