Metadata-Version: 2.4
Name: justllm
Version: 0.0.1
Summary: Production LLM calls. Just the three lines. Reliability, native caching, and reversible context compression on by default.
Project-URL: Homepage, https://github.com/robbiebusinessacc/justllm
Project-URL: Repository, https://github.com/robbiebusinessacc/justllm
Author: Robert Walmsley
License: MIT
License-File: LICENSE
Keywords: ai,anthropic,context-compression,fallback,llm,openai,orchestration,prompt-caching,routing
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Provides-Extra: benchmarks
Requires-Dist: headroom-ai>=0.25; extra == 'benchmarks'
Requires-Dist: tiktoken>=0.7; extra == 'benchmarks'
Provides-Extra: dev
Requires-Dist: build; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Description-Content-Type: text/markdown

# justllm

**Production LLM calls. Just the three lines.**

```python
from justllm import LLM

llm = LLM("anthropic/claude-opus-4-8")
reply = llm("Summarize this contract.")
```

Cross-provider fallback, native prompt caching, and reversible context
compression are **on by default**. No config. The surface stays tiny on
purpose — the moment you need a dozen knobs, that is what LiteLLM is for.

## Why

The ecosystem split in two: feature-complete but heavy (LiteLLM, LangChain),
or simple but feature-thin (aisuite, any-llm). Nobody ships the production
layer behind a three-line front door. `justllm` is that middle.

The one number that makes it worth a switch: compressing the dynamic junk that
bloats agent calls — tool outputs, logs, RAG dumps — cuts the input-token bill
without touching your code. Measured here (gpt-4o token basis): **53% saved on a
JSON API tool result, 97% on repetitive logs**, with a safe no-op when
compression wouldn't help. The engine is
[Headroom](https://github.com/chopratejas/headroom) (PyPI: `headroom-ai`,
content-aware and reversible); justllm applies it only to tool/retrieved
content, never to your prompts. See [`benchmarks/`](benchmarks/).
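
As a rough illustration of how a savings figure and the "safe no-op" rule fit together, here is a minimal sketch. The function names and the `count_tokens` parameter are hypothetical and not part of justllm or Headroom, and the token counts in the usage lines are made up for the example rather than taken from the benchmarks.

```python
from typing import Callable


def percent_saved(original_tokens: int, compressed_tokens: int) -> float:
    """Share of input tokens removed by compression, as a percentage."""
    if original_tokens <= 0:
        return 0.0
    saved = original_tokens - compressed_tokens
    # Clamp at zero: a "compressed" payload that grew saves nothing.
    return max(0.0, 100.0 * saved / original_tokens)


def choose_payload(
    original: str, compressed: str, count_tokens: Callable[[str], int]
) -> str:
    """Keep the compressed text only if it is strictly smaller (no-op otherwise)."""
    if count_tokens(compressed) < count_tokens(original):
        return compressed
    return original


# Illustrative numbers only: 200 tokens reduced to 50.
print(percent_saved(200, 50))  # 75.0
```

The strict comparison in `choose_payload` is one way to guarantee that compression never increases the bill; the real engine may use a different rule.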

## With knobs (when you actually need them)

```python
llm = LLM(
    chain=["anthropic/claude-opus-4-8", "openai/gpt-5", "groq/llama-3.1-70b"],
    compress=True,     # reversible, dynamic-context only
    cache="prompt",    # "cache" never silently means semantic
)
```

## Status

Pre-alpha (`0.0.1`). Working today and covered by the benchmark suite:

- **Reliability** — `with_fallback` + `RetryPolicy`: retry-with-jitter on
  retryable errors only, ordered cross-provider failover. One retry layer.
- **Compression** — `compress`: a thin adapter over Headroom, with a
  conservative structural fallback when Headroom is not installed.
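
The reliability behaviour described above can be sketched in plain Python. This is not the `with_fallback` or `RetryPolicy` implementation: `RetryableError`, `call_with_retry`, and `call_with_failover` are hypothetical names, and the sketch assumes that non-retryable errors propagate immediately instead of triggering failover.

```python
import random
import time
from typing import Callable, Sequence


class RetryableError(Exception):
    """Hypothetical marker for transient failures (timeouts, rate limits)."""


def call_with_retry(
    fn: Callable[[str], str],
    prompt: str,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry fn(prompt) on RetryableError only, with full-jitter backoff."""
    for attempt in range(attempts):
        try:
            return fn(prompt)
        except RetryableError:
            if attempt == attempts - 1:
                raise  # retries exhausted; let the caller fail over
            # Full jitter: wait a random time up to the exponential cap.
            sleep(random.uniform(0, base_delay * 2**attempt))
    raise ValueError("attempts must be at least 1")


def call_with_failover(
    providers: Sequence[Callable[[str], str]], prompt: str, **retry_kwargs
) -> str:
    """Try each provider in order; move on once its retries are exhausted."""
    last_error = None
    for fn in providers:
        try:
            return call_with_retry(fn, prompt, **retry_kwargs)
        except RetryableError as exc:
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

Keeping the retry loop inside a single helper means there is exactly one place that sleeps and retries, which mirrors the "one retry layer" idea: the failover loop itself never retries.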

The unified `LLM` client wires these together, but its provider transport
(`_complete`) is the remaining piece before `0.1.0`.

## Benchmarks

```bash
pip install -e '.[benchmarks]'
python -m benchmarks.run
```

The suite measures token/cost savings from compression and the overhead the
layer adds, and checks that fallback actually recovers from provider failures.
It runs even without the optional deps (using fallbacks), so it is never a hard
blocker.
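
One common way to keep a suite runnable without optional dependencies is to import them lazily and fall back to a cruder measure. The sketch below shows that pattern for token counting; `make_token_counter` is a hypothetical helper, not code from `benchmarks/`, and the whitespace-split fallback is only an approximation chosen for illustration.

```python
from typing import Callable


def make_token_counter() -> Callable[[str], int]:
    """Return a token counter, preferring tiktoken when it is usable."""
    try:
        import tiktoken  # optional: installed via the 'benchmarks' extra

        encoding = tiktoken.encoding_for_model("gpt-4o")
    except Exception:
        # Covers a missing package as well as a failure to load the encoding
        # (for example, no network access). Fall back to a rough word count.
        return lambda text: len(text.split())
    return lambda text: len(encoding.encode(text))


count_tokens = make_token_counter()
print(count_tokens("tool output to be measured"))
```

Counts from the fallback are not comparable with real tokenizer counts, so a report built this way should say which counter produced its numbers.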

## License

MIT © Robert Walmsley
