Open Source Context Engineering

Your AI is Blind.
Fix it in 30 seconds.

AI coding tools often see only a slice of your codebase. Entroly gives compatible tools broader selected context — with 70-95% fewer tokens.

$ pip install entroly[full] && entroly go
Works with Cursor · Claude Code · Windsurf · Cline
Then the Cache Aligner holds your prefix stable so provider caches actually hit — 90% off cached reads on Anthropic, 50% off on OpenAI (provider-set rates).

See Entroly cut up to 99.5% of tokens — a figure measured on the needle-in-a-haystack benchmark with accuracy fully retained — while your AI sees 100% of the code.

License: Apache-2.0
Token Reduction: 70-95%
Code Visibility: 100%
Engine Latency: <8ms
Tests Passing: 436

The Problem

RAG isn't enough for coding.

Standard Context

Files visible: 5-10 files
Visibility: ~5%
Tokens used: 186,420 (raw)
Cost / 1K requests: $560.00
Quality: Hallucination risk

With Entroly

Files visible: all 847 files
Visibility: 100%
Tokens used: 9,300 - 55,000
Cost / 1K requests: $28 - $168
Quality: Dependency aware
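
The cost rows above follow from the token counts once a per-token price is fixed. Here is a minimal sketch of that arithmetic, assuming a flat input price of about $3 per million tokens. That rate is inferred from $560 per 1,000 requests at 186,420 tokens each and is not stated on this page; the function name is hypothetical.

```python
# Assumed flat input price, inferred from the comparison above (not an Entroly or provider figure).
ASSUMED_USD_PER_MILLION_TOKENS = 3.00

def cost_per_1k_requests(tokens_per_request: int,
                         usd_per_million: float = ASSUMED_USD_PER_MILLION_TOKENS) -> float:
    """Input-token cost in USD for 1,000 requests of the given size."""
    return tokens_per_request * 1_000 * usd_per_million / 1_000_000

raw = cost_per_1k_requests(186_420)   # standard context
low = cost_per_1k_requests(9_300)     # lower end of the Entroly range
high = cost_per_1k_requests(55_000)   # upper end of the Entroly range
print(f"raw: ${raw:.2f}, with selection: ${low:.2f} to ${high:.2f}")
```

Under the assumed rate, the results land close to the rounded figures in the comparison. Actual cost depends on the model and the provider's current pricing.
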
Features

Context Engineering, Automated.

📦

Zero-Config Install

One command auto-detects your IDE, language, and project structure. No manual prompt engineering required.

🦀

Rust Performance

A high-performance Rust core (via PyO3) processes your entire codebase in under 10ms. Scale to millions of lines.

🧠

PRISM Optimizer

Reinforcement learning adjusts context weights based on AI response quality. Gets smarter every time you code.

🔒

Built-in Security

55 SAST rules catch hardcoded secrets and SQL injection before they reach the AI. Security by design.

📊

Health Scoring

Get an A-F grade for your codebase health. Detect god files, dead code, and cross-module clones automatically.

🤖

Agent Native

First-class support for multi-agent systems. Nash bargaining allocates token budgets between sub-agents.

Why it compounds

19 cost-saving levers. One install.

Most context tools optimize a single lever — input compression. Entroly ships 19 distinct mechanisms across input, inference, output, verification, and learning. Most are multiplicative, not additive — and every one reads from a real source file you can open and audit.

The under-advertised win

Cache Aligner

Compressors re-rank context on every call, which busts the provider's KV cache. The aligner hashes the injected context and holds the prefix stable so cache hits actually land.

90% / 50%
Anthropic cached-read · OpenAI · entroly/cache_aligner.py
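
Provider prompt caches match on an exact leading span, so re-ranking the same fragments is enough to miss the cache. The sketch below illustrates the idea only and is not the code in entroly/cache_aligner.py. The function names and the "keep old order, append new fragments" policy are assumptions.

```python
import hashlib

def prefix_key(fragments: list[str]) -> str:
    """SHA-256 over the fragments in order; any reorder or edit changes the key."""
    digest = hashlib.sha256()
    for fragment in fragments:
        data = fragment.encode("utf-8")
        digest.update(len(data).to_bytes(8, "big"))  # length prefix keeps boundaries unambiguous
        digest.update(data)
    return digest.hexdigest()

def align(previous: list[str], selected: list[str]) -> list[str]:
    """Keep already-sent fragments in their old order; append new ones at the end."""
    wanted = set(selected)
    aligned = [f for f in previous if f in wanted]
    seen = set(aligned)
    for fragment in selected:
        if fragment not in seen:
            aligned.append(fragment)
            seen.add(fragment)
    return aligned

previous = ["auth.py", "db.py", "routes.py"]
reranked = ["routes.py", "auth.py", "models.py", "db.py"]  # same content, new ranking
aligned = align(previous, reranked)
print(aligned)
print(prefix_key(aligned[:3]) == prefix_key(previous))
```

In this sketch the first three fragments hash to the same key as the previous call, so only the appended fragment falls outside the cached prefix.
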
$0 hallucination guard

WITNESS + STAVE

Deterministic faithfulness verifier — no second LLM, no API call. Statistically ties a modern LLM judge on HaluEval-QA at zero marginal cost.

AUROC 0.84
~3 ms/decision · entroly/witness.py · stave.py
Pay for the model you need

RAVS Model Routing

A Bayesian per-task router sends easy work to cheap models and escalates only when verifier risk says so. Fail-closed: when uncertain, it routes to the strongest model.

Haiku → Opus
entroly/ravs/router.py
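
Fail-closed routing can be sketched with a Beta posterior over the cheap model's success rate per task type. This is an illustration under stated assumptions, not the router in entroly/ravs/router.py. The class, the "cheap" and "strong" tier labels, the normal-approximation lower bound, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class BetaPosterior:
    successes: float = 1.0  # Beta(1, 1) uniform prior
    failures: float = 1.0

    def update(self, ok: bool) -> None:
        """Record one verified outcome of the cheap model on this task type."""
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    def lower_bound(self, z: float = 1.645) -> float:
        """Posterior mean minus z standard deviations (a normal approximation)."""
        a, b = self.successes, self.failures
        mean = a / (a + b)
        std = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
        return mean - z * std

def route(posteriors: dict[str, BetaPosterior], task_type: str,
          min_lower_bound: float = 0.9) -> str:
    """Return 'cheap' only with strong evidence; otherwise fail closed to 'strong'."""
    posterior = posteriors.get(task_type)
    if posterior is None:  # never seen this task type: uncertain, so escalate
        return "strong"
    return "cheap" if posterior.lower_bound() >= min_lower_bound else "strong"

posteriors = {"rename": BetaPosterior(), "refactor": BetaPosterior()}
for _ in range(200):
    posteriors["rename"].update(True)  # cheap model keeps passing verification
print(route(posteriors, "rename"), route(posteriors, "refactor"), route(posteriors, "unknown"))
```

With no evidence the posterior is wide and its lower bound is far below the threshold, so an unfamiliar or lightly observed task type is routed to the strongest tier.
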
01

Context compression

Knapsack DP + 9 specialized compressors + a dep-graph pick the most information-dense fragments that fit your budget.

proxy_transform.py
02

WITNESS + STAVE

Deterministic $0 faithfulness verifier — no second LLM, no API call.

witness.py · stave.py
03

Cache Aligner

Holds the prefix stable so Anthropic's 90% / OpenAI's 50% cached-read discount actually lands.

cache_aligner.py
04

Escalation cascade

Cheap model first; escalate only when verifier risk demands it. Bounded regret via split-conformal coverage.

escalation.py
05

Conformal cascade

Two-verifier cascade with a measured Pareto frontier vs. either verifier alone.

conformal_cascade.py
06

RAVS Bayesian router

Per-task model routing — cheap when capable, strong when needed. Fail-closed.

ravs/router.py
07

Fast-path skills

Queries that match a proven crystallized skill short-circuit the whole pipeline — 100% LLM cost saved.

fast_path.py
08

Adaptive Compression Budget

Learns the right token budget per query so easy questions don't overspend.

adaptive_budget.py
09

Entropic Conversation Pruning

Compresses chat history each turn so long conversations don't bloat the input.

proxy_transform.py
10

Shell-output compression

Targeted fast paths for git, builds, logs, JSON, and test output — 60–95% smaller.

proxy_transform.py
11

Response distillation

Compresses the model's response before downstream chains consume it.

proxy_transform.py
12

Local DeBERTa NLI

Runs faithfulness NLI fully offline — ~$0.002/claim drops to $0.

witness.py
13

EICV suppressor

Drops hallucinated content from responses before it propagates downstream.

eicv_suppressor.py
14

PRISM 5D weights

Learns which fragment features matter, with a spectral natural-gradient optimizer.

online_learner.py · prism.rs
15

Federation

Anonymized weight + skill sync across instances amortizes cold-start across the user base.

federation.py
16

Entropic Shell Codec

Universal entropy + SimHash compressor for any tool output — even ones it has never seen.

shell_codec.py
17

Semantic Resolution Protocol

Budget-driven file reads — full, signature-only, or diff-only, chosen per block.

semantic_resolution.py
18

Adversarial Context Firewall

Blocks prompt-injection and context poisoning that bypass regex-only scanners.

context_firewall.py
19

Witness-Verified Handoff

Scans agent output for hallucination before passing it to the next agent.

verified_handoff.py
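
The knapsack step in lever 01 can be illustrated with a textbook 0/1 knapsack over (name, token cost, score) fragments. This is a minimal sketch only: the scores and file names are made up, the compressors and dependency graph are left out, and nothing here is taken from proxy_transform.py.

```python
def select_fragments(fragments: list[tuple[str, int, float]], budget: int) -> list[str]:
    """0/1 knapsack: maximize total score with total tokens <= budget.

    Assumes every token cost is a non-negative integer.
    """
    # best[b] = (best score, chosen names) using at most b tokens
    best: list[tuple[float, tuple[str, ...]]] = [(0.0, ())] * (budget + 1)
    for name, tokens, score in fragments:
        # Iterate budgets downward so each fragment is used at most once.
        for b in range(budget, tokens - 1, -1):
            candidate = best[b - tokens][0] + score
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - tokens][1] + (name,))
    return list(best[budget][1])

# Hypothetical fragments: (name, token cost, relevance score)
fragments = [
    ("auth.py", 400, 0.9),
    ("utils.py", 300, 0.5),
    ("db.py", 300, 0.6),
    ("README.md", 500, 0.2),
]
print(select_fragments(fragments, budget=700))
```

A greedy pick by score alone would also start with auth.py, but the DP checks every combination that fits, which matters when several mid-sized fragments together beat one large one.
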
How they compose. A chatty agent benefits from input compression (#1) and the cache aligner (#3, on whatever survives) and model routing (#6) and response distillation (#11) and universal tool-output compression (#16). Because the wins multiply, the product can leave well under 1% of the original input-token spend on the bill — and every figure is backed by a committed JSON artifact, not a slide.
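
The multiplication can be made concrete with two figures from this page: 95% input compression (the top of the 70-95% range) and a 90% cached-read discount. The sketch assumes every surviving input token is a cache hit, which is a best case, and the function name is hypothetical.

```python
from math import prod

def residual_spend(savings: list[float]) -> float:
    """Fraction of the original spend left after applying each saving in turn."""
    return prod(1.0 - s for s in savings)

best_case = residual_spend([0.95, 0.90])      # top-end compression, then 90% cached-read discount
conservative = residual_spend([0.70, 0.50])   # low-end compression, then 50% cached-read discount
print(f"best case: {best_case:.1%} of original spend, conservative: {conservative:.1%}")
```

The best case leaves roughly 0.5% of the original spend and the conservative case roughly 15%, so the sub-1% outcome depends on hitting the high end of each lever.
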
Typical context tool: 1 lever (input compression only)
Entroly: 19 levers across input · inference · output · verification · learning

Reproducible

Numbers you can re-run.

Every figure links to a committed JSON with sample counts, 95% confidence intervals, and model provenance. Clone the repo and reproduce them — or run the packaged smoke verifier on your own code in seconds.

Benchmark | Token savings | Accuracy retained | Samples | Artifact
Needle-in-a-haystack | 99.5% | 100% | 20 | needle_accuracy.json
LongBench | 85.3% | 103% (↑) | 50 | longbench_accuracy.json
BFCL (function calling) | 79.3% | 100% | 50 | bfcl_accuracy.json
SQuAD | 43.8% | 90% | 50 | squad_accuracy.json
WITNESS+STAVE hallucination (HaluEval-QA) | AUROC 0.84 | ~3 ms · $0/call | 2,000 | stave_benchmark.json

SQuAD is shown unfiltered — it's the one benchmark here where a tighter budget trades a little accuracy (0.80 → 0.72) for savings. We include it because cherry-picking benchmarks is how marketing claims get caught. Numbers measured with gpt-4o-mini; see each JSON for the full confidence intervals.

$ entroly verify-claims → runs locally, no API key required. On Entroly's own 92-file repo: 326,361 tokens → 7,969 (97.6% reduction), 9/9 checks pass — written to .entroly_verification.json so you can diff it yourself.
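
Two of the figures above can be re-derived by hand from the raw numbers quoted on this page. The sketch below is a plain arithmetic check and does not read .entroly_verification.json; the helper names are hypothetical.

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage of tokens removed."""
    return 100.0 * (1 - after / before)

def retained_pct(baseline: float, compressed: float) -> float:
    """Compressed-run accuracy as a percentage of baseline accuracy."""
    return 100.0 * compressed / baseline

repo_reduction = round(reduction_pct(326_361, 7_969), 1)  # Entroly's own repo
squad_retained = round(retained_pct(0.80, 0.72))          # SQuAD accuracy, 0.80 to 0.72
print(repo_reduction, squad_retained)
```
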

Stop paying for boilerplate.

Free your AI from context blindness today.

View on GitHub
pip install entroly && entroly go