Layered, learning memory for AI agents in two lines of config. Self-host the OSS library, or skip the infrastructure and use the managed version — same engine, your call.
# pip install extremis
from extremis import Extremis

mem = Extremis()
mem.remember("User is building a WhatsApp AI")

hits = mem.recall("what is the user building?")
for r in hits:
    print(r.memory.content)
    print(r.reason)  # similarity 0.87 · score +2.0 · used 5× · 3d
Two ways to run it
Self-host: pip install extremis. SQLite locally, Postgres / Pinecone / S3 Vectors in production. All the same primitives, and you operate the infra.
Cloud: HostedClient(api_key=...) and you're done. Per-tenant Postgres, automatic consolidation, and hallucination detection bundled.
How it works
mem.remember()
append to fsync'd log + episodic store
mem.remember(
    "user wants the SLA in writing",
    conversation_id="c1",
)
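To make the "fsync'd log" idea concrete, here is a minimal sketch of a durable append-only log using only the Python standard library. It is not the library's implementation: the function names `append_durably` and `read_log`, the JSON-lines format, and the file name are all assumptions chosen for illustration.

```python
import json
import os
import tempfile


def append_durably(path: str, record: dict) -> None:
    """Append one JSON line and force it to disk before returning (hypothetical helper)."""
    line = json.dumps(record, ensure_ascii=False) + "\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to stable storage


def read_log(path: str) -> list:
    """Replay the log in write order (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage: write two records, then replay them.
log_path = os.path.join(tempfile.mkdtemp(), "memories.log")
append_durably(log_path, {"content": "user wants the SLA in writing", "conversation_id": "c1"})
append_durably(log_path, {"content": "user prefers email", "conversation_id": "c1"})
entries = read_log(log_path)
```

Calling `os.fsync` after every write trades throughput for durability: a write that has returned should survive a process crash, which is the usual reason to put a log in front of a store.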
mem.recall()
ranked by cosine × RL score × recency
hits = mem.recall("SLA")
# returns ranked results,
# each with .reason and .verification
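The "cosine × RL score × recency" ranking can be sketched as a product of three factors. The page does not say how the signed RL score or the memory's age are turned into multipliers, so the sigmoid squashing, the exponential decay, and the 30-day half-life below are assumptions, as are the `Candidate` type and the `rank_score` function.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:  # hypothetical record type, for illustration only
    content: str
    embedding: list
    rl_score: float  # signed utility score, e.g. +2.0
    age_days: float


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_score(query, c, half_life_days=30.0):
    similarity = max(cosine(query, c.embedding), 0.0)
    rl_factor = 1.0 / (1.0 + math.exp(-c.rl_score))  # assumed: squash signed score into (0, 1)
    recency = 0.5 ** (c.age_days / half_life_days)   # assumed: exponential decay with a half-life
    return similarity * rl_factor * recency


query = [1.0, 0.0]
candidates = [
    Candidate("Old SLA note", [1.0, 0.0], rl_score=2.0, age_days=300),
    Candidate("SLA draft was rejected", [1.0, 0.0], rl_score=-2.0, age_days=3),
    Candidate("SLA must be in writing", [1.0, 0.0], rl_score=2.0, age_days=3),
]
ranked = sorted(candidates, key=lambda c: rank_score(query, c), reverse=True)
```

With identical embeddings, the recent, positively scored memory ranks first, the recent but negatively scored one second, and the stale one last, which shows how each factor can override raw similarity.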
mem.reinforce()
asymmetric 1.5× weight on negative signals
mem.report_outcome(
    [h.memory.id for h in hits[:2]],
    success=True,
)
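One way to read the asymmetric 1.5× weight: a failure moves a memory's score further than a success does. The sketch below assumes a simple additive update with a unit step; only the 1.5× factor comes from the text above, and `update_score` is a hypothetical function, not part of the library.

```python
def update_score(score: float, success: bool, step: float = 1.0,
                 negative_weight: float = 1.5) -> float:
    """Additive update where failures count 1.5x as much as successes (assumed form)."""
    if success:
        return score + step
    return score - negative_weight * step


score = 0.0
score = update_score(score, success=True)   # one success: 0.0 + 1.0
score = update_score(score, success=False)  # one failure: 1.0 - 1.5
```

After one success and one failure the score ends below where it started, so a memory that misleads as often as it helps drifts down the ranking.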
Every recall explains itself
No black box. Every result carries a one-line reason — the same string the library returns whether you self-host or use Cloud. You see exactly why a memory surfaced, in plain English.
example reason strings
Vs. the alternatives
| Feature | Extremis | Mem0 | Letta | Raw RAG |
|---|---|---|---|---|
| Layered memory (identity/semantic/episodic/procedural) | ✓ | — | — | — |
| RL-scored retrieval (1.5× asymmetric on negatives) | ✓ | — | — | — |
| Hybrid retrieval (semantic + BM25) | ✓ | — | — | — |
| Per-recall reason strings | ✓ | — | — | — |
| Knowledge graph built in | ✓ | — | partial | — |
| Hallucination detection bundled | ✓ | — | — | — |
| Self-hostable (MIT) | ✓ | — | — | ✓ |
| Managed option available | ✓ | ✓ | ✓ | — |
| MCP server (9 tools) | ✓ | — | — | — |
Benchmarks
Hosted Extremis is the identical engine — same numbers, fully managed. Methodology: see the reproducible benchmark run. QA accuracy depends on the answerer model.
Hallucination detection
A two-tier verifier runs at write time: a fast NLI model first, then an LLM judge for grey-zone scores. Failing memories aren't silently dropped — they're tagged unverified and downranked at recall time. Every recall returns a verification trace you can inspect.
example: contradicted recall
extremis.recall 124ms ⌐
embedder.embed 10ms ✓
retrieve.hybrid 11ms ✓ (semantic + BM25)
verifier.nli 14ms ⌐ grounded 0.42
verifier.judge 47ms ⌐ grounded 0.18
why it failed: sources self-correct from 99.95% to 99.9%; extracted memory captured the pre-correction value.
what to try: mem.remember_now(layer="semantic", confidence=0.95) to pin the corrected fact.
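The two-tier flow described above can be sketched as a cheap first check that only escalates grey-zone scores to a slower judge. Everything here is illustrative: the `verify` function, the thresholds (0.8, 0.2, 0.5), and the stub scorers are assumptions, and the stubs simply return the grounded scores from the example trace instead of calling real models.

```python
def verify(claim, source, nli, judge, accept=0.8, reject=0.2, judge_pass=0.5):
    """Run the fast check first; call the judge only for grey-zone scores (assumed thresholds)."""
    nli_score = nli(claim, source)
    trace = [("verifier.nli", nli_score)]
    if nli_score >= accept:
        return {"status": "verified", "trace": trace}
    if nli_score <= reject:
        return {"status": "unverified", "trace": trace}
    judge_score = judge(claim, source)  # grey zone: escalate to the slower judge
    trace.append(("verifier.judge", judge_score))
    status = "verified" if judge_score >= judge_pass else "unverified"
    return {"status": status, "trace": trace}


# Stub scorers standing in for an NLI model and an LLM judge.
def stub_nli(claim, source):
    return 0.42


def stub_judge(claim, source):
    return 0.18


result = verify("uptime SLA is 99.95%", "corrected to 99.9%", stub_nli, stub_judge)
confident = verify("uptime SLA is 99.9%", "corrected to 99.9%",
                   lambda c, s: 0.95, stub_judge)
```

In this sketch the memory is tagged rather than discarded, and the trace records which tiers ran, so a caller can downrank unverified results at recall time and still inspect why.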
Privacy
The library is open-source under MIT. Self-host on SQLite locally or any of five production backends. If Cloud isn't for you (or shuts down tomorrow), point HostedClient.base_url at your own deployment and nothing else changes. Cloud is a convenience, not a dependency.
Both run the same engine. Self-host gives you full control; Cloud gives you back the afternoon.