#1 on LoCoMo benchmark — zero LLM required

Conversational memory that actually remembers.

State-of-the-art retrieval over past conversations: 93.9% R@5 on LoCoMo, 98.4% R@5 on LongMemEval. No LLM calls. $0 per query. Your words, stored exactly as you said them.

$ pip install engram-search
MIT licensed · Local-first, cloud-ready · Python 3.9+

Benchmarks

Independently verified on two benchmarks.

Tested on two widely used conversational memory benchmarks. No LLM in the loop: just embeddings, sparse retrieval, and a free cross-encoder reranker.

LoCoMo

1,982 questions · 10 conversations
93.9%
R@5 — top result on the benchmark
R@10: 95.0%
NDCG@5: 0.894
Single-hop: 90.4%
Temporal: 93.1%
Contextual: 97.1%
Adversarial: 94.6%

LongMemEval

500 questions
98.4%
R@5 — 492 of 500 questions retrieved
R@10: 99.4%
NDCG@5: 0.934
Multi-session: 99.2%
Single-session-user: 100.0%
Knowledge-update: 98.7%
Temporal-reasoning: 97.0%
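For readers unfamiliar with the metrics, here is a minimal sketch of how R@k and NDCG@k are commonly computed. It assumes binary relevance and counts a question as a hit when any gold evidence appears in the top k. The benchmark harness may define them slightly differently, and the document ids are made up for illustration.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Return 1.0 if any relevant id appears in the top k results, else 0.0."""
    return 1.0 if set(ranked_ids[:k]) & set(relevant_ids) else 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    relevant = set(relevant_ids)
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example with hypothetical document ids: the gold turn is ranked second.
ranked = ["d7", "d2", "d9", "d4", "d1"]
gold = ["d2"]
hit = recall_at_k(ranked, gold, k=5)
ndcg = ndcg_at_k(ranked, gold, k=5)

# Averaging per-question hits gives the headline figure: 492 hits out of 500.
longmemeval_r5 = 492 / 500
```

A gold turn at rank 2 still counts as a full hit for R@5 but is discounted by NDCG, which is why the NDCG@5 figures sit below the R@5 figures.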

LoCoMo benchmark comparison

Disclaimer: results are compiled from multiple papers and evaluation reports. They are not directly comparable due to differences in backbone LLMs, prompting strategies, and evaluation setups.

System                                     | LoCoMo accuracy | LLM required | Open source | Source
Engram                                     | 93.9% (R@5)     | No           | Yes (MIT)   | This repo (reproducible)
EverMemOS                                  | 86.76% – 93.05% | Yes          | No          | arXiv:2601.02163
Zep                                        | 85.22%          | Yes          | Partial     | EverMemOS evaluation
MemOS                                      | 80.76%          | Yes          | Partial     | EverMemOS evaluation
Mem0                                       | 64.20%          | Yes          | Partial     | EverMemOS evaluation
MemU                                       | 61.15%          | Yes          | Partial     | arXiv:2601.02163
Other LLM-based (Hindsight, MemGPT, Letta) | ~83 – 92%       | Yes          | Varies      | Secondary reports
Non-LLM (SLM variants)                     | ~74 – 75%       | No           | Yes         | Secondary reports

Architecture

Four-step hybrid retrieval.

Dense semantic search catches meaning. Sparse BM25 catches exact words. Reciprocal Rank Fusion merges the two rankings. A cross-encoder reranker scores the finalists. Nothing is summarized.

1. Dense

bge-large bi-encoder (1024d) finds semantically similar past turns.

2. Sparse

BM25 catches exact names, dates, and rare terms that embeddings miss.

3. RRF fusion

Reciprocal Rank Fusion combines both signals without per-query tuning.

4. Rerank

A cross-encoder scores the top candidates jointly for the final ranking.
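The fusion step can be sketched in a few lines. This is a generic Reciprocal Rank Fusion implementation, not Engram's actual code. The constant k=60 is the commonly used default and is an assumption here, as are the function name and the document ids.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not Engram's setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense_hits = ["t3", "t1", "t8"]   # hypothetical ids from the dense stage
sparse_hits = ["t1", "t5", "t3"]  # hypothetical ids from the BM25 stage
fused = rrf_fuse([dense_hits, sparse_hits])
```

Because only ranks enter the formula, the dense and BM25 scores never need to be normalized against each other, which is what makes the fusion work without per-query tuning.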

Session chunking

Long sessions dilute embeddings. Chunking at ~6 turns with 1-turn overlap keeps individual facts retrievable.
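A minimal sketch of this windowing, assuming a session is a plain list of turns. The function name and defaults are illustrative, not Engram's API.

```python
def chunk_turns(turns, size=6, overlap=1):
    """Split a list of turns into windows of `size` turns sharing `overlap` turns."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(turns), step):
        chunks.append(turns[start:start + size])
        if start + size >= len(turns):
            break  # the last window already reaches the end of the session
    return chunks

# A hypothetical 14-turn session yields windows of 6, 6, and 4 turns.
session = [f"turn {i}" for i in range(14)]
chunks = chunk_turns(session)
```

The one-turn overlap means a fact that straddles a chunk boundary still appears together with its neighboring turn in at least one chunk.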

Timestamp prefix

Prepending [2024-01-15] to each document lets both dense retrieval and BM25 match temporal queries.
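As an illustration, the prefix could be applied like this. The helper name and the example sentence are hypothetical; only the bracketed ISO date format comes from the description above.

```python
from datetime import date

def with_timestamp_prefix(text, session_date):
    """Prepend an ISO date in square brackets, e.g. "[2024-01-15] ..."."""
    return f"[{session_date.isoformat()}] {text}"

doc = with_timestamp_prefix("We moved the launch to next quarter.", date(2024, 1, 15))
```

The date then lives in the indexed text itself, so a query that mentions "2024-01-15" can match it lexically as well as through the embedding.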

Speaker-name injection

First-person turns don't contain the speaker's name, so entity-attribute queries fail. Prepending the name bridges the gap and lifts LoCoMo R@5 by ~3 points.
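A small sketch of the idea, assuming turns are dicts with `role` and `content` keys as in the quickstart. The `Name: text` format, the helper name, and the role-to-name mapping are assumptions for illustration.

```python
def inject_speaker(turn, speaker_names):
    """Prefix a turn's text with the speaker's display name."""
    # Fall back to the raw role when no display name is known.
    name = speaker_names.get(turn["role"], turn["role"])
    return f"{name}: {turn['content']}"

names = {"user": "Caroline", "assistant": "Assistant"}  # hypothetical mapping
turn = {"role": "user", "content": "I adopted a rescue dog last spring."}
text = inject_speaker(turn, names)
```

A query such as "What pet does Caroline have?" now shares the token "Caroline" with the stored text, which the original first-person sentence lacked.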

Quickstart

Running in two minutes.

One pip install. Works locally with FAISS + SQLite, or plugs into Qdrant for cloud deployment.

# Install
$ pip install engram-search

# Initialize a memory store
$ engram init ./my_memories

# Ingest past conversations
$ engram ingest conversations.json --store ./my_memories

# Search
$ engram search "why did we switch to GraphQL" --store ./my_memories

# Python API
from engram.backends.faiss_backend import FaissBackend
from engram.backends.base import Document
from engram.ingestion.parser import session_to_documents
from engram.retrieval.embedder import Embedder
from engram.retrieval.pipeline import RetrievalPipeline

embedder = Embedder("bge-large")
backend = FaissBackend(path="./my_memories", dimension=1024)
pipeline = RetrievalPipeline(embedder=embedder)

turns = [
    {"role": "user", "content": "I'm switching our API from REST to GraphQL."},
    {"role": "assistant", "content": "What's driving the switch?"},
    {"role": "user", "content": "Too many round trips — 12 calls per screen."},
]
docs = session_to_documents(turns, session_id="s1", timestamp="2025-01-15")

results = pipeline.search("why did we switch to GraphQL", documents=docs, top_k=3)
for r in results:
    print(r.text)

# Point Engram at a managed Qdrant cluster
$ export ENGRAM_BACKEND=qdrant
$ export ENGRAM_QDRANT_URL=https://your-cluster.qdrant.io:6333
$ export ENGRAM_QDRANT_API_KEY=your-api-key

# Start the API server
$ pip install fastapi uvicorn
$ uvicorn engram.server:app --host 0.0.0.0 --port 8000

# Endpoints available
# POST /ingest   — add conversations
# POST /search   — retrieve memories
# GET  /health   — health check
# GET  /stats    — store statistics
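
Calling the search endpoint from Python might look like the sketch below. The request schema is not documented on this page, so the JSON field names `query` and `top_k` are assumptions, as is the helper name. Check the server's request model before relying on them.

```python
import json
import urllib.request

def build_search_request(base_url, query, top_k=5):
    """Build (but do not send) a JSON POST request for the /search endpoint."""
    # Assumed payload shape: {"query": ..., "top_k": ...}
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("http://localhost:8000", "why did we switch to GraphQL")

# To send it against a running server:
#   with urllib.request.urlopen(req) as resp:
#       results = json.load(resp)
```
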
Why Engram

Built for agents that need real memory.

Zero LLM calls

Retrieval only. Deterministic, reproducible, no per-query spend, no prompt drift, no rate limits.

Exact words preserved

Nothing is summarized or paraphrased on the way in. What you said is what gets returned.

Local-first

FAISS + SQLite out of the box. Runs entirely on your machine. No API keys needed to get started.

Cloud-ready

Plug into Qdrant for multi-tenant, horizontally scalable memory. Same API, same accuracy.

Ready to give your agent a memory?

MIT licensed. Reproducible benchmarks. Drop it into your RAG pipeline today.

$ pip install engram-search