Your agents need a place to remember what they learned yesterday so they don't waste tokens, or trust, relearning it today. Today's options all force a tradeoff:

- Hosted memory APIs that read your prompts and meter you by the token.
- Vector databases you have to wire up, chunk for, embed against, and operate.
- Framework-bundled memory that locks you to one agent runtime.
- Open-source memory frameworks (Letta, MemGPT, LangGraph) that hand you a Python library and call it a product.
- Notebook-grade RAG demos that die the moment a second user shows up.
Recall is built for the team that's done evaluating those. One Docker image, runs on your infrastructure, speaks native MCP to every major agent client, exposes a clean HTTP API for everything else. Your data never leaves your network. Your bill doesn't grow with your context window. Your agents get a brain that's actually yours — versioned, auditable, portable, and fast enough to feel local because it is local.
Three reasons teams shipping serious AI eventually stop renting memory and start hosting it themselves.
The more your agents remember, the more you pay forever. A useful brain compounds — your bill compounds with it. Flat self-hosted pricing breaks that loop.
Customer conversations, internal docs, regulated content — all of it streamed to a third party who reads it to "improve their service." Compliance teams hate this. So do customers.
Healthcare, finance, government, defense — they all need on-premise. Hosted memory APIs literally cannot serve these customers. Self-host or lose the deal.
One Docker image. Two SDKs. Thirteen tools. Your data never leaves your network.
docker run ghcr.io/recallworks/recall and you have a working memory server. SQLite under the hood. No external dependencies. No outbound calls.
remember, recall, recall_filtered, reflect, anti_pattern, checkpoint, forget, index_file, reindex, session_close, memory_stats, pulse, maintenance. POST JSON, get JSON. That's the whole API.
Every write appended to a tamper-evident ledger. Verify the chain with one call. Compliance teams stop crying. Your auditors get a clean answer.
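The hash-chained ledger follows a standard pattern: each entry stores a hash of the previous entry's hash plus its own payload, so editing any past entry breaks every link after it. Below is a toy illustration of that technique in Python; it is a sketch of the general idea, not Recall's actual implementation or schema.

```python
import hashlib
import json

def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash of the previous entry's hash concatenated with this payload."""
    blob = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def append(ledger: list, payload: dict) -> None:
    """Append a payload, chaining it to the previous entry's hash."""
    prev = ledger[-1]["hash"] if ledger else "0" * 64
    ledger.append({"payload": payload, "hash": entry_hash(prev, payload)})

def verify(ledger: list) -> bool:
    """Recompute every link; any edited entry breaks the chain."""
    prev = "0" * 64
    for entry in ledger:
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True

ledger = []
append(ledger, {"tool": "remember", "text": "auth refactor decided"})
append(ledger, {"tool": "forget", "id": 1})
print(verify(ledger))   # True: chain is intact
ledger[0]["payload"]["text"] = "tampered"
print(verify(ledger))   # False: the edit broke every downstream link
```

This is why "verify the chain with one call" is cheap: verification is a single linear pass over the ledger.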
pip install recall-client or npm i @recallworks/recall-client. Both are 1:1 with the HTTP API and shipped with sigstore provenance.
No telemetry. No phoning home. No mandatory cloud. Drop the image on an isolated network and it just works. Your defense and healthcare customers can finally say yes.
Bring your own embedding model — local, OpenAI, Voyage, anything. The bundled local model gets you running on a laptop with no API keys.
Drop Recall into Claude Desktop, Cursor, Cline, or Continue.dev as an MCP server in 30 seconds. Your AI assistant gets persistent memory across sessions — locally. Setup guide →
Recall is infrastructure — there's no "Recall app" on anyone's monitor. What changes depends on who you are.
The IT admin (the one who installs it)
Runs one command on a server inside your network — laptop, EC2, on-prem VM, doesn't matter:
docker run -d -p 8787:8787 \
  -e RECALL_API_KEY=xxx \
  ghcr.io/recallworks/recall:0.3.3
That's the entire install. Health-check at /health. No dashboard to log into, no SaaS account to provision. Memory lives in one SQLite file they can back up, encrypt, or move.
The end user (the one whose work gets better)
Sees nothing. That's the point. They keep using Claude Desktop, Cursor, VS Code Copilot, Cline, or Continue.dev — the chat tool they already use. After IT adds 6 lines to a config file, those tools quietly gain a new capability:
"Hey Claude, what did we decide about the auth refactor last week?"
→ Claude calls recall() under the hood, gets the decisions from 9 days ago, answers as if it remembered.
The user never sees the tool call. They just notice their AI suddenly has a memory.
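Those half-dozen config lines aren't spelled out on this page. As a sketch only, a Claude Desktop entry might look like the following, assuming the community mcp-remote stdio bridge and an /mcp endpoint on the Recall server; the hostname, endpoint path, and bridge choice are all assumptions, not details confirmed here.

```json
{
  "mcpServers": {
    "recall": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://recall.internal:8787/mcp",
        "--header",
        "X-API-Key: demo-key-change-me"
      ]
    }
  }
}
```

The same entry works across Cursor, Cline, and Continue.dev with their respective MCP config files.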
The security lead (the one who has to defend it to legal)
When the CISO asks "where does this stuff live and who can read it," the answer is concrete:
- One SQLite file: cp it, encrypt it, delete it.
- tcpdump proves zero outbound calls.
- Inspect and manage it with recall stats, recall search, recall forget.
- A web admin console (browse, edit, tag, restore) is on the Team-tier roadmap, not in 0.3.3.
| | Recall | Hosted memory APIs | Roll your own (pgvector + LangChain) |
|---|---|---|---|
| Self-host | ✓ One Docker image | ✗ SaaS only | ~ DIY infra |
| Air-gap capable | ✓ Zero outbound | ✗ | ~ Maybe |
| Tamper-evident audit | ✓ Hash-chained ledger | ~ Provider logs | ✗ Build it yourself |
| Pricing model | ✓ Flat — pay once | ✗ Per-token, forever | ~ Just infra costs |
| Time to first call | 60 seconds | 5 minutes (account + key) | ~1 week of plumbing |
| License | MIT core, BSL enterprise | Closed | Mixed |
No per-token fees. No surprise overages. Pick the tier that matches what you're shipping.
For solo developers, OSS projects, and single-tenant production deployments.
For teams shipping commercial AI products. Production support and the components you can't get under MIT.
For regulated industries, air-gapped deployments, and teams that need a phone number.
Real products shipping Recall in production today.
On-customer brain for mortgage-tech teams running ICE Encompass. Ships Recall + a 12,000-chunk pre-built corpus + a browser extension that captures what your team is already reading. Self-hosted, zero data exfil, flat monthly pricing.
Three ways to start. All three connect to the same Recall server.
# 1. Run the server
docker run -d --name recall -p 8787:8787 \
  -e RECALL_API_KEY=demo-key-change-me \
  ghcr.io/recallworks/recall:0.3.3

# 2. Smoke test
curl -s -H "X-API-Key: demo-key-change-me" \
  http://localhost:8787/health
# pip install recall-client
from recall_client import Recall

r = Recall("http://localhost:8787", api_key="demo-key-change-me")
r.remember(text="User prefers concise answers", source="prefs")

hits = r.recall(query="how does the user like answers?", n=3)
for h in hits.results:
    print(h.text, h.score)
// npm i @recallworks/recall-client
import { Recall } from "@recallworks/recall-client";

const r = new Recall({
  baseUrl: "http://localhost:8787",
  apiKey: "demo-key-change-me",
});

await r.remember({ text: "User prefers concise answers", source: "prefs" });
const hits = await r.recall({ query: "how does the user like answers?", n: 3 });
console.log(hits.results);
# Every tool is one POST. Envelope is always { result, tool, by }.
curl -s -X POST http://localhost:8787/tool/remember \
  -H "X-API-Key: demo-key-change-me" \
  -H "Content-Type: application/json" \
  -d '{"text":"User prefers concise answers","source":"prefs"}'

curl -s -X POST http://localhost:8787/tool/recall \
  -H "X-API-Key: demo-key-change-me" \
  -H "Content-Type: application/json" \
  -d '{"query":"how does the user like answers?","n":3}'
Email us. We'll get you on a 30-day pilot, on your infrastructure, with a real human on the other end.
hello@recall.works