Recall™
Memory infrastructure for AI

The memory layer for AI agents you actually own.

Your agents need a place to remember what they learned yesterday so they don't waste tokens — or trust — relearning it today. Today's options all force a tradeoff:

  • Hosted memory APIs that read your prompts and meter you by the token.
  • Vector databases you have to wire up, chunk for, embed against, and operate.
  • Framework-bundled memory that locks you to one agent runtime.
  • Open-source memory frameworks (Letta, MemGPT, LangGraph) that hand you a Python library and call it a product.
  • Notebook-grade RAG demos that die the moment a second user shows up.

Recall is built for the team that's done evaluating those. One Docker image, runs on your infrastructure, speaks native MCP to every major agent client, exposes a clean HTTP API for everything else. Your data never leaves your network. Your bill doesn't grow with your context window. Your agents get a brain that's actually yours — versioned, auditable, portable, and fast enough to feel local because it is local.

MIT-licensed core · Zero data exfiltration · Flat pricing — no per-token fees · 60 seconds from docker run to production

The problem with hosted memory APIs

Three reasons teams shipping serious AI eventually stop renting memory and start hosting it themselves.

Per-token pricing scales against you

The more your agents remember, the more you pay forever. A useful brain compounds — your bill compounds with it. Flat self-hosted pricing breaks that loop.

Your data is on someone else's infrastructure

Customer conversations, internal docs, regulated content — all of it streamed to a third party who reads it to "improve their service." Compliance teams hate this. So do customers.

You can't air-gap a SaaS

Healthcare, finance, government, defense — they all need on-premise. Hosted memory APIs literally cannot serve these customers. Self-host or lose the deal.

How Recall works

One Docker image. Two SDKs. Thirteen tools. Your data never leaves your network.

One image, one binary

docker run ghcr.io/recallworks/recall and you have a working memory server. SQLite under the hood. No external dependencies. No outbound calls.

13 tools over plain HTTP

remember, recall, recall_filtered, reflect, anti_pattern, checkpoint, forget, index_file, reindex, session_close, memory_stats, pulse, maintenance. POST JSON, get JSON. That's the whole API.

Hash-chained audit log

Every write is appended to a tamper-evident ledger. Verify the chain with one call. Compliance teams stop arguing. Your auditors get a clean answer.
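Recall's actual ledger format isn't documented on this page, but the underlying technique is standard. A minimal sketch of hash chaining, assuming each entry stores the hash of the previous entry plus its own payload, so editing any historical record invalidates every hash after it:

```python
import hashlib
import json

def entry_hash(prev_hash: str, payload: dict) -> str:
    # Hash the previous entry's hash together with this entry's payload.
    # Changing any earlier record changes every hash downstream of it.
    body = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

def append(ledger: list, payload: dict) -> None:
    prev = ledger[-1]["hash"] if ledger else "0" * 64
    ledger.append({"payload": payload, "hash": entry_hash(prev, payload)})

def verify(ledger: list) -> bool:
    # Recompute the whole chain; any tampered entry breaks verification.
    prev = "0" * 64
    for entry in ledger:
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

"Verify the chain with one call" is this `verify` pass: one linear scan, no trusted third party required.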

Python and TypeScript SDKs

pip install recall-client or npm i @recallworks/recall-client. Both are 1:1 with the HTTP API and shipped with sigstore provenance.

Air-gap capable

No telemetry. No phoning home. No mandatory cloud. Drop the image on an isolated network and it just works. Your defense and healthcare customers can finally say yes.

Pluggable embeddings

Bring your own embedding model — local, OpenAI, Voyage, anything. The bundled local model gets you running on a laptop with no API keys.
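Recall's plugin API isn't shown on this page, so the interface below is hypothetical — a sketch of what a bring-your-own-embedder contract typically looks like, with a toy deterministic stand-in for the bundled local model:

```python
import hashlib
from typing import Protocol

class Embedder(Protocol):
    """Hypothetical provider contract: texts in, fixed-size vectors out."""
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class ToyLocalEmbedder:
    """Deterministic stand-in for a real local model: hashes each text
    into a fixed-size vector. Fine for wiring tests, useless for
    retrieval quality."""
    def __init__(self, dim: int = 8):
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode()).digest()
            vectors.append([digest[i % len(digest)] / 255.0 for i in range(self.dim)])
        return vectors
```

A real provider would wrap OpenAI, Voyage, or a local model behind the same `embed` signature, which is what makes the backend swappable.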

Native MCP support

Drop Recall into Claude Desktop, Cursor, Cline, or Continue.dev as an MCP server in 30 seconds. Your AI assistant gets persistent memory across sessions — locally. Setup guide →

What you actually see: three personas, three views

Recall is infrastructure — there's no "Recall app" on anyone's monitor. What changes depends on who you are.

The developer / IT person

(the one who installs it)

Runs one command on a server inside your network — laptop, EC2, on-prem VM, doesn't matter:

docker run -d -p 8787:8787 \
  -e API_KEY=xxx \
  ghcr.io/recallworks/recall:0.3.3

That's the entire install. Health-check at /health. No dashboard to log into, no SaaS account to provision. Memory lives in one SQLite file they can back up, encrypt, or move.
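Because memory is one SQLite file, backup is a one-liner. A sketch using Python's stdlib online-backup API (the filenames are assumptions, not Recall defaults):

```python
import sqlite3

def backup_memory(src_path: str = "recall.db",
                  dst_path: str = "recall-backup.db") -> None:
    # sqlite3's backup API copies the database page by page and is safe
    # to run while another process still has the source file open.
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    with dst:
        src.backup(dst)
    dst.close()
    src.close()
```

Plain `cp` also works when the server is stopped; the backup API is the safe option for a live database.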

The end user

(the one whose work gets better)

Sees nothing. That's the point. They keep using Claude Desktop, Cursor, VS Code Copilot, Cline, or Continue.dev — the chat tool they already use. After IT adds 6 lines to a config file, those tools quietly gain a new capability:

"Hey Claude, what did we decide about the auth refactor last week?"
→ Claude calls recall() under the hood, gets the decisions from 9 days ago, answers as if it remembered.

The user never sees the tool call. They just notice their AI suddenly has a memory.
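Those "6 lines" typically land in Claude Desktop's claude_desktop_config.json. A hypothetical sketch — the exact command, args, and env variable names depend on Recall's MCP transport, so treat everything below as placeholders and follow the setup guide for real values:

```json
{
  "mcpServers": {
    "recall": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "ghcr.io/recallworks/recall:0.3.3"],
      "env": { "RECALL_API_KEY": "demo-key-change-me" }
    }
  }
}
```

Cursor, Cline, and Continue.dev each have an equivalent MCP server entry in their own config files.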

The admin / auditor

(the one who has to defend it to legal)

When the CISO asks "where does this stuff live and who can read it," the answer is concrete:

  • Memory = one SQLite file on disk. cp, encrypt, delete.
  • Every query logged with the API key that made it.
  • tcpdump proves zero outbound calls.
  • CLI: recall stats, recall search, recall forget.

Web admin console (browse, edit, tag, restore) is on the Team-tier roadmap — not in 0.3.3.

Recall vs. the alternatives

                     | Recall                   | Hosted memory APIs        | Roll your own (pgvector + LangChain)
Self-host            | ✓ One Docker image       | ✗ SaaS only               | ~ DIY infra
Air-gap capable      | ✓ Zero outbound          | ✗                         | ~ Maybe
Tamper-evident audit | ✓ Hash-chained ledger    | ~ Provider logs           | ✗ Build it yourself
Pricing model        | ✓ Flat — pay once        | ✗ Per-token, forever      | ~ Just infra costs
Time to first call   | 60 seconds               | 5 minutes (account + key) | ~1 week of plumbing
License              | MIT core, BSL enterprise | Closed                    | Mixed

Pricing: simple, flat, honest

No per-token fees. No surprise overages. Pick the tier that matches what you're shipping.

Open Source

$0 · MIT — forever

For solo developers, OSS projects, and single-tenant production deployments.

  • Full Recall server
  • Both SDKs (Python + TS)
  • Docker image (GHCR)
  • All 13 tools
  • Single-tenant audit log
  • Community support
Get the source →

Enterprise

Custom · annual contract

For regulated industries, air-gapped deployments, and teams that need a phone number.

  • Everything in Team
  • Air-gap install bundle
  • Per-tenant share-back / channel-brain
  • Dedicated SE for onboarding
  • 4-hour response SLA
  • Procurement / MSA / DPA support
Contact sales →

Built on Recall

Real products shipping Recall in production today.

IceWhisperer

On-customer brain for mortgage-tech teams running ICE Encompass. Ships Recall + a 12,000-chunk pre-built corpus + a browser extension that captures what your team is already reading. Self-hosted, zero data exfil, flat monthly pricing.

For developers: show me the code

Three ways to start. All three connect to the same Recall server.

# 1. Run the server
docker run -d --name recall -p 8787:8787 \
  -e RECALL_API_KEY=demo-key-change-me \
  ghcr.io/recallworks/recall:0.3.3

# 2. Smoke test
curl -s -H "X-API-Key: demo-key-change-me" \
  http://localhost:8787/health

# pip install recall-client
from recall_client import Recall

r = Recall("http://localhost:8787", api_key="demo-key-change-me")
r.remember(text="User prefers concise answers", source="prefs")
hits = r.recall(query="how does the user like answers?", n=3)
for h in hits.results:
    print(h.text, h.score)

// npm i @recallworks/recall-client
import { Recall } from "@recallworks/recall-client";

const r = new Recall({ baseUrl: "http://localhost:8787", apiKey: "demo-key-change-me" });
await r.remember({ text: "User prefers concise answers", source: "prefs" });
const hits = await r.recall({ query: "how does the user like answers?", n: 3 });
console.log(hits.results);

# Every tool is one POST. Envelope is always { result, tool, by }.

curl -s -X POST http://localhost:8787/tool/remember \
  -H "X-API-Key: demo-key-change-me" \
  -H "Content-Type: application/json" \
  -d '{"text":"User prefers concise answers","source":"prefs"}'

curl -s -X POST http://localhost:8787/tool/recall \
  -H "X-API-Key: demo-key-change-me" \
  -H "Content-Type: application/json" \
  -d '{"query":"how does the user like answers?","n":3}'

Ready to own your memory layer?

Email us. We'll get you on a 30-day pilot, on your infrastructure, with a real human on the other end.

hello@recall.works