1. The problem
Most agent stacks treat long-term memory as a thin layer over text storage. You chunk documents, embed them, write rows to a database, and when the agent needs context you assemble a prompt and ask a language model to interpret what is relevant. That pattern works until you count the costs: every turn can trigger a model call, latency is measured in seconds, sampling makes behavior non-deterministic, and the model can confidently narrate relationships that never existed in the store. You are not retrieving a fact; you are asking another model to reconstruct one.
For many tool-using agents, what you actually want is simpler: given a cue, return the closest stored association, with a score that means something operationally, and abstain when nothing in memory should drive a decision. That is associative memory, not open-ended generation.
RAG and LLM-mediated recall still matter when the task is summarization, synthesis across sources, or language-conditioned planning. This post is not arguing that you should delete your vector store everywhere. It is arguing that a large class of "remember this fact and fetch it later" behaviors can be implemented as a small numeric primitive with predictable failure modes, which is easier to test and cheaper to run at high frequency.
2. The insight
Ramsauer et al. (2021), in Hopfield Networks is All You Need, showed that the update rule of a Modern Hopfield Network (MHN) lines up with the structure of a single step of transformer attention when keys and values are tied in a particular way. Intuitively, attention already is a soft winner-take-most combination of stored vectors conditioned on a query. Exposing that operation directly gives you an explicit memory module: fixed rules, no wrapper model, and outputs you can test.
The mhn-ai-agent-memory package implements that idea for text-backed agents. Facts are encoded to vectors and held in a matrix; a query vector runs the same relaxation step you would recognize from attention, and you read back the winning pattern and its weight.
Keeping the operation explicit matters for systems engineering. You can log the weight vector, freeze the encoder, snapshot the pattern matrix, and write property tests that assert "this cue must not retrieve that fact." Wrapping the same geometry inside an LLM turns those guarantees into prompt engineering and post-hoc evaluation.
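A sketch of what such a property test can look like, using the update rule described in the next section (pure numpy on toy vectors; the 0.1 threshold is an illustrative choice, not a library default):

import numpy as np

def hopfield_weights(X, xi, beta=8.0):
    """Softmax weights of query xi over stored patterns (rows of X)."""
    logits = beta * (X @ xi)
    w = np.exp(logits - logits.max())
    return w / w.sum()

def test_cue_must_not_retrieve_fact():
    X = np.array([[1.0, 0.0, 0.0],    # snapshotted pattern: fact A
                  [0.0, 1.0, 0.0]])   # snapshotted pattern: fact B
    cue = np.array([1.0, 0.0, 0.0])   # frozen-encoder output for the cue
    w = hopfield_weights(X, cue)
    assert w[1] < 0.1                 # this cue must not retrieve fact B

Because the encoder is frozen and the matrix is a snapshot, the test is deterministic and can run in CI like any other unit test.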
3. How it works
Let the rows of matrix X be stored patterns (each row is one memory), let xi be the query embedding, and let beta be the inverse temperature. One synchronous update is:

$$\xi^{\text{new}} = X^\top \operatorname{softmax}(\beta X \xi)$$
In plain language: compare the cue to every stored vector, turn the similarities into nonnegative weights that sum to one, then take the weighted sum of the memories. The state relaxes toward whichever stored patterns the cue aligns with; with a large enough beta, the softmax sharpens and the output tracks the nearest attractor. Under separation assumptions on the patterns, storage capacity in high dimension grows exponentially with the embedding dimension, far beyond the linear capacity of classical Hopfield networks, which is why this family of models is interesting next to brute-force nearest-neighbor lists.
The reference implementation wraps numerics and text encoding. A minimal store-and-retrieve step looks like this:
import numpy as np
# Patterns as rows of X (L2-normalized is typical in practice)
X = np.array([
[1.0, 0.0, 0.0],
[0.0, 1.0, 0.0],
], dtype=np.float64)
beta = 8.0
xi = np.array([0.9, 0.1, 0.0])
logits = beta * (X @ xi)
w = np.exp(logits - np.max(logits))
w /= w.sum()
xi_new = w @ X
# xi_new is pulled toward the first stored pattern
The library packages the same computation with encoders, optional iteration, and agent-facing APIs on top of HopfieldMemory. In multi-step mode you apply the update repeatedly until the state stabilizes, which is the direct analogue of letting an attention block settle while keeping weights tied to the stored memory bank.
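A sketch of that multi-step mode in plain numpy (self-contained; the iteration cap and convergence tolerance are illustrative choices, not library defaults):

import numpy as np

X = np.array([[1.0, 0.0, 0.0],    # stored patterns, one per row
              [0.0, 1.0, 0.0]])
beta = 8.0
xi = np.array([0.9, 0.1, 0.0])    # initial cue

for _ in range(50):               # hard cap on relaxation steps
    logits = beta * (X @ xi)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    xi_next = w @ X               # one synchronous update
    if np.linalg.norm(xi_next - xi) < 1e-8:
        break                     # reached a fixed point (an attractor)
    xi = xi_next
print(xi)                         # approx [1, 0, 0]: settled on the first memory

In this toy case the state converges in a handful of steps; the fixed point is the attractor nearest the original cue.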
4. What makes this different from a vector database
Nearest-neighbor search in a vector database returns a ranked list under a fixed metric. That is static geometry: you ask for the top-k and you get it. The Hopfield step is dynamical: the query defines an energy landscape over stored patterns and the softmax-weighted combination is one relaxation toward an attractor. In practice you still store vectors in a matrix, but the readout rule is not the same as returning cosine neighbors.
Softmax weights are a built-in confidence structure: when one pattern dominates, its weight approaches one. When the landscape is flat, weights spread and the combined state is an ambiguous blend. That is different from a distance threshold on a single neighbor, where you must invent a separate notion of "far enough to ignore."
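A minimal numpy illustration of that weight structure (toy vectors chosen to make the contrast obvious, not library code):

import numpy as np

def weights(X, xi, beta=8.0):
    logits = beta * (X @ xi)
    w = np.exp(logits - logits.max())
    return w / w.sum()

X = np.array([[1.0, 0.0], [0.0, 1.0]])     # two stored patterns

print(weights(X, np.array([0.95, 0.05])))  # dominant match: ~[0.999, 0.001]
print(weights(X, np.array([0.5, 0.5])))    # ambiguous cue:  [0.5, 0.5]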
Finally, the exponential-capacity story belongs to the MHN family under stated conditions (high dimension, pattern separation). A flat index over embeddings scales linearly in the number of stored items for exact search; the theoretical contrast is about representational capacity per dimension, not a claim that your laptop suddenly stores infinite facts.
In deployment you may still use an ANN index for cold storage at millions of patterns; the library's tiered presets combine a hot Hopfield layer for exact microsecond reads with approximate search for archival rows. The conceptual distinction remains: the readout rule is Hopfield relaxation (optionally with repulsive terms), not merely "return top-1 cosine neighbor" unless you choose a code path that does exactly that.
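As a rough sketch of how such a tiered lookup could be wired (the min_weight threshold and the fallback policy are assumptions for illustration, not the library's presets; the cold tier could be any ANN index, e.g. a faiss.IndexFlatIP):

import numpy as np

def recall(hot_X, xi, cold_index=None, beta=8.0, min_weight=0.8):
    """Hot Hopfield readout first; fall back to ANN over the cold tier."""
    logits = beta * (hot_X @ xi)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    if w.max() >= min_weight:                  # hot layer is confident
        return ("hot", int(w.argmax()))
    if cold_index is not None:                 # e.g. faiss.IndexFlatIP
        _, ids = cold_index.search(xi[np.newaxis, :].astype("float32"), 1)
        return ("cold", int(ids[0, 0]))
    return (None, None)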
| | Traditional (LLM + DB) | This library |
|---|---|---|
| Retrieval | LLM API call | One matrix multiply (per step) |
| Latency | Seconds | Microseconds (numpy-scale) |
| Cost | Per token | Zero after storage |
| Determinism | Non-deterministic (sampling, provider drift) | Deterministic given fixed encoder and parameters |
| Capacity | Depends on embedding quality and chunking | Exponential in dimension under MHN assumptions (proven) |
5. The "nothing matches" problem
Agents need to know when memory should stay silent. Softmax always produces a full probability vector over patterns, so a naive attention readout always commits to some blend of memories, even when every option is a poor match. That is the same structural issue you see when a chat model picks the least-bad span from retrieved chunks.
The library addresses this with three signals used together: max_similarity (raw alignment before softmax), gap (separation between the top two attention masses), and sentinel_weight (mass assigned to a learned or fixed "no match" anchor pattern, implemented as a zero-vector sentinel in the store). When those signals do not support a confident retrieval, the API returns no result instead of a hallucinated association.
None of the three signals is sufficient alone. High max similarity with a tiny gap means two memories are fighting for attention; low max similarity with a razor-thin gap can still be noise. The sentinel absorbs probability mass when the query is orthogonal to real content, so "I have nothing" becomes a first-class attractor instead of an afterthought threshold. Together they give agents a principled abstention path that stays aligned with the same softmax machinery used for retrieval.
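A rough numpy sketch of such a gate (the threshold values and the exact combination rule here are illustrative assumptions, not the library's internals):

import numpy as np

def gated_recall(X, xi, beta=8.0,
                 min_similarity=0.4, min_gap=0.2, max_sentinel=0.3):
    """Return the index of the winning memory, or None to abstain."""
    Xs = np.vstack([X, np.zeros(X.shape[1])])  # zero-vector sentinel row
    logits = beta * (Xs @ xi)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    order = np.argsort(w[:-1])[::-1]           # real patterns, best first
    max_similarity = float(X[order[0]] @ xi)   # raw alignment, pre-softmax
    gap = w[order[0]] - (w[order[1]] if len(order) > 1 else 0.0)
    sentinel_weight = w[-1]
    if (max_similarity < min_similarity or gap < min_gap
            or sentinel_weight > max_sentinel):
        return None                            # abstain: memory stays silent
    return int(order[0])

The library's own API wraps this kind of decision behind query_or_none and match_quality: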
from hopfield_memory import HopfieldMemory
mem = HopfieldMemory()
mem.store("The Eiffel Tower is in Paris")
mem.store("Mount Fuji is in Japan")
# Match when the cue aligns with stored text
print(mem.query_or_none("Eiffel Tower Paris"))
# None when nothing in memory should fire
print(mem.query_or_none("basketball playoffs score"))
mq = mem.match_quality("basketball playoffs score")
# mq["max_similarity"], mq["is_match"], etc.
For production agents, treat query_or_none as the conservative default when a tool call should be gated on epistemic state, and use query_with_confidence only when you understand that softmax mass is not calibrated like a probabilistic classifier.
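For instance, gating an action on recall (book_flight and ask_user are hypothetical agent tools, not library functions):

fact = mem.query_or_none("user's preferred airline")
if fact is not None:
    book_flight(airline=fact)                 # hypothetical tool call
else:
    ask_user("Which airline do you prefer?")  # hypothetical fallback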
6. Repulsive attention for faster convergence
When memories are nearby in embedding space, relaxation can linger in a shallow basin. Optional repulsive attention adds negative patterns that contribute contrastive energy: they act like hills that push the state off confusable regions. Benchmarks in the repository report materially fewer steps to convergence in some regimes, roughly an order of magnitude in the documented experiments. It is an opt-in knob, not a second network, and diagnostics on the memory object expose when agents might want to enable it.
You declare negative or confusable exemplars with the same API surface you use for ordinary storage, and the repulsive module shapes the energy landscape without changing the basic matrix-multiply skeleton. That keeps the mental model small: positives define wells, negatives define ridges, and the query still flows through softmax-weighted aggregation.
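One way to sketch the idea in numpy (a conceptual illustration of contrastive energy, not the library's actual implementation; the repulsion strength alpha is an invented parameter):

import numpy as np

def repulsive_update(X_pos, X_neg, xi, beta=8.0, alpha=0.5):
    """One update attracted to positive patterns and pushed off negatives."""
    # Attractive term: standard softmax readout over stored positives.
    lp = beta * (X_pos @ xi)
    wp = np.exp(lp - lp.max()); wp /= wp.sum()
    pull = wp @ X_pos
    # Repulsive term: softmax over negatives, subtracted from the state.
    ln = beta * (X_neg @ xi)
    wn = np.exp(ln - ln.max()); wn /= wn.sum()
    push = wn @ X_neg
    xi_new = pull - alpha * push
    return xi_new / (np.linalg.norm(xi_new) + 1e-12)  # keep on unit sphere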
7. Getting started
pip install mhn-ai-agent-memory
from hopfield_memory import HopfieldMemory
mem = HopfieldMemory()
mem.store("Alice studies topology")
fact, conf = mem.query_with_confidence("topology")
# fact: stored string; conf: softmax-style weight
Optional extras install semantic encoders, OpenAI embeddings, or FAISS-backed scale presets. For example: pip install mhn-ai-agent-memory[semantic] pulls sentence-transformers for higher-quality text geometry; [openai] wires the OpenAI embedding API; [scale] adds FAISS for large cold stores; [all] installs the full matrix. The repository README lists encoder tradeoffs and factory presets (small_memory, large_memory, massive_memory) if you outgrow a single in-memory matrix.