Memory
The memory organ persists state across calls. Agents depend on the Memory interface, not a concrete store, so you can inject a buffer, a vector DB, or a Redis store without changing agent code (Dependency Inversion).
The interface
from weaveflow import Memory
class Memory(ABC):
def remember(self, key: str, value) -> None: ...
def recall(self, key: str, default=None): ...
def recent(self, limit: int): ... # -> tuple[MemoryRecord, ...]
Inject it via the decorator or BaseAgent:
from weaveflow import agent, DataType, ShortTermMemory
@agent(name="notebook", input=DataType.TEXT, output=DataType.TEXT,
memory=ShortTermMemory(max_items=100))
async def notebook(ctx):
ctx.memory.remember("last_input", ctx.input.value)
return ctx.input.value
Short-term memory
A capacity-bounded, insertion-ordered key/value buffer: the context window. It can never grow without limit (oldest entries are evicted past max_items).
from weaveflow import ShortTermMemory
mem = ShortTermMemory(max_items=50)
mem.remember("topic", "ports")
mem.recall("topic") # "ports"
mem.recent(5) # last 5 records, newest last
Long-term memory (vector store)
A dependency-free, in-memory vector store with cosine-similarity search. Designed for local development and tests; in production, inject a real vector DB behind the same Memory contract. Embeddings are supplied by the caller, so this store does not call an LLM itself.
from weaveflow import LongTermMemory
mem = LongTermMemory()
mem.remember_vector("doc1", "north star", embedding=[0.0, 1.0])
mem.remember_vector("doc2", "east wind", embedding=[1.0, 0.0])
results = mem.search([0.0, 0.9], top_k=1) # -> tuple[ScoredRecord, ...]
results[0].record.value # "north star"
results[0].score # cosine similarity in [-1, 1]
Note:
searchuses strict zip, so query and stored vectors must share the same dimension. A mismatch fails fast rather than silently truncating.
Concurrency
Agents themselves are safe to run concurrently. They hold no per-run mutable state, and Payloads are immutable, so the same agent (or Pipeline/Parallel) can serve thousands of simultaneous run() calls without cross-talk.
The one exception is shared in-process memory. A Memory instance injected into an agent is shared across every concurrent run of that agent. Individual remember/recall calls are atomic, but a read-modify-write across an await is not, so concurrent runs can interleave and lose updates:
# UNSAFE under concurrency: two runs can read the same `count`, then both write count+1.
current = ctx.memory.recall("count", 0)
result = await some_llm_call(...) # <-- another run interleaves here
ctx.memory.remember("count", current + 1) # lost update
This is inherent to concurrent read-modify-write, not specific to Weave. Avoid it by:
- Per-run keys: write under a key unique to the run (e.g. the input id), never a shared accumulator. Safe at any concurrency (verified to 2,000 simultaneous runs).
- External atomic state: for shared counters/aggregates, inject a store that offers atomic operations (Redis
INCR, a DB) behind the sameMemorycontract.
ShortTermMemory and LongTermMemory are in-process dev stores; treat them as single-writer per logical conversation, not as a cross-run shared accumulator.
Choosing a store
| Need | Use |
|---|---|
| Conversation/context window | ShortTermMemory |
| Semantic recall / RAG | LongTermMemory (dev) or a real vector DB adapter (prod) |
| Both | Inject one of each into different agents |