Weave

Memory

The memory organ persists state across calls. Agents depend on the Memory interface, not a concrete store, so you can inject a buffer, a vector DB, or a Redis store without changing agent code (Dependency Inversion).

The interface

from weave import Memory

class Memory(ABC):
    def remember(self, key: str, value) -> None: ...
    def recall(self, key: str, default=None): ...
    def recent(self, limit: int): ...   # -> tuple[MemoryRecord, ...]

Inject it via the decorator or BaseAgent:

from weave import agent, DataType, ShortTermMemory

@agent(name="notebook", input=DataType.TEXT, output=DataType.TEXT,
       memory=ShortTermMemory(max_items=100))
async def notebook(ctx):
    ctx.memory.remember("last_input", ctx.input.value)
    return ctx.input.value

Short-term memory

A capacity-bounded, insertion-ordered key/value buffer: the context window. It can never grow without limit (oldest entries are evicted past max_items).

from weave import ShortTermMemory

mem = ShortTermMemory(max_items=50)
mem.remember("topic", "ports")
mem.recall("topic")          # "ports"
mem.recent(5)                # last 5 records, newest last

Long-term memory (vector store)

A dependency-free, in-memory vector store with cosine-similarity search. Designed for local development and tests; in production, inject a real vector DB behind the same Memory contract. Embeddings are supplied by the caller, so this store does not call an LLM itself.

from weave import LongTermMemory

mem = LongTermMemory()
mem.remember_vector("doc1", "north star", embedding=[0.0, 1.0])
mem.remember_vector("doc2", "east wind",  embedding=[1.0, 0.0])

results = mem.search([0.0, 0.9], top_k=1)   # -> tuple[ScoredRecord, ...]
results[0].record.value   # "north star"
results[0].score          # cosine similarity in [-1, 1]

Note: search uses strict zip, so query and stored vectors must share the same dimension. A mismatch fails fast rather than silently truncating.

Concurrency

Agents themselves are safe to run concurrently. They hold no per-run mutable state, and Payloads are immutable, so the same agent (or Pipeline/Parallel) can serve thousands of simultaneous run() calls without cross-talk.

The one exception is shared in-process memory. A Memory instance injected into an agent is shared across every concurrent run of that agent. Individual remember/recall calls are atomic, but a read-modify-write across an await is not, so concurrent runs can interleave and lose updates:

# UNSAFE under concurrency: two runs can read the same `count`, then both write count+1.
current = ctx.memory.recall("count", 0)
result = await some_llm_call(...)            # <-- another run interleaves here
ctx.memory.remember("count", current + 1)    # lost update

This is inherent to concurrent read-modify-write, not specific to Weave. Avoid it by:

  • Per-run keys: write under a key unique to the run (e.g. the input id), never a shared accumulator. Safe at any concurrency (verified to 2,000 simultaneous runs).
  • External atomic state: for shared counters/aggregates, inject a store that offers atomic operations (Redis INCR, a DB) behind the same Memory contract.

ShortTermMemory and LongTermMemory are in-process dev stores; treat them as single-writer per logical conversation, not as a cross-run shared accumulator.

Choosing a store

NeedUse
Conversation/context windowShortTermMemory
Semantic recall / RAGLongTermMemory (dev) or a real vector DB adapter (prod)
BothInject one of each into different agents