jeevesagent.vectorstore.base
VectorStore protocol + shared types and helpers.
Every concrete vector store implements the VectorStore
protocol — a small async surface (add / delete / search /
search_by_vector / count / get_by_ids). Backends differ in storage
and ANN algorithm, but the interface is identical, so swapping
InMemoryVectorStore for ChromaVectorStore (or any other backend) is a
one-line change.
# Filtering
The filter argument to search() is a Mongo-style query
expression — see jeevesagent.vectorstore._filter for the
operator reference. Common shapes:
{"source": "report.pdf"} # equality shorthand
{"page": {"$gte": 5}} # range
{"tag": {"$in": ["draft", "final"]}} # membership
{"$and": [{"a": 1}, {"b": 2}]} # composition
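To make the four shapes above concrete, here is a minimal sketch of how such a filter could be evaluated against a chunk's metadata. It is not the code in jeevesagent.vectorstore._filter: the function name sketch_matches is hypothetical, only the operators shown above ($gte, $in, $and) are covered, and treating a missing key as "no match" is an assumption.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def sketch_matches(metadata: Mapping[str, Any], flt: Mapping[str, Any] | None) -> bool:
    """Hypothetical evaluator covering only the four shapes listed above."""
    if not flt:
        return True  # assumption: no filter (or an empty one) matches everything
    for key, cond in flt.items():
        if key == "$and":
            # composition: every sub-filter must match
            if not all(sketch_matches(metadata, sub) for sub in cond):
                return False
        elif isinstance(cond, Mapping):
            # operator form, e.g. {"$gte": 5} or {"$in": [...]}
            if key not in metadata:
                return False  # assumption: a missing key never matches
            value = metadata[key]
            for op, arg in cond.items():
                if op == "$gte":
                    if not value >= arg:
                        return False
                elif op == "$in":
                    if value not in arg:
                        return False
                else:
                    raise ValueError(f"operator not covered by this sketch: {op}")
        else:
            # equality shorthand: {"source": "report.pdf"}
            if key not in metadata or metadata[key] != cond:
                return False
    return True
```

With metadata {"source": "report.pdf", "page": 7}, the filter {"page": {"$gte": 5}} matches and {"page": {"$gte": 8}} does not.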
# Diversity (MMR)
search() accepts diversity: float | None in [0, 1] for
Maximal Marginal Relevance reranking. None (default) gives
plain top-k by similarity. 0.0 is identical to None;
1.0 is maximum diversity. Most users want 0.3..0.5
when they want diversity at all.
We picked the 0..1 diversity scale (rather than LangChain’s
inverted lambda_mult) because “more diverse → bigger number”
is intuitive and “fully relevant” is the natural zero state.
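The diversity scale can be illustrated with a small greedy MMR sketch. It assumes the common formulation in which each candidate is scored as (1 - diversity) * relevance - diversity * redundancy, i.e. diversity plays the role of 1 - lambda_mult; whether any backend here computes it exactly this way is not stated in this module. The names cosine and mmr_select are hypothetical.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: list[float],
    candidates: list[list[float]],
    k: int,
    diversity: float | None = None,
) -> list[int]:
    """Return indices of up to k candidates, picked greedily by MMR score."""
    d = 0.0 if diversity is None else diversity  # None behaves like 0.0
    if not 0.0 <= d <= 1.0:
        raise ValueError("diversity must be in [0, 1]")
    relevance = [cosine(query, c) for c in candidates]
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i: int) -> float:
            # redundancy: similarity to the closest already-selected candidate
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return (1.0 - d) * relevance[i] - d * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With diversity=0.0 the redundancy term vanishes and the selection reduces to plain top-k by similarity, which matches the "0.0 is identical to None" behaviour described above.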
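Classes
SearchResult | One hit from VectorStore.search().
VectorStore | Async protocol for vector stores.
Functions
matches_filter(metadata, filter) | Return True if metadata satisfies filter.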
Module Contents
- class jeevesagent.vectorstore.base.SearchResult
One hit from VectorStore.search().
  - chunk: the matched chunk (with its full metadata).
  - score: similarity in [-1, 1] for cosine; backend-specific for other distance metrics. Higher = more similar.
  - id: the store-assigned id (so callers can delete() or get_by_ids() later).
- class jeevesagent.vectorstore.base.VectorStore
Bases: Protocol
Async protocol for vector stores.
Six methods cover the lifecycle: add (embed + store), delete, search (by query string), search_by_vector (precomputed), count, get_by_ids.
Backends that aren't natively async (FAISS, Chroma) wrap their sync calls in anyio.to_thread.run_sync() so they don't block the event loop.
- async add(chunks: list[jeevesagent.loader.base.Chunk], ids: list[str] | None = None) -> list[str]
Embed + store chunks. Returns the assigned ids (caller-provided or generated).
- async delete(ids: list[str]) -> None
Remove the named chunks. Unknown ids are silently skipped (idempotent).
- async get_by_ids(ids: list[str]) -> list[jeevesagent.loader.base.Chunk]
Fetch chunks by id, in the same order as ids. Unknown ids are skipped (the result may be shorter than the input).
- async search(query: str, *, k: int = 4, filter: collections.abc.Mapping[str, Any] | None = None, diversity: float | None = None) -> list[SearchResult]
Embed query and return the top-k chunks ranked by similarity. filter (optional) restricts candidates by metadata. diversity (optional, 0..1) enables MMR reranking for varied results.
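The id-handling rules documented for add, delete and get_by_ids can be shown with a toy dictionary-backed store. This is a sketch only: ToyChunk is a hypothetical stand-in for jeevesagent.loader.base.Chunk, ToyStore does no embedding and omits the search methods, and raising ValueError on a length mismatch between chunks and ids is an assumption, not documented behaviour.

```python
from __future__ import annotations

import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyChunk:
    """Hypothetical stand-in for jeevesagent.loader.base.Chunk."""

    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


class ToyStore:
    """Illustrative only: add / delete / get_by_ids, no embedding or search."""

    def __init__(self) -> None:
        self._chunks: dict[str, ToyChunk] = {}

    async def add(self, chunks: list[ToyChunk], ids: list[str] | None = None) -> list[str]:
        if ids is None:
            # generate ids when the caller does not provide them
            ids = [uuid.uuid4().hex for _ in chunks]
        if len(ids) != len(chunks):
            raise ValueError("ids and chunks must have the same length")  # assumption
        for chunk_id, chunk in zip(ids, chunks):
            self._chunks[chunk_id] = chunk
        return list(ids)

    async def delete(self, ids: list[str]) -> None:
        for chunk_id in ids:
            self._chunks.pop(chunk_id, None)  # unknown ids are silently skipped

    async def get_by_ids(self, ids: list[str]) -> list[ToyChunk]:
        # keeps the order of `ids`, dropping any id that is not stored
        return [self._chunks[i] for i in ids if i in self._chunks]


async def demo() -> list[str]:
    store = ToyStore()
    await store.add([ToyChunk("alpha"), ToyChunk("beta")], ids=["a", "b"])
    await store.delete(["b", "does-not-exist"])  # second id is ignored
    found = await store.get_by_ids(["b", "a", "zzz"])
    return [chunk.text for chunk in found]
```

Running asyncio.run(demo()) returns only the text of the chunk stored under "a", since "b" was deleted and "zzz" was never added.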
- jeevesagent.vectorstore.base.matches_filter(metadata: collections.abc.Mapping[str, Any], filter: collections.abc.Mapping[str, Any] | None) -> bool
Return True if metadata satisfies filter.
Thin wrapper around evaluate_filter() with the argument order our existing tests expect.