jeevesagent.memory.embedder

Embedders that turn text into vectors.

Four implementations land in this slice:

  • HashEmbedder — deterministic, zero-dep, SHA256-seeded Gaussian sample. Same text → same vector. Perfect for tests, dev, and for memory backends that only need some vector to enable recall without the cost of a real embedding API.

  • OpenAIEmbedder — wraps OpenAI’s text-embedding-3-{small,large} via the official openai SDK. Lazy SDK import inside __init__ so the module loads without openai installed; the import only fires when constructing without client=.

  • CohereEmbedder — wraps Cohere’s embed-*-v3.0 models via the cohere SDK; v3 models require an input_type.

  • VoyageEmbedder — wraps Voyage AI’s voyage-3 family via the voyageai SDK; input_type selects document vs. query encoding.

Attributes

DEFAULT_HASH_DIMENSIONS

Default dimensionality (384) for HashEmbedder vectors.

Classes

CohereEmbedder

Embeddings via Cohere's cohere SDK.

HashEmbedder

Deterministic SHA256-seeded unit vectors.

OpenAIEmbedder

Embeddings via OpenAI's embeddings.create API.

VoyageEmbedder

Embeddings via Voyage AI's voyageai SDK.

Module Contents

class jeevesagent.memory.embedder.CohereEmbedder(model: str = 'embed-english-v3.0', *, client: Any | None = None, api_key: str | None = None, input_type: str = 'search_document')[source]

Embeddings via Cohere’s cohere SDK.

Models and dimensions:

  • embed-english-v3.0 / embed-multilingual-v3.0 -> 1024

  • embed-english-light-v3.0 / embed-multilingual-light-v3.0 -> 384

input_type is required by Cohere v3 models:

  • "search_document" (default) — corpus / fact-store entries

  • "search_query" — retrieval queries

  • "classification" / "clustering" for non-retrieval uses

async embed(text: str) → list[float][source]
async embed_batch(texts: list[str]) → list[list[float]][source]
dimensions: int
name: str = 'embed-english-v3.0'
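As a sketch, the model table and input_type constraint above can be captured in a small config resolver (the helper and constant names here are illustrative, not part of the module):

```python
# Model -> output dimensionality, from the list above.
COHERE_DIMENSIONS = {
    "embed-english-v3.0": 1024,
    "embed-multilingual-v3.0": 1024,
    "embed-english-light-v3.0": 384,
    "embed-multilingual-light-v3.0": 384,
}

# input_type values accepted by Cohere v3 embed models.
VALID_INPUT_TYPES = {"search_document", "search_query", "classification", "clustering"}


def cohere_config(model: str, input_type: str = "search_document") -> tuple[int, str]:
    """Resolve output dimensionality and validate input_type for a v3 model."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(f"unknown input_type {input_type!r}")
    return COHERE_DIMENSIONS[model], input_type
```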
class jeevesagent.memory.embedder.HashEmbedder(dimensions: int = DEFAULT_HASH_DIMENSIONS)[source]

Deterministic SHA256-seeded unit vectors.

Each text gets a fresh random.Random seeded with the SHA256 of its UTF-8 bytes, which samples dimensions Gaussian values; the result is L2-normalised to a unit vector. The same text always produces the same vector, while distinct texts produce well-distributed, near-orthogonal vectors, so cosine similarity reflects literal text equality, not semantic similarity.

Use this in tests (fast, no network) and as a default for in-memory backends that need some vector but don’t need real semantic recall.

async embed(text: str) → list[float][source]
async embed_batch(texts: list[str]) → list[list[float]][source]
dimensions: int = 384
name: str = 'hash-embedder-384'
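The scheme is small enough to sketch in full; this is a stand-alone re-implementation of the technique described above (the function name is illustrative, not the class's API):

```python
import hashlib
import math
import random


def hash_embed(text: str, dimensions: int = 384) -> list[float]:
    # Seed a fresh RNG with the SHA256 digest of the text's UTF-8 bytes,
    # so the same text always yields the same vector.
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)
    # Sample Gaussian components, then L2-normalise to a unit vector.
    vec = [rng.gauss(0.0, 1.0) for _ in range(dimensions)]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]
```

Determinism falls out of the seeding: re-embedding a stored fact never drifts, which is exactly what an in-memory test backend needs.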
class jeevesagent.memory.embedder.OpenAIEmbedder(model: str = 'text-embedding-3-small', *, dimensions: int | None = None, client: Any | None = None, api_key: str | None = None)[source]

Embeddings via OpenAI’s embeddings.create API.

Dimensions are fixed by the model:

  • text-embedding-3-small -> 1536

  • text-embedding-3-large -> 3072

  • text-embedding-ada-002 -> 1536

Pass dimensions= only for text-embedding-3-* models, which support the dimensions parameter for projection.

async embed(text: str) → list[float][source]
async embed_batch(texts: list[str]) → list[list[float]][source]
dimensions: int
name: str = 'text-embedding-3-small'
class jeevesagent.memory.embedder.VoyageEmbedder(model: str = 'voyage-3', *, client: Any | None = None, api_key: str | None = None, input_type: str = 'document')[source]

Embeddings via Voyage AI’s voyageai SDK.

Models and dimensions:

  • voyage-3 / voyage-3-large / voyage-code-3 -> 1024

  • voyage-3-lite -> 512

input_type controls how Voyage encodes the text:

  • "document" (default) — for corpus / fact-store entries

  • "query" — for retrieval queries

Pass an explicit input_type= if your embedder is dedicated to one role; for the agent loop’s mixed use (we embed both stored triples and recall queries through the same embedder), the "document" default is the safer choice.

async embed(text: str) → list[float][source]
async embed_batch(texts: list[str]) → list[list[float]][source]
dimensions: int
name: str = 'voyage-3'
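A small resolver sketch for the model and role rules above (hypothetical helper names; values are from the lists above):

```python
# Model -> output dimensionality, from the list above.
VOYAGE_DIMENSIONS = {
    "voyage-3": 1024,
    "voyage-3-large": 1024,
    "voyage-code-3": 1024,
    "voyage-3-lite": 512,
}


def voyage_config(model: str, input_type: str = "document") -> tuple[int, str]:
    # "document" is the safer default when a single embedder serves
    # both stored entries and recall queries.
    if input_type not in {"document", "query"}:
        raise ValueError(f"unknown input_type {input_type!r}")
    return VOYAGE_DIMENSIONS[model], input_type
```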
jeevesagent.memory.embedder.DEFAULT_HASH_DIMENSIONS = 384