jeevesagent.memory.embedder
===========================

.. py:module:: jeevesagent.memory.embedder

.. autoapi-nested-parse::

   Embedders that turn text into vectors.

   Four implementations land in this slice:

   * :class:`HashEmbedder` — deterministic, zero-dep, SHA256-seeded
     Gaussian sample. Same text → same vector. Perfect for tests, dev,
     and for memory backends that only need *some* vector to enable
     recall without the cost of a real embedding API.
   * :class:`OpenAIEmbedder` — wraps OpenAI's
     ``text-embedding-3-{small,large}`` via the official ``openai`` SDK.
     Lazy SDK import inside ``__init__`` so the module loads without
     ``openai`` installed; the import only fires when constructing
     without ``client=``.
   * :class:`CohereEmbedder` — wraps Cohere's ``embed-*-v3.0`` models
     via the ``cohere`` SDK.
   * :class:`VoyageEmbedder` — wraps Voyage AI's ``voyage-3`` family
     via the ``voyageai`` SDK.



Attributes
----------

.. autoapisummary::

   jeevesagent.memory.embedder.DEFAULT_HASH_DIMENSIONS


Classes
-------

.. autoapisummary::

   jeevesagent.memory.embedder.CohereEmbedder
   jeevesagent.memory.embedder.HashEmbedder
   jeevesagent.memory.embedder.OpenAIEmbedder
   jeevesagent.memory.embedder.VoyageEmbedder


Module Contents
---------------

.. py:class:: CohereEmbedder(model: str = 'embed-english-v3.0', *, client: Any | None = None, api_key: str | None = None, input_type: str = 'search_document')

   Embeddings via Cohere's ``cohere`` SDK.

   Models and dimensions:

   * ``embed-english-v3.0`` / ``embed-multilingual-v3.0`` -> 1024
   * ``embed-english-light-v3.0`` / ``embed-multilingual-light-v3.0`` -> 384

   ``input_type`` is required by Cohere v3 models:

   * ``"search_document"`` (default) — corpus / fact-store entries
   * ``"search_query"`` — retrieval queries
   * ``"classification"`` / ``"clustering"`` for non-retrieval uses


   .. py:method:: embed(text: str) -> list[float]
      :async:



   .. py:method:: embed_batch(texts: list[str]) -> list[list[float]]
      :async:



   .. py:attribute:: dimensions
      :type:  int


   .. py:attribute:: name
      :type:  str
      :value: 'embed-english-v3.0'
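
The model-to-dimensions table above can be expressed as a small lookup. This is a hypothetical helper for illustration (``cohere_dimensions`` is not part of this module):

```python
# Hypothetical helper: resolve a Cohere v3 embed model name to its
# output dimensionality, per the table in the class docstring.
COHERE_V3_DIMENSIONS = {
    "embed-english-v3.0": 1024,
    "embed-multilingual-v3.0": 1024,
    "embed-english-light-v3.0": 384,
    "embed-multilingual-light-v3.0": 384,
}


def cohere_dimensions(model: str) -> int:
    try:
        return COHERE_V3_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"unknown Cohere v3 embed model: {model!r}") from None
```

Failing fast on an unknown model name catches typos at construction time rather than at the first API call.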



.. py:class:: HashEmbedder(dimensions: int = DEFAULT_HASH_DIMENSIONS)

   Deterministic SHA256-seeded unit vectors.

   Each text gets a fresh ``random.Random`` seeded by the SHA256 of
   its UTF-8 bytes, which samples ``dimensions`` Gaussian values; the
   result is L2-normalised to a unit vector. The same text always
   produces the same vector; distinct texts produce well-distributed,
   near-orthogonal vectors, so cosine similarity reflects literal text
   equality, not semantic similarity.

   Use this in tests (fast, no network) and as a default for
   in-memory backends that need *some* vector but don't need real
   semantic recall.


   .. py:method:: embed(text: str) -> list[float]
      :async:



   .. py:method:: embed_batch(texts: list[str]) -> list[list[float]]
      :async:



   .. py:attribute:: dimensions
      :type:  int
      :value: 384



   .. py:attribute:: name
      :type:  str
      :value: 'hash-embedder-384'
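
The construction described above can be reproduced as a standalone sketch. This is a re-derivation from the docstring, not the module's actual implementation, and the function name is hypothetical:

```python
import hashlib
import math
import random


def hash_embed(text: str, dimensions: int = 384) -> list[float]:
    """Deterministic SHA256-seeded unit vector, per the docstring above."""
    # Fresh RNG per text, seeded by the SHA256 digest of the UTF-8 bytes:
    # same text -> same seed -> same vector.
    rng = random.Random(hashlib.sha256(text.encode("utf-8")).digest())
    vec = [rng.gauss(0.0, 1.0) for _ in range(dimensions)]
    # L2-normalise so every output is a unit vector.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]
```

Determinism means repeated calls with the same text are byte-identical, while unrelated texts land near-orthogonal in expectation — enough for exercising recall plumbing without a real embedding API.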



.. py:class:: OpenAIEmbedder(model: str = 'text-embedding-3-small', *, dimensions: int | None = None, client: Any | None = None, api_key: str | None = None)

   Embeddings via OpenAI's ``embeddings.create`` API.

   Dimensions are fixed by the model:

   * ``text-embedding-3-small`` -> 1536
   * ``text-embedding-3-large`` -> 3072
   * ``text-embedding-ada-002`` -> 1536

   Pass ``dimensions=`` only for ``text-embedding-3-*`` models, which
   support the ``dimensions`` parameter for projection.


   .. py:method:: embed(text: str) -> list[float]
      :async:



   .. py:method:: embed_batch(texts: list[str]) -> list[list[float]]
      :async:



   .. py:attribute:: dimensions
      :type:  int


   .. py:attribute:: name
      :type:  str
      :value: 'text-embedding-3-small'



.. py:class:: VoyageEmbedder(model: str = 'voyage-3', *, client: Any | None = None, api_key: str | None = None, input_type: str = 'document')

   Embeddings via Voyage AI's ``voyageai`` SDK.

   Models and dimensions:

   * ``voyage-3`` / ``voyage-3-large`` / ``voyage-code-3`` -> 1024
   * ``voyage-3-lite`` -> 512

   ``input_type`` controls how Voyage encodes the text:

   * ``"document"`` (default) — for corpus / fact-store entries
   * ``"query"`` — for retrieval queries

   Pass an explicit ``input_type=`` if your embedder is dedicated to
   one role; for the agent loop's mixed use (we embed both stored
   triples and recall queries through the same embedder), the
   ``"document"`` default is the safer choice.


   .. py:method:: embed(text: str) -> list[float]
      :async:



   .. py:method:: embed_batch(texts: list[str]) -> list[list[float]]
      :async:



   .. py:attribute:: dimensions
      :type:  int


   .. py:attribute:: name
      :type:  str
      :value: 'voyage-3'
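
One way to dedicate an embedder to a single role is a thin wrapper that fixes ``input_type`` per call site. A hypothetical sketch, not part of this module, assuming the ``voyageai`` client's ``embed(texts, model=..., input_type=...)`` call shape with an ``.embeddings`` result attribute:

```python
from functools import partial


def embed_with_voyage(client, texts: list[str], *, model: str = "voyage-3",
                      input_type: str = "document") -> list[list[float]]:
    """Forward input_type so documents and queries can be encoded
    differently when the role is known at the call site."""
    result = client.embed(texts, model=model, input_type=input_type)
    return result.embeddings


# Dedicated-role variants built from the same function:
embed_documents = partial(embed_with_voyage, input_type="document")
embed_queries = partial(embed_with_voyage, input_type="query")
```

When one embedder serves both roles, as in the mixed agent-loop use above, pinning everything to ``"document"`` keeps stored and queried vectors in the same encoding.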



.. py:data:: DEFAULT_HASH_DIMENSIONS
   :value: 384


