abacusai.document_retriever
Module Contents
Classes
DocumentRetriever – A vector store that stores embeddings for a list of document chunks.
- class abacusai.document_retriever.DocumentRetriever(client, name=None, documentRetrieverId=None, createdAt=None, featureGroupId=None, featureGroupName=None, latestDocumentRetrieverVersion={}, documentRetrieverConfig={})
Bases: abacusai.return_class.AbstractApiClass
A vector store that stores embeddings for a list of document chunks.
- Parameters:
client (ApiClient) – An authenticated API Client instance
name (str) – The name of the document retriever.
documentRetrieverId (str) – The unique identifier of the vector store.
createdAt (str) – When the vector store was created.
featureGroupId (str) – The ID of the feature group associated with the document retriever.
featureGroupName (str) – The name of the feature group associated with the document retriever.
latestDocumentRetrieverVersion (DocumentRetrieverVersion) – The latest version of the vector store.
documentRetrieverConfig (DocumentRetrieverConfig) – The configuration used to create the vector store.
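To make the idea of a vector store over document chunks concrete, here is a toy in-memory sketch. It is not Abacus.AI's implementation: the bag-of-words `embed` function stands in for a real text encoder, and `ToyVectorStore` is a hypothetical class. It stores (chunk, vector) pairs and returns the chunks most similar to a query by cosine similarity, sorted by relevance.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in "embedding" (assumption): lowercase bag-of-words term counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store of (chunk text, embedding) pairs."""

    def __init__(self) -> None:
        self.chunks: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.chunks.append((chunk, embed(chunk)))

    def search(self, query: str, limit: int = 3) -> list[tuple[str, float]]:
        q = embed(query)
        scored = [(text, cosine(q, vec)) for text, vec in self.chunks]
        # Highest similarity first, i.e. sorted by relevance.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


store = ToyVectorStore()
for chunk in ["invoices are due in thirty days",
              "refunds are processed within a week",
              "shipping takes three to five days"]:
    store.add(chunk)

top_text, top_score = store.search("when are refunds processed", limit=1)[0]
print(top_text)  # refunds are processed within a week
```

A production store would use dense embeddings and an approximate nearest-neighbor index instead of scoring every chunk, but the store-then-rank shape is the same.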
- __repr__()
Return repr(self).
- to_dict()
Get a dict representation of the parameters in this class.
- Returns:
The dict value representation of the class parameters.
- Return type:
dict
- update(name=None, feature_group_id=None, document_retriever_config=None)
Updates an existing document retriever.
- Parameters:
name (str) – The name to update the document retriever with.
feature_group_id (str) – The ID of the feature group to update the document retriever with.
document_retriever_config (DocumentRetrieverConfig) – The configuration, including chunk_size and chunk_overlap_fraction, for document retrieval.
- Returns:
The updated document retriever.
- Return type:
DocumentRetriever
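The chunk_size and chunk_overlap_fraction settings can be pictured with the sketch below. It assumes, without confirmation from this page, that chunk_size counts words and that consecutive chunks share int(chunk_size * chunk_overlap_fraction) words. `chunk_words` is a hypothetical helper, and the service's real chunking rules may differ.

```python
def chunk_words(text: str, chunk_size: int, chunk_overlap_fraction: float) -> list[list[str]]:
    """Hypothetical word-level chunker; Abacus.AI's real chunking may differ."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap_fraction < 1:
        raise ValueError("chunk_overlap_fraction must be in [0, 1)")
    words = text.split()
    # Assumed overlap rule: a fixed number of words shared by neighbouring chunks.
    overlap = int(chunk_size * chunk_overlap_fraction)
    step = chunk_size - overlap  # always >= 1 because overlap < chunk_size
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(text, chunk_size=4, chunk_overlap_fraction=0.5)
for c in chunks:
    print(" ".join(c))
# w0 w1 w2 w3
# w2 w3 w4 w5
# w4 w5 w6 w7
# w6 w7 w8 w9
```

A higher overlap fraction means more chunks, and more duplicated words, for the same text, in exchange for less chance that a relevant passage is cut in half at a chunk boundary.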
- create_version()
Creates a document retriever version from the latest version of the feature group that the document retriever is associated with.
- Parameters:
document_retriever_id (str) – The unique ID of the document retriever to create a version for. It is taken from the current object rather than passed as an argument.
- Returns:
The newly created document retriever version.
- Return type:
DocumentRetrieverVersion
- refresh()
Calls describe and refreshes the current object’s fields.
- Returns:
The current object
- Return type:
DocumentRetriever
- describe()
Describe a document retriever.
- Parameters:
document_retriever_id (str) – A unique string identifier associated with the document retriever. It is taken from the current object rather than passed as an argument.
- Returns:
The document retriever object.
- Return type:
DocumentRetriever
- list_versions(limit=100, start_after_version=None)
List the versions of this document retriever.
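- Parameters:
limit (int) – The maximum number of versions to return. Defaults to 100.
start_after_version (str) – If provided, only versions listed after this version are returned.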
- Returns:
All the document retriever versions associated with the document retriever.
- Return type:
list[DocumentRetrieverVersion]
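When a retriever has more versions than one call returns, limit and start_after_version suggest cursor-style paging. The sketch assumes start_after_version is exclusive (only versions after it are returned). `iter_all_versions` and `fake_list_page` are hypothetical, and plain strings stand in for DocumentRetrieverVersion objects; with real objects, their version ID would have to be extracted before being passed back as the cursor.

```python
from typing import Callable, Iterator, Optional


def iter_all_versions(
    list_page: Callable[[int, Optional[str]], list[str]],
    page_size: int = 100,
) -> Iterator[str]:
    """Yield every version by requesting pages until a short page arrives.

    list_page(limit, start_after_version) stands in for a call such as
    retriever.list_versions(limit=..., start_after_version=...), assuming
    start_after_version behaves as an exclusive cursor.
    """
    cursor: Optional[str] = None
    while True:
        page = list_page(page_size, cursor)
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means nothing is left
        cursor = page[-1]


# Hypothetical backend holding seven versions, oldest first.
ALL_VERSIONS = [f"v{i}" for i in range(7)]


def fake_list_page(limit: int, start_after_version: Optional[str]) -> list[str]:
    start = 0 if start_after_version is None else ALL_VERSIONS.index(start_after_version) + 1
    return ALL_VERSIONS[start:start + limit]


versions = list(iter_all_versions(fake_list_page, page_size=3))
print(versions)  # ['v0', 'v1', 'v2', 'v3', 'v4', 'v5', 'v6']
```

Stopping on a short page costs one extra empty request when the total is an exact multiple of the page size, which keeps the loop independent of any total-count field.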
- get_document_snippet(document_id, start_word_index=None, end_word_index=None)
Get a snippet from documents in the document retriever.
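- Parameters:
document_id (str) – The ID of the document to take the snippet from.
start_word_index (int) – If provided, the index of the word where the snippet starts.
end_word_index (int) – If provided, the index of the word where the snippet ends.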
- Returns:
The document snippet found in the document retriever.
- Return type:
- restart()
Restart the document retriever if it is stopped.
- Parameters:
document_retriever_id (str) – A unique string identifier associated with the document retriever. It is taken from the current object rather than passed as an argument.
- wait_until_ready(timeout=3600)
Waits until the document retriever is ready.
- Parameters:
timeout (int, optional) – The maximum time to wait, in seconds. If the document retriever is not ready by then, the call times out. Defaults to 3600 seconds.
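Waiting for readiness can be thought of as a polling loop over get_status. The sketch below is a generic version of that pattern, not the SDK's code: the helper name `poll_until_ready`, the status names "complete" and "failed", and the poll interval are all assumptions.

```python
import time
from typing import Callable


def poll_until_ready(
    get_status: Callable[[], str],
    timeout: float = 3600,
    poll_interval: float = 5,
    ready_statuses: frozenset = frozenset({"complete"}),
    failed_statuses: frozenset = frozenset({"failed"}),
) -> str:
    """Hypothetical helper: poll get_status() until a ready status or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ready_statuses:
            return status
        if status in failed_statuses:
            # Assumed failure status: stop early instead of waiting out the timeout.
            raise RuntimeError(f"document retriever failed with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# A fake status source standing in for retriever.get_status().
statuses = iter(["pending", "pending", "complete"])
result = poll_until_ready(lambda: next(statuses), timeout=10, poll_interval=0)
print(result)  # complete
```

Using time.monotonic() for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.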
- get_status()
Gets the status of the document retriever.
- Returns:
A string describing the status of a document retriever (pending, complete, etc.).
- Return type:
str
- get_matching_documents(query, filters=None, limit=None, result_columns=None, max_words=None, num_retrieval_margin_words=None, max_words_per_chunk=None)
Looks up the given query in the deployed document retriever and returns the matching documents.
The original documents are split into chunks, which are stored in the document retriever, and this lookup returns the relevant chunks. If the provided settings permit, the returned chunks can be expanded to include more words from the original documents, and overlapping chunks are merged. The returned chunks are sorted by relevance.
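The expand-then-merge behavior described above can be sketched for hits from a single document. This is an illustration under stated assumptions, not the service's algorithm: each hit is a half-open word range [start, end) with a score, num_retrieval_margin_words is modeled as the hypothetical `margin_words` argument, merged ranges keep the higher score, and max_words is applied greedily in relevance order.

```python
from typing import Optional


def expand_and_merge(
    hits: list[tuple[int, int, float]],
    doc_len: int,
    margin_words: int = 0,
    max_words: Optional[int] = None,
) -> list[tuple[int, int, float]]:
    """Hypothetical post-processing of retrieved chunks from one document.

    Each hit is (start, end, score): a half-open word range [start, end)
    in the original document plus a relevance score.
    """
    # 1. Widen each hit by margin_words on both sides, clamped to the document.
    widened = sorted(
        (max(0, start - margin_words), min(doc_len, end + margin_words), score)
        for start, end, score in hits
    )
    # 2. Merge ranges that overlap, keeping the best score among merged hits.
    merged: list[list] = []
    for start, end, score in widened:
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2] = max(merged[-1][2], score)
        else:
            merged.append([start, end, score])
    # 3. Sort by relevance and stop once the next range would exceed max_words.
    merged.sort(key=lambda item: item[2], reverse=True)
    results: list[tuple[int, int, float]] = []
    used = 0
    for start, end, score in merged:
        if max_words is not None and used + (end - start) > max_words:
            break
        results.append((start, end, score))
        used += end - start
    return results


hits = [(10, 20, 0.9), (22, 30, 0.7), (60, 70, 0.8)]
print(expand_and_merge(hits, doc_len=100))
# [(10, 20, 0.9), (60, 70, 0.8), (22, 30, 0.7)]
print(expand_and_merge(hits, doc_len=100, margin_words=3))
# [(7, 33, 0.9), (57, 73, 0.8)]
```

With a 3-word margin, the first two hits grow until they overlap and collapse into one range. Because the budget check stops at the first range that does not fit, shorter lower-ranked ranges are dropped as well; skipping that range and continuing would be an equally plausible policy.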
- Parameters:
query (str) – The query to search for.
filters (dict) – A dictionary mapping column names to a list of values to restrict the retrieved search results.
limit (int) – If provided, will limit the number of results to the value specified.
result_columns (list) – If provided, will limit the column properties present in each result to those specified in this list.
max_words (int) – If provided, will limit the total number of words in the results to the value specified.
num_retrieval_margin_words (int) – If provided, will add this number of words from left and right of the returned chunks.
max_words_per_chunk (int) – If provided, will limit the number of words in each chunk to the value specified. If this value is smaller than the actual chunk size on disk, which is determined when the document retriever is created, the actual chunk size is used instead. That is, chunks are never split into smaller chunks during lookup because of this setting.
- Returns:
The relevant document results found in the document retriever.
- Return type: