Metadata-Version: 2.4
Name: sam_rag
Version: 0.1.1
Summary: A document-ingesting agent that monitors specified directories, keeping stored documents up to date in a vector database for Retrieval-Augmented Generation (RAG) queries.
Project-URL: Homepage, https://github.com/SolaceLabs/solace-agent-mesh
Project-URL: Documentation, https://solacelabs.github.io/solace-agent-mesh/
Project-URL: Repository, https://github.com/SolaceLabs/solace-agent-mesh-core-plugins
Project-URL: Issues, https://github.com/SolaceLabs/solace-agent-mesh-core-plugins/issues
Author-email: SolaceLabs <solacelabs@solace.com>
License: Apache License 2.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Requires-Python: <3.14,>=3.10.16
Requires-Dist: beautifulsoup4==4.13.5
Requires-Dist: lxml==6.0.1
Requires-Dist: nltk<4,>=3.9.4
Requires-Dist: odfpy==1.4.1
Requires-Dist: openai==1.99.9
Requires-Dist: pandas==2.3.2
Requires-Dist: pdfplumber<1,>=0.11.7
Requires-Dist: pypdf<7,>=6.9.1
Requires-Dist: python-docx==1.2.0
Requires-Dist: pyyaml==6.0.2
Requires-Dist: qdrant-client==1.13.3
Requires-Dist: scikit-learn==1.7.2
Requires-Dist: watchdog==6.0.0
Provides-Extra: cloud-storage
Requires-Dist: boto3>=1.28.0; extra == 'cloud-storage'
Requires-Dist: google-api-python-client>=2.100.0; extra == 'cloud-storage'
Requires-Dist: google-auth-httplib2>=0.1.1; extra == 'cloud-storage'
Requires-Dist: google-auth-oauthlib>=1.0.0; extra == 'cloud-storage'
Requires-Dist: requests>=2.31.0; extra == 'cloud-storage'
Provides-Extra: nlp
Requires-Dist: langdetect>=1.0.9; extra == 'nlp'
Requires-Dist: nltk>=3.7; extra == 'nlp'
Requires-Dist: spacy>=3.4.0; extra == 'nlp'
Provides-Extra: onedrive
Requires-Dist: msal>=1.24.0; extra == 'onedrive'
Requires-Dist: requests>=2.31.0; extra == 'onedrive'
Provides-Extra: pinecone
Requires-Dist: pinecone>=6.0.0; extra == 'pinecone'
Description-Content-Type: text/markdown

# Solace Agent Mesh RAG

A document-ingesting agent that monitors specified directories, keeping stored documents up to date in a vector database for Retrieval-Augmented Generation (RAG) queries.

## About Solace Agent Mesh

Solace Agent Mesh (SAM) is an open-source framework for building event-driven, multi-agent AI systems where specialized agents collaborate on complex tasks. It provides a standardized way for agents to communicate, share data, and integrate with external systems while keeping components loosely coupled and production-ready.

SAM helps you:

- Build event-driven multi-agent systems on Solace Event Mesh
- Connect agents, tools, gateways, and services through a common runtime
- Extend projects with installable plugins such as `sam-rag`

Learn more in the [Solace Agent Mesh documentation](https://solacelabs.github.io/solace-agent-mesh/) and the [main project repository](https://github.com/SolaceLabs/solace-agent-mesh).

## Overview

The Solace Agent Mesh RAG system provides a complete RAG pipeline that includes:

1. **Document Scanning**: Monitors directories for new, modified, or deleted documents
2. **Document Preprocessing**: Cleans and normalizes text from various document formats
3. **Text Splitting**: Breaks documents into smaller chunks for embedding
4. **Embedding Generation**: Converts text chunks into vector embeddings
5. **Vector Storage**: Stores embeddings in a vector database for efficient retrieval
6. **Retrieval**: Finds relevant document chunks based on query similarity
7. **Augmentation**: Enhances retrieved content using LLMs
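
The splitting, embedding, and retrieval stages above can be illustrated with a toy sketch. This is not the plugin's implementation: the function names (`split_text`, `embed`, `cosine`, `retrieve`) are hypothetical, and a bag-of-words count stands in for a real embedding model and vector database so that the example runs with the standard library only.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop early so the last chunk is never fully contained in the previous one.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (assumption: stands in for a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the `top_k` chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


# Usage: split a short document, then look up the chunk closest to a question.
document = (
    "The scanner watches a directory for changes. "
    "Each document is split into chunks and embedded. "
    "Embeddings are stored in a vector database for retrieval."
)
chunks = split_text(document, chunk_size=8, overlap=2)
print(retrieve("where are embeddings stored", chunks, top_k=1))
```

Overlapping chunks are a common choice because a sentence that straddles a chunk boundary still appears whole in at least one chunk.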

## Documentation

Comprehensive documentation is available in the `docs` directory:
- [Architecture Guide](docs/architecture.md): Overview of the SAM RAG architecture and components
- [Configuration Guide](docs/configuration.md): Detailed explanation of configuration options
- [Tools and Lifecycle Guide](docs/tools_and_lifecycle.md): Documentation for tools and lifecycle functions

## Installation

### Add the RAG Plugin to Solace Agent Mesh

```sh
solace-agent-mesh plugin add <your-new-component-name> --plugin sam-rag
```
This creates a new component configuration at `configs/plugins/<your-new-component-name-kebab-case>.yaml`. Update this file with values for your environment. To work with the default configuration, export at least the following environment variables. For more advanced settings, see the [Configuration Guide](docs/configuration.md).

```sh
export SOLACE_BROKER_URL=ws://localhost:8008
export SOLACE_BROKER_USERNAME=admin
export SOLACE_BROKER_PASSWORD=admin
export SOLACE_BROKER_VPN=default
export SOLACE_IS_QUEUE_TEMPORARY=true

export OPENAI_MODEL_NAME=<LLM MODEL NAME>
export OPENAI_API_KEY=<LLM KEY>
export OPENAI_API_ENDPOINT=<LLM ENDPOINT>

export QDRANT_URL=<QDRANT CLUSTER URL>
export QDRANT_API_KEY=<QDRANT API KEY>
export QDRANT_COLLECTION=<A NAME FOR QDRANT COLLECTION>
export QDRANT_EMBEDDING_DIMENSION=1024
export DOCUMENTS_PATH=<PATH OF SOURCE DOCUMENTS IN LOCAL DISK>
```
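
Before starting SAM, it can help to confirm that every variable listed above is actually set. The following is a minimal sketch of such a check. The helper `missing_env_vars` is hypothetical and not part of `sam-rag`, and whether the plugin performs its own validation is not stated here; the variable names are taken from the export list above.

```python
import os

# Variable names copied from the export list above.
REQUIRED_VARS = [
    "SOLACE_BROKER_URL",
    "SOLACE_BROKER_USERNAME",
    "SOLACE_BROKER_PASSWORD",
    "SOLACE_BROKER_VPN",
    "SOLACE_IS_QUEUE_TEMPORARY",
    "OPENAI_MODEL_NAME",
    "OPENAI_API_KEY",
    "OPENAI_API_ENDPOINT",
    "QDRANT_URL",
    "QDRANT_API_KEY",
    "QDRANT_COLLECTION",
    "QDRANT_EMBEDDING_DIMENSION",
    "DOCUMENTS_PATH",
]


def missing_env_vars(environ=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Usage: report anything that still needs to be exported.
missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```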

### Key Configuration Sections

- **Scanner Configuration**: Document source and monitoring settings
- **Preprocessor Configuration**: Text extraction and cleaning settings
- **Splitter Configuration**: Document chunking settings
- **Embedding Configuration**: Vector embedding settings
- **Vector Database Configuration**: Storage and retrieval settings
- **LLM Configuration**: Language model settings for augmentation
- **Retrieval Configuration**: Search parameters

## Usage

### Running the RAG System

```sh
solace-agent-mesh run
```

### Querying the RAG System
Open the SAM UI in a browser.

#### Ingesting documents
(Option 1): Store documents in a specific directory and configure the directory path in the `rag.yaml` file.
Once SAM is running, the plugin ingests the documents automatically in the background.

(Option 2): Open the SAM UI in a browser (by default `http://localhost:8000`) and attach files to a query such as "ingest the attached document to RAG".
This query stores the attachments persistently in the file system and indexes them in the vector database.
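
For the directory-based option, background ingestion depends on noticing which files were added, modified, or deleted. The package lists `watchdog` as a dependency, but how the plugin uses it is not described here. The sketch below shows a different, simpler technique: comparing two content-hash snapshots of a directory using only the standard library. The function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root: str) -> dict[str, str]:
    """Map each file under `root` (as a relative POSIX path) to its SHA-256 digest."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, modified, or deleted between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "deleted": sorted(old.keys() - new.keys()),
    }


# Usage with hand-written snapshots (digests shortened for readability).
before = {"a.txt": "111", "b.txt": "222"}
after = {"a.txt": "111", "b.txt": "999", "c.txt": "333"}
print(diff_snapshots(before, after))
```

Hashing file contents instead of comparing modification times avoids re-indexing files that were touched but not changed, at the cost of reading every file on each scan.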

#### Retrieving documents
Use the SAM UI in a browser (by default `http://localhost:8000`) or any other interface to send a query such as "search documents about <your query> and return a summary and referenced documents". The plugin retrieves the most similar documents and returns a summary together with references to the original documents.
