Metadata-Version: 2.4
Name: ai4rag
Version: 0.5.4
Summary: Automatic and optimized RAG Pattern generator
Author: IBM
Maintainer-email: Jakub Walaszczyk <jwalaszc@redhat.com>, Lukasz Cmielowski <lcmielow@redhat.com>, Michal Steczko <msteczko@redhat.com>, Filip Komarzyniec <fkomarzy@redhat.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/IBM/ai4rag
Project-URL: Repository, https://github.com/IBM/ai4rag
Project-URL: Issues, https://github.com/IBM/ai4rag/issues
Project-URL: Documentation, https://ibm.github.io/ai4rag/
Keywords: AI,RAG,LLM,GenAI
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Requires-Python: <3.14,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langchain~=1.1.3
Requires-Dist: langchain_chroma~=1.1.0
Requires-Dist: langchain-text-splitters~=1.1.0
Requires-Dist: llama-stack-client~=0.7.1
Requires-Dist: openai==2.23.*
Requires-Dist: pandas==2.2.*
Requires-Dist: pydantic==2.11.*
Requires-Dist: pygam~=0.12.0
Requires-Dist: scikit-learn==1.8.*
Requires-Dist: unitxt~=1.26.1
Provides-Extra: dev
Requires-Dist: ai4rag[test]; extra == "dev"
Requires-Dist: ai4rag[code_check]; extra == "dev"
Requires-Dist: ai4rag[docs]; extra == "dev"
Requires-Dist: beautifulsoup4; extra == "dev"
Requires-Dist: dotenv; extra == "dev"
Requires-Dist: ipykernel; extra == "dev"
Requires-Dist: docling; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Requires-Dist: pytest-mock; extra == "test"
Requires-Dist: psutil; extra == "test"
Requires-Dist: nbformat; extra == "test"
Provides-Extra: code-check
Requires-Dist: pylint; extra == "code-check"
Requires-Dist: black; extra == "code-check"
Requires-Dist: isort; extra == "code-check"
Provides-Extra: docs
Requires-Dist: mkdocs~=1.6.0; extra == "docs"
Requires-Dist: mkdocs-material~=9.5; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.25.0; extra == "docs"
Requires-Dist: mkdocs-git-revision-date-localized-plugin>=1.2.0; extra == "docs"
Requires-Dist: mkdocs-minify-plugin>=0.8.0; extra == "docs"
Requires-Dist: mike>=2.0.0; extra == "docs"
Dynamic: license-file

<div align="center">

<img src="docs/icon.svg" alt="ai4rag icon" width="80" height="62"/>

# `ai4rag`
### RAG Templates Optimization Engine

![AI4RAG](https://img.shields.io/badge/AI4RAG-RAG%20Builder%20%26%20Optimizer-0F62FE?style=for-the-badge&logo=ibm&logoColor=white)
![Python](https://img.shields.io/badge/Python-3.12-3776AB?style=for-the-badge&logo=python&logoColor=white)
![Python](https://img.shields.io/badge/Python-3.13-3776AB?style=for-the-badge&logo=python&logoColor=white)

[![RAG Builder](https://img.shields.io/badge/🏗️-RAG%20Builder-10B981?style=flat-square)](#)
[![HPO](https://img.shields.io/badge/⚙️-Hyperparameter%20Optimization-F59E0B?style=flat-square)](#)
[![AutoML](https://img.shields.io/badge/🚀-AutoML%20for%20RAG-8B5CF6?style=flat-square)](#)

**Initializes RAG Templates with optimal parameters**

[Getting Started](https://ibm.github.io/ai4rag/latest/getting-started/quick-start/) • [User Guide](https://ibm.github.io/ai4rag/latest/user-guide/overview/) • [API Reference](https://ibm.github.io/ai4rag/latest/api-reference/core/experiment/) • [Development](https://ibm.github.io/ai4rag/latest/development/contributing/)

</div>

---

## 🎯 What is ai4RAG?

`ai4rag` is an **optimization engine for RAG Templates** that is agnostic to your LLM and vector database providers.
It accepts a variety of RAG Templates and a search space definition, then returns a RAG Template initialized with optimal parameter values (called a RAG Pattern).


> [!IMPORTANT]
> `ai4rag` is designed to be provider-agnostic: users may provide their own implementations of the foundation model, embedding model, or vector store and use them in the experiment.
> Out of the box, `ai4rag` works with [Llama Stack](https://github.com/llamastack/llama-stack).
> To use the full capabilities of `ai4rag`, you'll need access to a Llama Stack server configured with at least one foundation model, one embedding model, and a vector database.

## Llama Stack

ai4RAG can run experiments using a [Llama Stack](https://github.com/llamastack/llama-stack) server for embeddings, vector storage, and text generation. Use the official client and API docs to connect and extend:

- **Client:** [llama-stack-client](https://pypi.org/project/llama-stack-client/) >= 0.7.0 (the Python package used by ai4RAG; installed with this project).
- **Server:** [Llama Stack](https://github.com/llamastack/llama-stack) >= 0.7.0.
- **API reference:** [Llama Stack API docs](https://llamastack.github.io/docs/) — HTTP API used by the client.

**Features used by ai4rag**

When using the Llama Stack backend, ai4rag relies on:

- **Embeddings** — Text embeddings via the client (e.g. for indexing and query encoding). See [Embeddings API](https://llamastack.github.io/docs/api/embeddings) in the docs.
- **Vector stores** — Create, retrieve, and delete vector store instances (e.g. Milvus) with a chosen embedding model and dimension. See [Vector stores](https://llamastack.github.io/docs/api/creates-a-vector-store) in the API docs.
- **Vector IO** — Insert document chunks (with embeddings) into a store and run similarity search (query) for retrieval. See [Vector IO](https://llamastack.github.io/docs/api/search-for-chunks-in-a-vector-store) and insert/query endpoints.
- **Chat / responses** — Foundation model integration for answer generation (e.g. chat completions or responses API) when evaluating RAG patterns.


## Quick start
1. [Provide an instance of `llama-stack-client` to integrate with Llama Stack.](#prepare-llama-stack-client)
2. [Prepare your knowledge base documents for the experiment.](#prepare-knowledge-base-documents)
3. [Prepare `benchmark_data.json` with evaluation questions and answers.](#prepare-benchmark_datajson)
4. [Define and constrain your search space.](#define-and-constrain-search-space)
5. [Configure the optimizer.](#configure-optimizer)
6. [Create and run the experiment.](#run-the-experiment)


### Prepare `llama-stack-client`
To enable full integration with Llama Stack, instantiate a `LlamaStackClient`.
This allows `ai4rag` to use the models and vector stores available on your Llama Stack server.

> [!TIP]
> Store your credentials securely in a `.env` file.

```python
import os
from dotenv import load_dotenv, find_dotenv
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=os.getenv("BASE_URL"), api_key=os.getenv("API_KEY"))
```

### Prepare knowledge base documents
Prepare a set of documents to serve as the knowledge base for retrieval.
These documents will be used to ground the LLM's responses and should be stored in a local directory.

> [!NOTE]
> If you are using the project locally, you can load documents using the `FileStore` class from the `dev_utils` module.
> Supported document formats can be found in the `FileStore` implementation.

```python
from pathlib import Path
from dev_utils.file_store import FileStore

documents_path = Path("<path to the documents folder>")
documents = FileStore(documents_path).load_as_documents()
```
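If you are not using the project locally, any loading mechanism that yields your documents' contents will do. As an illustration only (the `load_text_documents` helper and the `(doc_id, text)` pair representation below are hypothetical, not part of the `ai4rag` or `dev_utils` API), a minimal plain-text loader might look like:

```python
from pathlib import Path


def load_text_documents(folder: Path) -> list[tuple[str, str]]:
    """Load every .txt file in a folder as a (doc_id, text) pair."""
    docs = []
    for path in sorted(folder.glob("*.txt")):
        # Use the file stem as a stable document id, so benchmark
        # records can reference it in correct_answer_document_ids.
        docs.append((path.stem, path.read_text(encoding="utf-8")))
    return docs
```

Whatever loader you use, keep the document identifiers stable: the benchmark file in the next step references them.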


### Prepare `benchmark_data.json`
Create a `benchmark_data.json` file following this schema:
```json
[
	{
		"question": "<question_1>",
		"correct_answers": [
			"<answer 1 for question 1>",
			"<answer 2 for question 1>"
		],
		"correct_answer_document_ids": ["<list of document IDs from which the correct answers were derived>"]
	},
	{
		"question": "<question_2>",
		"correct_answers": [
			"<answer 1 for question 2>",
			"<answer 2 for question 2>"
		],
		"correct_answer_document_ids": ["<list of document IDs from which the correct answers were derived>"]
	}
]
```

All benchmark questions and answers must be derived from your knowledge base documents.

```python
from pathlib import Path
from dev_utils.utils import read_benchmark_from_json

benchmark_data_path = Path("<path to benchmark_data.json>")
benchmark_data = read_benchmark_from_json(benchmark_data_path)
```
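A malformed benchmark file is a common source of experiment failures, so it can help to sanity-check the schema before running. The `validate_benchmark` helper below is an illustrative sketch (not part of `ai4rag`) that checks each record against the schema shown above:

```python
import json
from pathlib import Path

# Keys every benchmark record must carry, per the schema above.
REQUIRED_KEYS = {"question", "correct_answers", "correct_answer_document_ids"}


def validate_benchmark(path: Path) -> list[dict]:
    """Load benchmark_data.json and check each record against the expected schema."""
    records = json.loads(path.read_text(encoding="utf-8"))
    for i, record in enumerate(records):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
        if not isinstance(record["correct_answers"], list):
            raise ValueError(f"record {i}: 'correct_answers' must be a list")
    return records
```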


### Define and constrain search space
The search space defines all possible parameter combinations, where each combination creates a unique RAG Pattern.
During the experiment, the engine will optimize the RAG Pattern for the selected metric over the given search space, using an objective function to evaluate each configuration.

```python
from ai4rag.search_space.src.parameter import Parameter
from ai4rag.search_space.src.search_space import AI4RAGSearchSpace
from ai4rag.rag.foundation_models.llama_stack import LSFoundationModel
from ai4rag.rag.embedding.llama_stack import LSEmbeddingModel


search_space = AI4RAGSearchSpace(
    params=[
        Parameter(
            name="foundation_model",
            param_type="C",
            values=[LSFoundationModel(model_id="ollama/llama3.2:3b", client=client)],
        ),
        Parameter(
            name="embedding_model",
            param_type="C",
            values=[
                LSEmbeddingModel(
                    model_id="ollama/nomic-embed-text:latest",
                    client=client,
                    params={"embedding_dimension": 768, "context_length": 8192},
                )
            ]
        )
    ]
)
```

> [!TIP]
> To run automatic model discovery with Llama Stack, you may use `prepare_search_space_with_llama_stack()` from `ai4rag.search_space.prepare_search_space`.
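Conceptually, the search space is the Cartesian product of each parameter's candidate values, and the optimizer's job is to find the best combination without evaluating all of them. The toy illustration below (with hypothetical parameter names and a stand-in objective, unrelated to the real ai4rag objective function) enumerates a tiny space exhaustively to make the idea concrete:

```python
from itertools import product

# Hypothetical categorical parameters; a real search space would hold
# foundation models, embedding models, chunking settings, etc.
search_space = {
    "chunk_size": [256, 512, 1024],
    "top_k": [3, 5],
}


def toy_objective(config: dict) -> float:
    # Stand-in score; a real experiment builds a RAG Pattern from the
    # config and evaluates it against the benchmark data.
    return 1.0 / (abs(config["chunk_size"] - 512) + config["top_k"])


names = list(search_space)
configs = [dict(zip(names, combo)) for combo in product(*search_space.values())]
best = max(configs, key=toy_objective)  # exhaustive here; GAMOptimizer samples instead
```

Exhaustive enumeration explodes combinatorially as parameters are added, which is why the engine uses a budgeted optimizer rather than a grid search.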


### Configure optimizer
You have full control over the optimization algorithm. Configure the `GAMOptimizer` by adjusting `GAMOptSettings`.

```python
from ai4rag.core.hpo.gam_opt import GAMOptSettings

optimizer_settings = GAMOptSettings(
    max_evals=10, n_random_nodes=4
)
```


### Run the experiment
Using the information from the previous steps, create an experiment and run the ai4rag optimization engine.

> [!NOTE]
> For Llama Stack vector stores, use the `"ls_<provider_id>"` format where `<provider_id>` matches your Llama Stack
> provider configuration (e.g., `"ls_milvus"`, `"ls_qdrant"`).
> To use ChromaDB in-memory, specify `"chroma"`.

```python
from ai4rag.core.experiment.experiment import AI4RAGExperiment
from ai4rag.utils.event_handler import LocalEventHandler

experiment = AI4RAGExperiment(
    client=client,
    documents=documents,
    benchmark_data=benchmark_data,
    search_space=search_space,
    vector_store_type="ls_milvus",
    optimizer_settings=optimizer_settings,
    event_handler=LocalEventHandler(output_path="<local-path-to-store-your-output-files>"),
)

experiment.search()
best_eval = experiment.results.get_best_evaluations(k=1)[0]
print(best_eval)

print(best_eval.rag_pattern.generate("What can ai4rag be used for?"))
```

> [!TIP]
> For production use, implement your own custom `EventHandler` to handle status changes and artifacts produced during the experiment.
> See the [`BaseEventHandler` implementation](https://github.com/IBM/ai4rag/blob/main/ai4rag/utils/event_handler/event_handler.py) for reference.


## Contribution
Pull requests are very welcome! Make sure your changes are well tested, and ideally create a topic branch for each separate change. For example:

1. Fork the repo
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Added some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request

See the [contributing section](contributing.md) for more details.
