Metadata-Version: 2.4
Name: chATLAS_Chains
Version: 0.1.7
Summary: A modular Python package for implementing Retrieval Augmented Generation chains for the chATLAS project.
Author-email: Joe Egan <joseph.caimin.egan@cern.ch>
License: Apache-2.0
Project-URL: Homepage, https://gitlab.cern.ch/atlasml/chatlas/chatlas-packages/
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: chatlas-embed>=0.1.19
Requires-Dist: langchain~=0.3.3
Requires-Dist: langchain_core
Requires-Dist: langchain_openai
Requires-Dist: langgraph
Requires-Dist: sentence-transformers>=3.0.0
Requires-Dist: tiktoken
Requires-Dist: pinecone
Requires-Dist: chatlas-embed
Requires-Dist: psycopg2-binary>=2.9.10
Requires-Dist: httpx[socks]>=0.28.1
Dynamic: license-file


# chATLAS_Chains

This package implements and benchmarks various Retrieval Augmented Generation (RAG) chains for use in the [chATLAS](https://chatlas-flask-chatlas.app.cern.ch) project.

## Installation

### From PyPI

```bash
pip install chATLAS-Chains
```

### From source

We recommend using [`uv`](https://docs.astral.sh/uv/):
```bash
cd chATLAS_Chains
uv sync
```

## Environment variables

These are required for the following use cases:

1. Using an OpenAI LLM
```bash
export CHATLAS_OPENAI_KEY="your api key"
```

2. Using LLMs via the Groq API
```bash
export CHATLAS_GROQ_BASE_URL="http://cs-513-ml003:3000"
export CHATLAS_GROQ_KEY="your groq api key"
```

**Note:** The API address is local to the CERN network. If you are not at CERN, you can forward it like so:
```bash
ssh -L 3000:cs-513-ml003:3000 $LXPLUS_USERNAME@lxplus.cern.ch
export CHATLAS_GROQ_BASE_URL="http://localhost:3000"
```

3. Using LLMs via CERN's LiteLLM API. For reference, see the [repo](https://gitlab.cern.ch/itgpt/litellm-okd/-/tree/main) and the [setup instructions](https://codimd.web.cern.ch/tQKiMa13Q4O-EJXWTO3N7w?view#Using-Your-Dedicated-API-Key-to-Access-LLMs).
```bash
export CHATLAS_CHAINS_LITELLM_KEY="your litellm key"
```
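To see which of the use cases above are configured in the current shell, a small check along these lines may help. This is only a sketch and is not part of the package: the variable names come from the list above, while the provider labels and the `missing_env` helper are hypothetical names chosen for illustration.

```python
import os

# Variable names are taken from the list above.
# The provider labels ("openai", "groq", "litellm") are illustrative only.
REQUIRED_ENV = {
    "openai": ["CHATLAS_OPENAI_KEY"],
    "groq": ["CHATLAS_GROQ_BASE_URL", "CHATLAS_GROQ_KEY"],
    "litellm": ["CHATLAS_CHAINS_LITELLM_KEY"],
}


def missing_env(provider, environ=None):
    """Return the variables for `provider` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


if __name__ == "__main__":
    for provider in REQUIRED_ENV:
        missing = missing_env(provider)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

Passing a plain dict as `environ` makes the helper easy to exercise without touching the real environment.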

## Supported Chains

More details [here](chATLAS_Chains/chains/README.md)

- `chains.basic.basic_retrieval_chain`
- `chains.advanced.advanced_rag`

### Model Configuration in Chains

Supported chain constructors now accept a typed `chat_model_kwargs` argument for model options, for example
`temperature`, `max_tokens`, `service_provider`, `api_key`, `base_url` and `proxy`.

```python
from chATLAS_Chains.chains.basic import basic_retrieval_chain

chain = basic_retrieval_chain(
    prompt=...,
    vectorstore=...,
    model_name="gpt-4o-mini",
    chat_model_kwargs={"temperature": 0.1, "max_tokens": 512},
)
```

## Forwarding vectorstore connections

If not on the CERN network, you can forward the connection to the postgres servers with:

```bash
ssh -N \
  -L 6624:dbod-chatlas.cern.ch:6624 \
  -L 6606:dbod-chatlas-cds.cern.ch:6606 \
  "$LXPLUS_USERNAME"@lxplus.cern.ch
export CHATLAS_PORT_FORWARDING=1
```

You can then use the helper function [`get_vectorstore`](chATLAS_Chains/vectorstore.py).
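
Before connecting, it can be useful to confirm that the SSH tunnels are actually listening. The sketch below uses only the standard library and assumes the local ports `6624` and `6606` from the `ssh` command above; `port_is_open` is a hypothetical helper and not part of the package.

```python
import socket

# Local ports forwarded by the ssh command above.
FORWARDED_PORTS = (6624, 6606)


def port_is_open(port, host="localhost", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    for port in FORWARDED_PORTS:
        state = "open" if port_is_open(port) else "closed"
        print(f"localhost:{port} is {state}")
```

An open port only shows that something is listening locally. It does not verify database credentials.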

## Testing Environment Variables

Some tests are DB-backed integration tests (`tests/test_chains.py`, `tests/test_conversational.py`, `tests/test_search.py`).
If the DB/test environment is not configured, these tests are skipped by `tests/conftest.py`.

`tests/conftest.py` now uses explicit controls:

- `CHATLAS_PORT_FORWARDING`: enable localhost DB tunnels (`1`, `true`, `True`)
- `CHATLAS_DB_PASSWORD`
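
As an illustration of how such a flag can be interpreted, the sketch below treats exactly the three listed values as enabling port forwarding. It is an assumption about the behaviour described here, not a copy of `tests/conftest.py`; the helper name `port_forwarding_enabled` is hypothetical.

```python
import os

# Values listed above as enabling localhost DB tunnels.
TRUTHY = {"1", "true", "True"}


def port_forwarding_enabled(environ=None):
    """Return True if CHATLAS_PORT_FORWARDING is set to one of the listed values."""
    env = os.environ if environ is None else environ
    return env.get("CHATLAS_PORT_FORWARDING", "") in TRUTHY


if __name__ == "__main__":
    print("port forwarding enabled:", port_forwarding_enabled())
```

Under this reading, any other value (including `0`, `yes` or an empty string) leaves port forwarding disabled.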

### Local Example (with DB tunnels)

```bash
export CHATLAS_DB_PASSWORD="..."
export CHATLAS_PORT_FORWARDING=1
unset GITLAB_PAT

uv run pytest -q
```

## Postgres

If you want to create a local postgres server, you need to install `psql`. Some instructions to do this on macOS using [homebrew](https://brew.sh) are here:

Software install
```bash
brew install postgresql
brew services start postgresql
brew install pgvector
brew unlink pgvector && brew link pgvector
```

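Set the `postgres` user's password and enable the `vector` extension
```bash
psql -h localhost -U postgres
-- The statements below are entered at the psql prompt
ALTER USER postgres WITH PASSWORD 'Set_your_password_here';
CREATE EXTENSION IF NOT EXISTS vector;
```
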
## CHANGELOG

#### 0.1.7

Support for CERN-hosted LiteLLM models

Multi-turn conversational RAG with (local) conversation history 

Bugfixes

#### 0.1.6

Fix bug in `reciprocal_rank_fusion` which caused it to silently return only one document

Add `fallback_models` optional argument to `advanced_rag`

#### 0.1.5

Fix missing `retry_config` argument in `advanced_rag` caused by early PyPI upload

#### 0.1.4

Support for Groq-hosted models

Some new functions that go beyond the "basic RAG" workflow:
- Reciprocal Rank Fusion `chATLAS_Chains.documents.rrf.reciprocal_rank_fusion`
- Document reranking via the Pinecone API `chATLAS_Chains.documents.rerank.rerank_documents`
- Query rewriting step `chATLAS_Chains.query.query_rewriting.rewrite_query`

These are all usable via the new chain `chATLAS_Chains.chains.advanced.advanced_rag`

Added unit tests to gitlab CI/CD pipeline

#### 0.1.3

Fixing imports

Changed output format of `basic_retrieval_chain` (`docs` key is now a list of `Document` objects, rather than a dict)

Unit tests for `basic_retrieval_chain`

#### 0.1.2

Unit tests

First Langgraph chain

#### 0.1.1

Initial Release

---
## 📄 License

chATLAS_Chains is released under the Apache v2.0 license.

---

<div align="center">

**Made with ❤️ by the ATLAS Collaboration**

*For questions and support, please [contact Joe Egan](mailto:joseph.caimin.egan@cern.ch).*

</div>
