Metadata-Version: 2.4
Name: ecdallm
Version: 0.1.3
Summary: Retrieval-Augmented Generation - (RAG)
License: MIT
License-File: LICENSE
Keywords: rag,llm,nlp,data,vector
Author: EDC - Erasmus Data Collaboratory
Author-email: admin@ecda.ai
Requires-Python: >=3.11,<3.15
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: chromadb (>=1.5.0,<2.0.0)
Requires-Dist: docx2txt (>=0.9,<0.10)
Requires-Dist: faiss-cpu (>=1.13.2,<2.0.0)
Requires-Dist: fastapi (>=0.128.5,<0.129.0)
Requires-Dist: fastembed (>=0.7.4,<0.8.0)
Requires-Dist: jinja2 (>=3.1.6,<4.0.0)
Requires-Dist: langchain (>=1.2.9,<2.0.0)
Requires-Dist: langchain-chroma (>=1.1.0,<2.0.0)
Requires-Dist: langchain-community (>=0.4.1,<0.5.0)
Requires-Dist: langchain-openai (>=1.1.8,<2.0.0)
Requires-Dist: langchain-text-splitters (>=1.1.0,<2.0.0)
Requires-Dist: openai (>=2.18.0,<3.0.0)
Requires-Dist: pydantic (>=2.12.5,<3.0.0)
Requires-Dist: pypdf (>=6.7.0,<7.0.0)
Requires-Dist: pytest (>=9.0.2,<10.0.0)
Requires-Dist: python-docx (>=1.2.0,<2.0.0)
Requires-Dist: python-multipart (>=0.0.22,<0.0.23)
Requires-Dist: reportlab (>=4.4.9,<5.0.0)
Requires-Dist: scikit-learn (>=1.8.0,<2.0.0)
Requires-Dist: sentence-transformers (>=5.2.2,<6.0.0)
Requires-Dist: typer (>=0.21.1,<0.22.0)
Requires-Dist: uvicorn (>=0.40.0,<0.41.0)
Project-URL: Homepage, https://ecda.eur.nl/erasmus-data-collaboratory/
Project-URL: Repository, https://github.com/Erasmus-Data-Collaboratory/ecdallm
Description-Content-Type: text/markdown

# ecdallm



[![PyPI version](https://badge.fury.io/py/ecdallm.svg)](https://pypi.org/project/ecdallm/)
[![Python versions](https://img.shields.io/pypi/pyversions/ecdallm.svg)](https://pypi.org/project/ecdallm/)
[![License](https://img.shields.io/pypi/l/ecdallm.svg)](LICENCE)

**ecdallm** is a lightweight Retrieval-Augmented Generation (RAG) application that lets you chat with your own documents using a locally running LLM.

It combines:

- FastAPI web interface
- Local embedding pipeline (FastEmbed)
- Persistent vector storage (ChromaDB)
- Document ingestion (PDF, TXT, DOCX)
- CLI launcher
- Local LLM support (e.g., LM Studio)

The goal is to provide a simple, reproducible environment for **document-grounded LLM interaction** without relying on external APIs.

---

## Overview

`ecdallm` allows you to:

1. Upload documents
2. Index them into a vector database
3. Run semantic retrieval
4. Query a local LLM with grounded context

All embeddings and vector storage run locally.

Your LLM also runs locally — for example using **LM Studio**.

This makes the system suitable for:

- research environments
- private document analysis
- offline experimentation
- RAG prototyping

---

## Installation

Install from PyPI:

```bash
pip install ecdallm
```

---

## Running the application

Start the CLI:

```bash
ecdallm
```

The CLI will:

- find a free port (starting from 8000)
- start the FastAPI server
- open the browser automatically

Example output:

```
ecdallm running at http://127.0.0.1:8000/
INFO: Uvicorn running on http://127.0.0.1:8000
```

---

## Using a local LLM

`ecdallm` expects an OpenAI-compatible endpoint.

For example, with **LM Studio**:

1. Start LM Studio server
2. Load a chat model
3. Enable the local API server

Typical endpoint:

```
http://localhost:1234/v1
```

You will configure this from the web interface.

---

## Supported document types

- PDF
- TXT
- DOCX

---

## Workflow

### 1. Upload documents
Use the **Upload** page to add files.

### 2. Index documents
Files are automatically indexed into ChromaDB using FastEmbed.

### 3. Chat with documents
Open the **Chat** page and ask questions.

The assistant will:

- retrieve relevant chunks
- build a grounded prompt
- query the local LLM
- return a concise answer

---

## Project structure

```
ecdallm/
├── cli.py
└── app/
    ├── main.py
    ├── rag.py
    ├── paths.py
    ├── search_engine.py
    ├── vector.py
    ├── templates/
    ├── static/
    ├── uploads/
    └── rag_store/
```

---

## Notes

At this stage, `ecdallm` is designed to work with **locally running LLMs**.
No external API calls are required for embeddings or retrieval.

This keeps the system:

- private
- reproducible
- offline-friendly

---



## Erasmus Data Collaboratory

Developed by the Erasmus Data Collaboratory (ECDA).
- Zaman Ziabakhshganji — creator and maintainer
- Farshad Radman — co-author and contributor
- Jos van Dongen — co-author and contributor


---

## License

MIT License

