Metadata-Version: 2.3
Name: simba-core
Version: 0.3.0
Summary: Knowledge Management System that connects to your RAG system
License: LICENCE.md
Keywords: rag,knowledge-management,vector-store,llm
Author: GitHamza0206
Author-email: zeroualihamza0206@gmail.com
Requires-Python: >=3.11,<3.13
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: celery (>=5.4.0,<6.0.0)
Requires-Dist: docling (>=2.15.1,<3.0.0)
Requires-Dist: faiss-cpu (>=1.9.0.post1,<2.0.0)
Requires-Dist: fastapi (>=0.115.6,<0.116.0)
Requires-Dist: ipykernel (>=6.29.5,<7.0.0)
Requires-Dist: langchain (>=0.3.13,<0.4.0)
Requires-Dist: langchain-chroma (>=0.1.4,<0.2.0)
Requires-Dist: langchain-community (>=0.3.13,<0.4.0)
Requires-Dist: langchain-docling (>=0.2.0,<0.3.0)
Requires-Dist: langchain-experimental (>=0.3.4,<0.4.0)
Requires-Dist: langchain-huggingface (>=0.1.2,<0.2.0)
Requires-Dist: langchain-milvus (>=0.1.8,<0.2.0)
Requires-Dist: langchain-ollama (>=0.2.2,<0.3.0)
Requires-Dist: langchain-openai (>=0.2.14,<0.3.0)
Requires-Dist: langgraph (>=0.2.60,<0.3.0)
Requires-Dist: litellm (>=1.56.6,<2.0.0)
Requires-Dist: markdown (>=3.7,<4.0)
Requires-Dist: markitdown (>=0.0.1a3,<0.0.2)
Requires-Dist: ollama (>=0.4.6,<0.5.0)
Requires-Dist: pdf2image (>=1.17.0,<2.0.0)
Requires-Dist: pdfminer-six (>=20240706,<20240707)
Requires-Dist: pillow-heif (>=0.21.0,<0.22.0)
Requires-Dist: pydantic-settings (>=2.7.1,<3.0.0)
Requires-Dist: pymupdf (>=1.25.2,<2.0.0)
Requires-Dist: pytest (>=8.3.4,<9.0.0)
Requires-Dist: pytest-asyncio (>=0.25.2,<0.26.0)
Requires-Dist: python-dotenv (>=1.0.1,<2.0.0)
Requires-Dist: python-multipart (>=0.0.20,<0.0.21)
Requires-Dist: rank-bm25 (>=0.2.2,<0.3.0)
Requires-Dist: rq (>=2.1.0,<3.0.0)
Requires-Dist: sentence-transformers (>=3.4.1,<4.0.0)
Requires-Dist: tinydb (>=4.8.2,<5.0.0)
Requires-Dist: torch (>=2.6.0,<3.0.0)
Requires-Dist: torchaudio (>=2.6.0,<3.0.0)
Requires-Dist: torchvision (>=0.21.0,<0.22.0)
Requires-Dist: unstructured[local-inference] (>=0.16.12,<0.17.0)
Requires-Dist: uvicorn (>=0.34.0,<0.35.0)
Project-URL: Repository, https://github.com/GitHamza0206/simba
Description-Content-Type: text/markdown

<h1 align="center">Simba - Your Knowledge Management System</h1>

<p align="center">
<img src="/assets/logo.png" alt="Simba Logo" width="400" height="400"/>
</p>

<p align="center">
<strong>Connect your knowledge to any RAG system</strong>
</p>

<p align="center">
<a href="https://www.producthunt.com/posts/simba-2?embed=true&utm_source=badge-featured&utm_medium=badge&utm_souce=badge-simba&#0045;2" target="_blank"><img src="https://api.producthunt.com/widgets/embed-image/v1/featured.svg?post_id=863851&theme=light&t=1739449352356" alt="Simba&#0032; - Connect&#0032;your&#0032;Knowledge&#0032;into&#0032;any&#0032;RAG&#0032;based&#0032;system | Product Hunt" style="width: 250px; height: 54px;" width="250" height="54" /></a>
</p>

<p align="center">


<a href="https://github.com/GitHamza0206/simba/blob/main/LICENSE">
<img src="https://img.shields.io/github/license/GitHamza0206/simba" alt="License">
</a>
<a href="https://github.com/GitHamza0206/simba/stargazers">
<img src="https://img.shields.io/github/stars/GitHamza0206/simba" alt="Stars">
</a>
<a href="https://github.com/GitHamza0206/simba/network/members">
<img src="https://img.shields.io/github/forks/GitHamza0206/simba" alt="Forks">
</a>
<a href="https://github.com/GitHamza0206/simba/issues">
<img src="https://img.shields.io/github/issues/GitHamza0206/simba" alt="Issues">
</a>
<a href="https://github.com/GitHamza0206/simba/pulls">
<img src="https://img.shields.io/github/issues-pr/GitHamza0206/simba" alt="Pull Requests">
</a>
<a href="https://pepy.tech/projects/simba-core"><img src="https://static.pepy.tech/badge/simba-core" alt="PyPI Downloads"></a>
</p>

[![Twitter Follow](https://img.shields.io/twitter/follow/zeroualihamza?style=social)](https://x.com/zerou_hamza)

## 📖 Overview

Simba is an open-source, portable Knowledge Management System (KMS) designed specifically for seamless integration with Retrieval-Augmented Generation (RAG) systems. With its intuitive UI, modular architecture, and powerful SDK, Simba simplifies knowledge management, allowing developers to focus on building advanced AI solutions.

# Table of Contents

- [Table of Contents](#table-of-contents)
  - [🚀 Features](#-features)
  - [🎥 Demo](#-demo)
  - [🛠️ Getting Started](#️-getting-started)
    - [📋 Prerequisites](#-prerequisites)
  - [🔌 Quickstart Simba SDK Usage](#-quickstart-simba-sdk-usage)
    - [📦 Installation](#-installation)
    - [🔑 Configuration](#-configuration)
    - [🚀 Running Simba](#-running-simba)
  - [🐳 Docker Deployment](#-docker-deployment)
  - [🏁 Roadmap](#-roadmap)
  - [🤝 Contributing](#-contributing)
  - [💬 Support \& Contact](#-support--contact)

## 🚀 Features

- **🔌 Powerful SDK:** Comprehensive Python SDK for easy integration.
- **🧩 Modular Architecture:** Flexible integration of vector stores, embedding models, chunkers, and parsers.
- **🖥️ Modern UI:** User-friendly interface for managing document chunks.
- **🔗 Seamless Integration:** Effortlessly connects with any RAG-based system.
- **👨‍💻 Developer-Centric:** Simplifies complex knowledge management tasks.
- **📦 Open Source & Extensible:** Community-driven with extensive customization options.

## 🎥 Demo

![Watch the demo](/assets/demo.gif)

## 🛠️ Getting Started

### 📋 Prerequisites

Ensure you have the following installed:

- [Python](https://www.python.org/) 3.11 or 3.12 (the package requires `>=3.11,<3.13`)
- [Poetry](https://python-poetry.org/)
- [Redis](https://redis.io/) 7.0+
- [Node.js](https://nodejs.org/) 20+
- [Git](https://git-scm.com/)
- (Optional) Docker

## 🔌 Quickstart Simba SDK Usage

```bash
pip install simba-client
```

Use the SDK to upload, parse, and query documents programmatically:

```python
from simba_sdk import SimbaClient

client = SimbaClient(api_url="http://localhost:8000")  # requires simba-core installed and the Simba server running

document = client.documents.create(file_path="path/to/your/document.pdf")
document_id = document[0]["id"]

parsing_result = client.parser.parse_document(document_id, parser="docling", sync=True)

retrieval_results = client.retriever.retrieve(query="your-query")

for result in retrieval_results["documents"]:
    print(f"Content: {result['page_content']}")
    print(f"Source: {result['metadata']['source']}")
    print("====" * 10)
```

Explore more in the [Simba SDK documentation](https://github.com/GitHamza0206/simba/tree/main/simba_sdk).
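
The retrieval loop in the quickstart can be wrapped in a small helper. The sketch below assumes only the response shape used in that snippet (a `documents` list whose items carry `page_content` and `metadata.source`); the `summarize_results` name and the sample data are hypothetical and not part of the SDK.

```python
from typing import Any, Dict, List


def summarize_results(response: Dict[str, Any], max_chars: int = 80) -> List[str]:
    """Return one 'source: snippet' line per retrieved document."""
    lines = []
    for doc in response.get("documents", []):
        # Fall back gracefully when a document carries no source metadata.
        source = doc.get("metadata", {}).get("source", "unknown source")
        # Collapse newlines and repeated spaces, then truncate the snippet.
        snippet = " ".join(doc.get("page_content", "").split())[:max_chars]
        lines.append(f"{source}: {snippet}")
    return lines


# Hypothetical response, shaped like the one the quickstart iterates over.
sample = {
    "documents": [
        {"page_content": "Simba connects\nknowledge to RAG.", "metadata": {"source": "intro.pdf"}},
        {"page_content": "No metadata here.", "metadata": {}},
    ]
}

for line in summarize_results(sample):
    print(line)
```

With a live server, `summarize_results(client.retriever.retrieve(query="your-query"))` should work as long as the response keeps the shape shown in the quickstart.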

### 📦 Installation

Install Simba core:

```bash
pip install simba-core
```

Or clone and set up the repository:

```bash
git clone https://github.com/GitHamza0206/simba.git
cd simba
poetry config virtualenvs.in-project true
poetry install
source .venv/bin/activate
```

### 🔑 Configuration

Create a `.env` file:

```bash
OPENAI_API_KEY=your_openai_api_key
REDIS_HOST=localhost
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1
```
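
Before starting the services, it can help to confirm that these variables are present. The following is a minimal sketch, not part of Simba: the `check_env` helper is hypothetical, and it assumes Redis is used for both the Celery broker and the result backend, as in the example above.

```python
import os
from typing import Dict, List
from urllib.parse import urlparse

# Variable names taken from the .env example above.
REQUIRED_VARS = ("OPENAI_API_KEY", "REDIS_HOST", "CELERY_BROKER_URL", "CELERY_RESULT_BACKEND")


def check_env(env: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    for name in ("CELERY_BROKER_URL", "CELERY_RESULT_BACKEND"):
        value = env.get(name)
        # Assumption: both URLs point at Redis, so the scheme should be redis or rediss.
        if value and urlparse(value).scheme not in ("redis", "rediss"):
            problems.append(f"{name} should be a redis:// URL, got {value!r}")
    return problems


if __name__ == "__main__":
    for problem in check_env(dict(os.environ)):
        print("config problem:", problem)
```

The check reads the process environment only, so export the variables or load the `.env` file first (for example with `python-dotenv`, which is already a dependency of `simba-core`).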

Configure `config.yaml`:

```yaml
# config.yaml

project:
  name: "Simba"
  version: "1.0.0"
  api_version: "/api/v1"

paths:
  base_dir: null  # Will be set programmatically
  faiss_index_dir: "vector_stores/faiss_index"
  vector_store_dir: "vector_stores"

llm:
  provider: "openai"
  model_name: "gpt-4o-mini"
  temperature: 0.0
  max_tokens: null
  streaming: true
  additional_params: {}

embedding:
  provider: "huggingface"
  model_name: "BAAI/bge-base-en-v1.5"
  device: "mps"  # Apple Silicon GPU; set to "cpu" for container compatibility
  additional_params: {}

vector_store:
  provider: "faiss"
  collection_name: "simba_collection"

  additional_params: {}

chunking:
  chunk_size: 512
  chunk_overlap: 200

retrieval:
  method: "hybrid" # Options: default, semantic, keyword, hybrid, ensemble, reranked
  k: 5
  # Method-specific parameters
  params:
    # Semantic retrieval parameters
    score_threshold: 0.5
    
    # Hybrid retrieval parameters
    prioritize_semantic: true
    
    # Ensemble retrieval parameters
    weights: [0.7, 0.3]  # Weights for semantic and keyword retrievers
    
    # Reranking parameters
    reranker_model: colbert
    reranker_threshold: 0.7

# Database configuration
database:
  provider: litedb # Options: litedb, sqlite
  additional_params: {}

celery:
  broker_url: ${CELERY_BROKER_URL:-redis://redis:6379/0}
  result_backend: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1}
```
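
The `celery` entries use shell-style `${NAME:-default}` placeholders: the environment variable wins when it is set and non-empty, otherwise the default applies. The sketch below illustrates that rule in plain Python. How Simba itself resolves these placeholders is not shown in this README, so treat the `expand` helper as a hypothetical illustration of the semantics rather than the project's implementation.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} and ${NAME:-default}.
_PLACEHOLDER = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}")


def expand(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${NAME:-default} placeholders using env, shell-style."""
    env = dict(os.environ) if env is None else env

    def _sub(match: re.Match) -> str:
        current = env.get(match.group("name"))
        # ":-" falls back when the variable is unset or empty, as in POSIX shells.
        if current:
            return current
        return match.group("default") or ""

    return _PLACEHOLDER.sub(_sub, value)


template = "${CELERY_BROKER_URL:-redis://redis:6379/0}"
print(expand(template, env={}))
print(expand(template, env={"CELERY_BROKER_URL": "redis://localhost:6379/0"}))
```

Note that the defaults point at a host named `redis`, while the `.env` example uses `localhost`. The `redis` host name looks intended for a container network (an inference, not something this README states), so a local run needs the `.env` values to be set.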

### 🚀 Running Simba

Start the server, frontend, and parsers, each in its own terminal:

```bash
simba server
simba front
simba parsers
```
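
Scripts that call the SDK right after startup may want to wait until the server accepts connections. This is a minimal sketch using only the standard library. The `wait_for_port` helper is hypothetical, and port 8000 is assumed from the `api_url` in the SDK quickstart.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows the port is open, not that the API is healthy.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Port 8000 matches the api_url used in the SDK quickstart; adjust if yours differs.
    ready = wait_for_port("localhost", 8000, timeout=5.0)
    print("server reachable" if ready else "server not reachable yet")
```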

## 🐳 Docker Deployment

Deploy Simba using Docker:

- **CPU:**
```bash
DEVICE=cpu make build
DEVICE=cpu make up
```

- **NVIDIA GPU:**
```bash
DEVICE=cuda make build
DEVICE=cuda make up
```

- **Apple Silicon:**
```bash
DEVICE=cpu make build
DEVICE=cpu make up
```
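
The choice of `DEVICE` above can be scripted. The sketch below is a heuristic, not part of Simba: `pick_device` is a hypothetical helper, and finding `nvidia-smi` on the `PATH` only suggests that an NVIDIA driver is installed. GPU access from Docker typically also needs the NVIDIA Container Toolkit.

```python
import platform
import shutil


def pick_device(system: str, machine: str, has_nvidia_smi: bool) -> str:
    """Map host traits to the DEVICE values used in the commands above."""
    if system == "Darwin" and machine == "arm64":
        return "cpu"  # Apple Silicon: the README builds with DEVICE=cpu
    if has_nvidia_smi:
        return "cuda"  # heuristic: nvidia-smi on PATH suggests an NVIDIA driver
    return "cpu"


if __name__ == "__main__":
    device = pick_device(
        platform.system(),
        platform.machine(),
        shutil.which("nvidia-smi") is not None,
    )
    # Print the commands instead of running them, so nothing is built by accident.
    print(f"DEVICE={device} make build && DEVICE={device} make up")
```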

## 🏁 Roadmap

- [x] 💻 pip install simba-core
- [x] 🔧 pip install simba-sdk
- [ ] 🌐 www.simba-docs.com
- [ ] 🔒 Auth & access management
- [ ] 🕸️ Web scraping
- [ ] ☁️ Cloud integrations (Azure/AWS/GCP)
- [ ] 📚 Additional parsers and chunkers
- [ ] 🎨 Enhanced UX/UI

## 🤝 Contributing

We welcome contributions! Follow these steps:

- Fork the repository
- Create a feature or bugfix branch
- Commit clearly documented changes
- Submit a pull request

## 💬 Support & Contact

For support or inquiries, open an issue on GitHub or contact [Hamza Zerouali](mailto:zeroualihamza0206@gmail.com).
