Metadata-Version: 2.4
Name: kara-toolkit
Version: 0.1.0
Summary: Knowledge-Aware Re-embedding Algorithm - Efficient RAG knowledge base updates
Author-email: Mahdi Zakizadeh <mzakizadeh.me@gmail.com>
License: CC-BY License
Project-URL: Homepage, https://github.com/mzakizadeh/kara
Project-URL: Repository, https://github.com/mzakizadeh/kara
Project-URL: Documentation, https://kara-toolkit.readthedocs.io
Project-URL: Bug Tracker, https://github.com/mzakizadeh/kara-toolkit/issues
Keywords: rag,embeddings,knowledge-base,nlp,langchain,llamaindex
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21.0
Requires-Dist: typing-extensions>=4.0.0
Provides-Extra: langchain
Requires-Dist: langchain>=0.1.0; extra == "langchain"
Requires-Dist: langchain_community>=0.0.1; extra == "langchain"
Requires-Dist: langchain_core>=0.0.1; extra == "langchain"
Requires-Dist: langchain_text_splitters>=0.0.1; extra == "langchain"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Provides-Extra: all
Requires-Dist: kara-toolkit[dev,langchain]; extra == "all"
Dynamic: license-file

# KARA - Knowledge-Aware Re-embedding Algorithm

[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-blue.svg)](https://creativecommons.org/licenses/by/4.0/)
[![PyPI version](https://badge.fury.io/py/kara-toolkit.svg)](https://badge.fury.io/py/kara-toolkit)
[![Python Support](https://img.shields.io/pypi/pyversions/kara-toolkit.svg)](https://pypi.org/project/kara-toolkit/)

KARA is a Python library for efficiently updating document collections in RAG systems. When a document changes, it reuses existing chunks wherever possible, so only new or modified chunks need to be re-embedded. This reduces the number of embedding operations per update.

## Installation

```bash
pip install kara-toolkit
```

## Quick Start

```python
from kara import KARAUpdater, RecursiveCharacterChunker

# Initialize
updater = KARAUpdater(
    chunker=RecursiveCharacterChunker(chunk_size=1000),
    epsilon=0.1
)

# Process initial documents
updater.initialize(["Your document content..."])

# Update with new content
result = updater.update(["Updated document content..."])
print(f"Efficiency: {result.efficiency_ratio:.1%}")
```

## How It Works

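For each document, KARA models chunking as a path problem on a DAG (directed acyclic graph). Each node is a position between consecutive splits of the document, and each edge is a candidate chunk covering the splits between two positions. KARA then uses Dijkstra's algorithm to find the lowest-cost path from the start of the document to its end, and the edges on that path form the chosen chunking.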
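The following is a minimal sketch of this idea under stated assumptions. It is not KARA's implementation. Node `i` is the boundary before split `i`, and an edge `i -> j` is the candidate chunk `splits[i:j]`, limited to `max_chunk_size` characters. The edge weights are hypothetical: a chunk that already exists in the knowledge base costs `reuse_cost`, and any other chunk costs `new_chunk_cost`, so the shortest path prefers chunks whose embeddings can be reused. The function `optimal_chunking` and all of its parameters are illustrative names, not part of the `kara` API.

```python
import heapq
from typing import List, Sequence, Set, Tuple


def optimal_chunking(
    splits: Sequence[str],
    existing_chunks: Set[str],
    max_chunk_size: int,
    new_chunk_cost: float = 1.0,  # hypothetical cost of embedding a new chunk
    reuse_cost: float = 0.1,  # hypothetical cost of reusing an existing chunk
) -> List[str]:
    """Hypothetical sketch: pick chunk boundaries via Dijkstra over split positions."""
    n = len(splits)
    dist = [float("inf")] * (n + 1)  # best known cost to reach each boundary
    prev = [-1] * (n + 1)  # predecessor boundary on the best path
    dist[0] = 0.0
    heap: List[Tuple[float, int]] = [(0.0, 0)]

    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        if i == n:
            break  # end of document reached with its final cost
        chunk = ""
        for j in range(i + 1, n + 1):
            chunk += splits[j - 1]
            # Allow a single oversized split so a path always exists.
            if len(chunk) > max_chunk_size and j > i + 1:
                break
            weight = reuse_cost if chunk in existing_chunks else new_chunk_cost
            if d + weight < dist[j]:
                dist[j] = d + weight
                prev[j] = i
                heapq.heappush(heap, (dist[j], j))

    # Walk back from the last boundary to recover the chosen chunks.
    chunks: List[str] = []
    j = n
    while j > 0:
        i = prev[j]
        chunks.append("".join(splits[i:j]))
        j = i
    chunks.reverse()
    return chunks


# Usage: only the inserted sentence "X. " becomes a new chunk to embed.
existing = {"A. B. ", "C. D. "}
chunks = optimal_chunking(["A. ", "B. ", "X. ", "C. ", "D. "], existing, max_chunk_size=6)
print(chunks)
print([c for c in chunks if c not in existing])
```

Because the nodes are already in topological order (left to right), a simple dynamic-programming pass over the boundaries would find the same minimum-cost path. Dijkstra is used here only to mirror the description above.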

## Examples

See the [`examples/`](examples/) directory for more usage examples.

## License

This project is licensed under CC BY 4.0. See the [LICENSE](LICENSE) file for details.
