Metadata-Version: 2.4
Name: llama-index-node-parser-chonkie
Version: 0.1.1
Summary: llama-index Chonkie integration
Project-URL: Homepage, https://llamaindex.ai
Project-URL: Repository, https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/node_parser/llama-index-node-parser-chonkie
Author-email: Hafedh Hichri <hhichri60@gmail.com>
License-Expression: MIT
Keywords: chonkie,chunking,llama-index,node-parser
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: chonkie[code,model2vec,neural,openai,st]>=1.5.0
Requires-Dist: llama-index-core<0.15,>=0.13.0
Description-Content-Type: text/markdown

# LlamaIndex Node Parser Chonkie Integration

This package provides an integration between [LlamaIndex](https://www.llamaindex.ai/) and [Chonkie](https://github.com/chonkie-inc/chonkie), a powerful and flexible chunking library.

## Installation

```bash
pip install llama-index-node-parser-chonkie
```

## Quick Start

```python
from llama_index.core import Document
from llama_index.node_parser.chonkie import Chunker

# Create a chunker (defaults to 'recursive')
chunker = Chunker(chunk_size=512)

# Create a document
doc = Document(text="Your long text here...")

# Get nodes
nodes = chunker.get_nodes_from_documents([doc])
```

## Supported Chunkers

The `Chunker` acts as a wrapper for various Chonkie chunking strategies. You can specify the strategy using the `chunker` parameter:

| `chunker`   | Description                                                           |
| ----------- | --------------------------------------------------------------------- |
| `recursive` | (Default) Recursively splits text based on a hierarchy of separators. |
| `sentence`  | Splits text into sentences.                                           |
| `token`     | Splits text into chunks based on token counts.                        |
| `word`      | Splits text based on word counts.                                     |
| `semantic`  | Splits text based on semantic similarity.                             |
| `late`      | Late chunking strategy.                                               |
| `neural`    | Neural-based chunking.                                                |
| `code`      | Optimized for source code.                                            |
| `fast`      | High-performance basic chunking.                                      |

Run the following code to see the full list of valid aliases:

```python
from llama_index.node_parser.chonkie import Chunker

print(Chunker.valid_chunkers)
```
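
To make the default `recursive` strategy from the table above more concrete, here is a minimal, dependency-free sketch of splitting on a hierarchy of separators. It is an illustration only, not Chonkie's implementation: the function name `recursive_split`, the separator order, and the use of character counts instead of token counts are all assumptions made for this example.

```python
def recursive_split(text, max_chars, separators=("\n\n", "\n", ". ", " ")):
    """Illustrative only: split text into pieces of at most max_chars characters."""
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left to try: fall back to fixed-width cuts.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= max_chars:
            # Keep merging neighbouring parts while they fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= max_chars:
            current = part
        else:
            # The part alone is too long: retry with the finer separators.
            chunks.extend(recursive_split(part, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


sample = (
    "First paragraph. It has two sentences.\n\n"
    "Second paragraph is short.\n\n"
    "Third."
)
chunks = recursive_split(sample, 40)
for chunk in chunks:
    print(repr(chunk))
```

Coarse separators (paragraph breaks) are tried first, and finer ones are used only for pieces that are still too long, so chunk boundaries tend to follow the natural structure of the text.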

## Advanced Configuration

You can pass any keyword arguments accepted by the underlying Chonkie chunker directly to `Chunker`:

```python
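from llama_index.node_parser.chonkie import Chunker

# Extra keyword arguments are forwarded to the selected Chonkie chunker
chunker = Chunker(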
    chunker="semantic",
    chunk_size=512,
    embedding_model="all-MiniLM-L6-v2",
    threshold=0.5,
)
```
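
One way to picture this pass-through is a thin factory that looks up a strategy by alias and forwards the remaining keyword arguments to it. The sketch below is a hypothetical, plain-Python illustration of that pattern, not the package's actual implementation: `WordSplitter`, `CharSplitter`, `REGISTRY`, and `make_chunker` are made-up names.

```python
from dataclasses import dataclass


@dataclass
class WordSplitter:
    """Hypothetical strategy: groups of `chunk_size` words."""
    chunk_size: int = 5

    def chunk(self, text):
        words = text.split()
        return [
            " ".join(words[i:i + self.chunk_size])
            for i in range(0, len(words), self.chunk_size)
        ]


@dataclass
class CharSplitter:
    """Hypothetical strategy: fixed-width slices of `chunk_size` characters."""
    chunk_size: int = 20

    def chunk(self, text):
        return [
            text[i:i + self.chunk_size]
            for i in range(0, len(text), self.chunk_size)
        ]


# Hypothetical alias table mapping a strategy name to its class.
REGISTRY = {"word": WordSplitter, "char": CharSplitter}


def make_chunker(chunker="word", **kwargs):
    """Look up the strategy by alias and forward all other keyword arguments."""
    if chunker not in REGISTRY:
        raise ValueError(f"Unknown chunker {chunker!r}; valid: {sorted(REGISTRY)}")
    return REGISTRY[chunker](**kwargs)


splitter = make_chunker("word", chunk_size=3)
pieces = splitter.chunk("one two three four five six seven")
print(pieces)
```

With this design the wrapper does not need to know each strategy's parameters; an unsupported keyword argument surfaces as an error from the strategy's own constructor.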

## Integration with Node Parsing

You can use `Chunker` directly to parse documents into nodes:

```python
from llama_index.core import Document
from llama_index.node_parser.chonkie import Chunker

chunker = Chunker(chunk_size=512)
doc = Document(text="Your long text here...")
nodes = chunker.get_nodes_from_documents([doc])
```

Alternatively, use it as a transformation within an `IngestionPipeline`:

```python
from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.node_parser.chonkie import Chunker

pipeline = IngestionPipeline(
    transformations=[
        Chunker(chunker="recursive", chunk_size=512),
        # ... other transformations
    ]
)

nodes = pipeline.run(documents=[Document.example()])
```
