Metadata-Version: 2.4
Name: llama-index-readers-datasets
Version: 0.2.0
Summary: llama-index readers HuggingFace Datasets integration
Author-email: Florian Bacho <f.bacho@outlook.fr>
Maintainer: Florian-BACHO
License-Expression: MIT
License-File: LICENSE
Keywords: datasets,huggingface
Requires-Python: <4.0,>=3.10
Requires-Dist: datasets>=4.4.2
Requires-Dist: llama-index-core<0.15,>=0.13.0
Description-Content-Type: text/markdown

# LlamaIndex Readers Integration: HuggingFace Datasets

## Overview

The HuggingFace Datasets Reader loads HuggingFace datasets as LlamaIndex documents.

## Installation

You can install HuggingFace Datasets Reader via pip:

```bash
pip install llama-index-readers-datasets
```

## Usage

```python
from llama_index.readers.datasets import DatasetsReader
from datasets import load_dataset

reader = DatasetsReader()

# Load the train split (default); rows are loaded as metadata
docs = reader.load_data("lhoestq/demo1")

# Load the test split; rows are loaded as metadata
docs = reader.load_data("lhoestq/demo1", split="test")

# Specify the dictionary key to use as the document text
docs = reader.load_data("lhoestq/demo1", text_key="review")

# Pass additional arguments to datasets.load_dataset
docs = reader.load_data("lhoestq/demo1", cache_dir="/tmp/huggingface")

# Load from a preloaded dataset (all other arguments are ignored)
dataset = load_dataset("lhoestq/demo1", split="train")
docs = reader.load_data(dataset=dataset)

# Lazy loading (stream samples one at a time)
for doc in reader.lazy_load_data(
    "lhoestq/demo1", split="test", text_key="review", doc_id_key="id"
):
    print(doc)
```
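
To make the roles of `text_key` and `doc_id_key` more concrete, the sketch below shows one plausible way a single dataset row could be split into text, an ID, and metadata. It is an illustration only and not the reader's actual implementation: `split_row` is a hypothetical helper, the sample row is made up, and removing the selected keys from the metadata is an assumption.

```python
from typing import Any, Optional


def split_row(
    row: dict,
    text_key: Optional[str] = None,
    doc_id_key: Optional[str] = None,
) -> dict:
    """Hypothetical helper: split one dataset row into text, id, and metadata.

    Assumption: the keys chosen as text and id are removed from the metadata.
    The real reader may behave differently.
    """
    metadata: dict = dict(row)  # copy so the caller's row is not mutated
    text: str = str(metadata.pop(text_key)) if text_key is not None else ""
    doc_id: Any = str(metadata.pop(doc_id_key)) if doc_id_key is not None else None
    return {"text": text, "doc_id": doc_id, "metadata": metadata}


# Made-up row for illustration; real column names depend on the dataset.
row = {"id": 7, "review": "Great product", "star": 5}
fields = split_row(row, text_key="review", doc_id_key="id")
print(fields)
```

With no keys given, the whole row ends up in the metadata in this sketch, which mirrors the "as metadata" wording in the comments above.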
