Metadata-Version: 2.3
Name: datasets-dump
Version: 0.1.3
Summary: A tool for dumping datasets from the Hugging Face datasets library
Project-URL: homepage, https://github.com/JacobLinCool/datasets-dump
Project-URL: repository, https://github.com/JacobLinCool/datasets-dump
Author-email: Jacob Lin <jacoblincool@gmail.com>
License: MIT
Keywords: datasets
Requires-Python: >=3.10
Requires-Dist: datasets>=3.1.0
Requires-Dist: numpy
Requires-Dist: pillow
Requires-Dist: soundfile
Requires-Dist: tqdm
Description-Content-Type: text/markdown

# datasets-dump

Dump Hugging Face datasets with embedded audio or images into an audio folder or image folder.

This recovers the original audio folder / image folder layout from Parquet files.

![usage](./images/usage.jpg)

## Usage

```bash
datasets-dump someone/dataset ./dist
```

Python API:

```python
def dump(
    dataset: Union[str, Dataset, DatasetDict],
    dist: str | Path,
    audio_column: Optional[str] = None,
    image_column: Optional[str] = None,
    metadata_format: Literal["jsonl", "csv"] = "jsonl",
) -> None
```
