Metadata-Version: 2.1
Name: unibox
Version: 0.5.2
Summary: unibox provides a unified interface for common file operations
Author-Email: trojblue <trojblue@gmail.com>
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Documentation
Classifier: Topic :: Software Development
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Project-URL: Homepage, https://trojblue.github.io/unibox
Project-URL: Documentation, https://trojblue.github.io/unibox
Project-URL: Changelog, https://trojblue.github.io/unibox/changelog
Project-URL: Repository, https://github.com/trojblue/unibox
Project-URL: Issues, https://github.com/trojblue/unibox/issues
Project-URL: Discussions, https://github.com/trojblue/unibox/discussions
Project-URL: Gitter, https://gitter.im/unibox/community
Project-URL: Funding, https://github.com/sponsors/trojblue
Requires-Python: >=3.10
Requires-Dist: boto3>=1.35.91
Requires-Dist: colorama>=0.4.6
Requires-Dist: colorlog>=6.9.0
Requires-Dist: datasets>=3.2.0
Requires-Dist: orjson>=3.10.13
Requires-Dist: pandas[parquet]>=2.2.3
Requires-Dist: pillow>=11.1.0
Requires-Dist: tqdm>=4.67.1
Description-Content-Type: text/markdown

# unibox

[![ci](https://github.com/trojblue/unibox/workflows/ci/badge.svg)](https://github.com/trojblue/unibox/actions?query=workflow%3Aci)
[![documentation](https://img.shields.io/badge/docs-mkdocs-708FCC.svg?style=flat)](https://trojblue.github.io/unibox/)
[![pypi version](https://img.shields.io/pypi/v/unibox.svg)](https://pypi.org/project/unibox/)
[![gitter](https://badges.gitter.im/join%20chat.svg)](https://app.gitter.im/#/room/#unibox:gitter.im)

unibox provides a unified interface for common file operations.

## Installation

```bash
pip install unibox
```

With [`uv`](https://docs.astral.sh/uv/):

```bash
uv tool install unibox
```

If you are not using Python 3.13, it is also recommended to install `pandas[performance]`:

```bash
pip install "pandas[performance]"
```


To add or remove project dependencies:

```bash
# add a dependency
uv add requests

# remove a dependency
uv remove requests

# after changing dependencies, rerun:
make setup
```


## Usage

Import the library:

```python
import unibox as ub
```
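One way to picture a "unified interface" is a single entry point that picks a parser based on the file extension, so callers do not need a different function per format. The sketch below is a purely illustrative toy using only the standard library: `toy_loads`, the `LOADERS` table, and the set of supported extensions are hypothetical and are not how unibox is actually implemented.

```python
import csv
import io
import json
from pathlib import Path
from typing import Any, Callable

# Hypothetical loader table keyed by lower-case file suffix.
# This is NOT unibox's real registry; it only illustrates the idea.
LOADERS: dict[str, Callable[[str], Any]] = {
    ".json": json.loads,
    ".csv": lambda text: list(csv.DictReader(io.StringIO(text))),
    ".txt": lambda text: text.splitlines(),
}


def toy_loads(path: str) -> Any:
    """Read a local file and parse it according to its extension."""
    suffix = Path(path).suffix.lower()
    try:
        parse = LOADERS[suffix]
    except KeyError:
        # Fail before touching the file if no parser is registered.
        raise ValueError(f"no loader registered for {suffix!r}") from None
    return parse(Path(path).read_text(encoding="utf-8"))
```

With this shape, supporting a new format means adding one entry to the table rather than a new public function.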


## Using the Hugging Face Backend

You can load a Hugging Face dataset directly with an `hf://{username}/{dataset_repo}` URI:

```python
hf_dset = ub.loads("hf://incantor/aesthetic_eagle_5category_iter99")
df = hf_dset.to_pandas()
```
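The `hf://{username}/{dataset_repo}` form maps onto a Hugging Face Hub repo id of the shape `username/dataset_repo`. A minimal sketch of splitting such a URI, assuming only the two-part form described above (`parse_hf_uri` and `HfRepoRef` are hypothetical helpers, not part of unibox's documented API, and the real backend may accept more forms):

```python
from dataclasses import dataclass

HF_SCHEME = "hf://"


@dataclass(frozen=True)
class HfRepoRef:
    """Hypothetical container for the parts of an hf:// URI."""

    username: str
    repo: str

    @property
    def repo_id(self) -> str:
        # Hub repo ids have the form "<owner>/<name>".
        return f"{self.username}/{self.repo}"


def parse_hf_uri(uri: str) -> HfRepoRef:
    """Split 'hf://{username}/{dataset_repo}' into its two components."""
    if not uri.startswith(HF_SCHEME):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    parts = uri[len(HF_SCHEME):].strip("/").split("/")
    # Only the two-part form is accepted in this sketch.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected hf://{{username}}/{{dataset_repo}}, got {uri!r}")
    return HfRepoRef(username=parts[0], repo=parts[1])


ref = parse_hf_uri("hf://incantor/aesthetic_eagle_5category_iter99")
print(ref.repo_id)  # incantor/aesthetic_eagle_5category_iter99
```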

and upload a processed DataFrame back to Hugging Face:

```python
df["new_col"] = "new changes"
ub.saves(df, "hf://datatmp/updated_repo")
```


## Dev notes

Current concerns:

1. `loads()`: temporary files can accumulate in a global directory and fill up `/tmp/`; there are also concurrency issues.
2. `s3_backend`: it is the only backend that accepts a directory; the other backends should do the same.
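One possible mitigation for the first concern is to give every call its own uniquely named temporary directory and delete it when the call finishes, so nothing piles up in `/tmp/` and concurrent calls never write to the same path. A minimal sketch, assuming downloads can be scoped to a `with` block (`private_download_dir` and the `unibox_` prefix are hypothetical, not existing unibox code):

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def private_download_dir(prefix: str = "unibox_") -> Iterator[Path]:
    """Yield a fresh temp dir for one call and always remove it afterwards.

    mkdtemp creates a new directory with a unique name on every call, so
    concurrent callers get separate paths. The finally block deletes the
    directory even if the body raises, so it does not linger in /tmp.
    """
    tmp = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield tmp
    finally:
        shutil.rmtree(tmp, ignore_errors=True)
```

The standard library's `tempfile.TemporaryDirectory` offers equivalent cleanup behaviour; the explicit version is shown only to make the cleanup step visible.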

To get a coverage report, run:

```bash
pytest --cov=src/unibox --cov-report=term-missing tests
```

To build the docs:

```bash
make docs host=0.0.0.0

# or in debug mode:
make check-docs
```

To make a release manually:

```bash

```


## Migrating from unibox 0.4

No longer supported:

- `ub.traverses()`: handlers and `exclude_extensions` have been removed; `include_extensions` still works but is deprecated in favor of `exts`.
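Keeping `include_extensions` working while steering users to `exts` follows a common deprecation pattern: the old keyword is still accepted, forwarded to the new one, and a `DeprecationWarning` is emitted. A minimal standard-library sketch of that pattern, using a hypothetical `toy_traverse` function (this is not unibox's implementation; its argument handling and case-insensitive matching are assumptions):

```python
import os
import warnings
from pathlib import Path
from typing import Iterable, Optional


def toy_traverse(
    root: str,
    exts: Optional[Iterable[str]] = None,
    include_extensions: Optional[Iterable[str]] = None,
) -> list[str]:
    """Recursively list files under root, optionally filtered by extension.

    Hypothetical illustration only: `include_extensions` is kept as a
    deprecated alias of `exts`.
    """
    if include_extensions is not None:
        warnings.warn(
            "`include_extensions` is deprecated; use `exts` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # If both are given, `exts` takes precedence.
        if exts is None:
            exts = include_extensions
    wanted = {e.lower() for e in exts} if exts is not None else None
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare suffixes case-insensitively, e.g. ".JPG" matches ".jpg".
            if wanted is None or Path(name).suffix.lower() in wanted:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Old call sites keep returning the same results, and the warning tells users which keyword to switch to.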