Metadata-Version: 2.4
Name: hf-storage
Version: 0.1.0
Summary: Hugging Face dataset-backed cloud file storage library
Author: HuggingFaceStorage
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: huggingface_hub<1.0.0,>=0.30.2
Requires-Dist: pydantic<3.0.0,>=2.11.4
Requires-Dist: typing-extensions<5.0.0,>=4.13.2
Provides-Extra: dev
Requires-Dist: pytest<9.0.0,>=8.3.5; extra == "dev"
Requires-Dist: pytest-mock<4.0.0,>=3.14.0; extra == "dev"

# HuggingFaceStorage

Python library for cloud-style file storage backed by a private Hugging Face **dataset** repository.

## Features

- Immutable file version history per logical remote path
- Soft delete via tombstone versions
- Content-addressed blob storage (`sha256`) to avoid duplicate uploads
- `HF_TOKEN`-based authentication
- Public API: `put`, `put_zip`, `get`, `list`, `delete`, `history`
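
The content-addressing idea above can be illustrated with a minimal sketch. It is not the library's internal code: the helper names `sha256_of_file` and `blob_key`, and the `blobs/sha256/` layout, are assumptions made for illustration. The point is that two files with identical bytes produce the same digest, so the second upload can be skipped.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def blob_key(digest: str) -> str:
    """Build a storage key from a digest (hypothetical layout, for illustration)."""
    return f"blobs/sha256/{digest}"
```

Because the key depends only on the file contents, a store built this way can check whether a key already exists before uploading.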

## Project Structure

```text
HuggingFaceStorage/
  src/hf_storage/
  tests/unit/
  tests/integration/
  requirements.txt
  pyproject.toml
```

## Setup

1. Create a virtual environment (Python 3.11 or newer; the command below uses 3.11):

```powershell
py -3.11 -m venv .venv
```

2. Activate:

```powershell
.\.venv\Scripts\Activate.ps1
```

3. Install dependencies:

```powershell
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -e .
```

4. Deactivate when finished:

```powershell
deactivate
```

## Authentication

Set your Hugging Face token before using the library:

```powershell
$env:HF_TOKEN = "hf_xxx"
```
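
When calling the library from a script, it can help to fail early if the token is missing. The helper below is a sketch using only the standard library; the name `require_hf_token` is hypothetical and not part of `hf_storage`.

```python
import os
from typing import Mapping


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the HF_TOKEN value, or raise if it is missing or blank."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; set it before using hf_storage.")
    return token
```

Calling `require_hf_token()` at the top of a script surfaces a clear error before any network request is attempted.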

## Quick Example

```python
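from hf_storage import HFStorage, StorageConfig

# Point the client at a private dataset repository; HF_TOKEN must already be set.
storage = HFStorage(StorageConfig(repo_id="your-namespace/your-private-dataset"))
# Create the repository as private if it does not exist yet.
storage.setup(create_if_missing=True, private=True)

# Upload a local file to a logical remote path.
version = storage.put("local.txt", "docs/local.txt")
# Zip a local folder and upload the archive.
zip_version = storage.put_zip("my_folder", "archives/my_folder")
# Download the stored file to a local path.
storage.get("docs/local.txt", "restored.txt")
# List stored entries under a prefix.
entries = storage.list(prefix="docs/")
# Inspect the version history of one logical path.
history = storage.history("docs/local.txt")
# Delete the logical path (a soft delete, recorded as a tombstone version).
deleted = storage.delete("docs/local.txt")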
```

## Running Tests

Unit tests:

```powershell
pytest tests/unit
```

Integration tests (these run against a real Hugging Face repository):

```powershell
$env:HF_STORAGE_INTEGRATION = "1"
$env:HF_STORAGE_TEST_REPO = "your-namespace/your-private-dataset"
pytest tests/integration
```
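
One plausible way for a test suite to act on these two variables is sketched below. This is an assumption about how such gating could work, not a description of what `tests/integration` actually does; the function name `integration_repo_id` is hypothetical.

```python
import os
from typing import Mapping, Optional


def integration_repo_id(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the test repo id, or None when integration tests should be skipped."""
    # Integration tests are opt-in: the flag must be exactly "1".
    if env.get("HF_STORAGE_INTEGRATION") != "1":
        return None
    repo_id = env.get("HF_STORAGE_TEST_REPO", "").strip()
    # An empty repo id is treated the same as a missing one.
    return repo_id or None
```

A pytest fixture could call this helper and invoke `pytest.skip(...)` when it returns `None`, so unit-only runs never touch the network.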

## Publishing (Maintainers)

`publish.bat` is maintainer tooling for the package release workflow only. It is not part of the public runtime API.

Set Twine credentials via environment variables:

```powershell
$env:TWINE_USERNAME = "__token__"
$env:TWINE_TEST_PASSWORD = "pypi-<testpypi-token>"
$env:TWINE_PASSWORD = "pypi-<pypi-production-token>"
```

Default release target is TestPyPI:

```powershell
.\publish.bat
```

Publish to production PyPI explicitly:

```powershell
.\publish.bat pypi
```

What `publish.bat` does:
- Runs unit tests (`pytest tests/unit`)
- Builds wheel + sdist (`python -m build`)
- Validates artifacts (`twine check dist/*`)
- Uploads to TestPyPI by default, or PyPI when `pypi` is passed

Credential behavior:
- `.\publish.bat` (default TestPyPI) uses `TWINE_TEST_PASSWORD`
- `.\publish.bat pypi` (production) uses `TWINE_PASSWORD`
- `TWINE_USERNAME` must be `__token__` for both
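
The credential rules above can be expressed as a small pre-flight check. This is a sketch of the documented behavior only, not the contents of `publish.bat`; the function name `check_release_credentials` and the explicit `"testpypi"` target name are assumptions.

```python
import os
from typing import Mapping

# Which environment variable holds the upload token for each target.
PASSWORD_VARIABLE = {
    "testpypi": "TWINE_TEST_PASSWORD",
    "pypi": "TWINE_PASSWORD",
}


def check_release_credentials(
    target: str = "testpypi", env: Mapping[str, str] = os.environ
) -> str:
    """Validate Twine credentials for a target and return the variable used."""
    if target not in PASSWORD_VARIABLE:
        raise ValueError(f"unknown release target: {target!r}")
    if env.get("TWINE_USERNAME") != "__token__":
        raise RuntimeError("TWINE_USERNAME must be '__token__'")
    variable = PASSWORD_VARIABLE[target]
    if not env.get(variable):
        raise RuntimeError(f"{variable} is not set")
    return variable
```

Running a check like this before building avoids discovering a missing token only at the upload step.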

Important: bump package version before each release. PyPI/TestPyPI do not allow re-uploading the same version.

## One-command Zip Upload (Windows)

Use the batch wrapper to zip and upload a file or directory:

```powershell
.\put_zip.bat "C:\path\to\folder_or_file" "backups/my_archive"
```

Notes:
- `HF_TOKEN` and `HF_STORAGE_REPO_ID` are read from `.env` (or from the current environment variables).
- If the remote path does not end with `.zip`, `.zip` is appended automatically.
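
The suffix rule can be sketched in a few lines. The function name `normalize_zip_remote_path` is hypothetical, and the case-sensitive comparison is an assumption; the wrapper may treat `.ZIP` differently.

```python
def normalize_zip_remote_path(remote_path: str) -> str:
    """Append ".zip" to a remote path unless it already ends with it."""
    # Assumption: the check is case-sensitive.
    if remote_path.endswith(".zip"):
        return remote_path
    return remote_path + ".zip"
```

With this rule, `backups/my_archive` and `backups/my_archive.zip` both resolve to the same logical path.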

## List, Download, and Delete (Windows)

List stored logical paths:

```powershell
.\list_files.bat
```

Include soft-deleted entries too:

```powershell
.\list_files.bat 1
```

Download the latest version by logical path:

```powershell
.\get_file.bat "backup/venv.zip" ".\downloads\venv.zip"
```

Download a specific version:

```powershell
.\get_file.bat "backup/venv.zip" ".\downloads\venv.zip" "version_id_here"
```

Soft delete a logical path:

```powershell
.\delete_file.bat "backup/venv.zip"
```

Hard delete a logical path (removes the manifest entry and any blob objects that are no longer referenced):

```powershell
.\delete_file.bat "backup/venv.zip" hard
```
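
To illustrate why a hard delete removes only unreferenced blobs, here is a minimal sketch over a simplified in-memory manifest. The manifest shape (logical path mapped to a list of blob digests, with `None` standing in for a tombstone) and the function name `hard_delete` are assumptions for illustration; the real manifest format may differ.

```python
from typing import Dict, List, Optional, Set


def hard_delete(manifest: Dict[str, List[Optional[str]]], path: str) -> Set[str]:
    """Remove a path from the manifest and return blob digests safe to delete."""
    # Drop the manifest entry; a missing path simply yields no versions.
    removed_versions = manifest.pop(path, [])
    # Digests still used by any remaining path must be kept.
    still_referenced = {
        digest
        for versions in manifest.values()
        for digest in versions
        if digest is not None
    }
    # Tombstones (None) carry no blob, so they are ignored here.
    candidates = {digest for digest in removed_versions if digest is not None}
    return candidates - still_referenced
```

Because blobs are content-addressed, two logical paths can share one blob, so a digest is only safe to remove once no remaining path refers to it.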
