Metadata-Version: 2.4
Name: iceguard
Version: 1.0.0
Summary: Reliability library for Spark-on-AWS-Lambda lakehouse writes with timeout rollback, checkpoints, and orphan cleanup.
Project-URL: Homepage, https://github.com/vaquarkhan/IceGuard
Project-URL: Documentation, https://github.com/vaquarkhan/IceGuard/tree/main/docs
Project-URL: Repository, https://github.com/vaquarkhan/IceGuard
Project-URL: Issues, https://github.com/vaquarkhan/IceGuard/issues
Author: IceGuard Contributors
License-Expression: MIT
License-File: LICENSE
Keywords: aws,delta-lake,hudi,iceberg,lambda,opentelemetry,reliability,spark
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Distributed Computing
Requires-Python: <3.14,>=3.9
Requires-Dist: boto3>=1.28.0
Provides-Extra: all
Requires-Dist: delta-spark>=3.0.0; extra == 'all'
Requires-Dist: hypothesis>=6.82.0; extra == 'all'
Requires-Dist: opentelemetry-api>=1.20.0; extra == 'all'
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == 'all'
Requires-Dist: pyiceberg>=0.6.0; extra == 'all'
Requires-Dist: pyspark>=3.4.0; extra == 'all'
Requires-Dist: pytest-cov>=4.1.0; extra == 'all'
Requires-Dist: pytest>=7.4.0; extra == 'all'
Provides-Extra: delta
Requires-Dist: delta-spark>=3.0.0; extra == 'delta'
Provides-Extra: dev
Requires-Dist: hypothesis>=6.82.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Provides-Extra: hudi
Requires-Dist: fastavro>=1.9.0; extra == 'hudi'
Requires-Dist: pyspark>=3.4.0; extra == 'hudi'
Provides-Extra: iceberg
Requires-Dist: pyiceberg>=0.6.0; extra == 'iceberg'
Provides-Extra: otel
Requires-Dist: opentelemetry-api>=1.20.0; extra == 'otel'
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == 'otel'
Provides-Extra: spark
Requires-Dist: pyspark>=3.4.0; extra == 'spark'
Description-Content-Type: text/markdown

![IceGuard](images/snowGuard.jpeg)

# IceGuard

[![PyPI](https://img.shields.io/pypi/v/iceguard)](https://pypi.org/project/iceguard/)
[![PyPI downloads (month)](https://img.shields.io/pypi/dm/iceguard)](https://pypi.org/project/iceguard/)
[![PyPI downloads (week)](https://img.shields.io/pypi/dw/iceguard)](https://pypi.org/project/iceguard/)
[![Python](https://img.shields.io/pypi/pyversions/iceguard)](https://pypi.org/project/iceguard/)
[![License](https://img.shields.io/pypi/l/iceguard)](https://pypi.org/project/iceguard/)

**Published on PyPI:** [pypi.org/project/iceguard](https://pypi.org/project/iceguard/) · **Download stats:** [pepy.tech/project/iceguard](https://pepy.tech/project/iceguard) (aggregated; PyPI does not expose per-version counts in the API)

**Reliability library for Spark-on-AWS-Lambda (SoAL) lakehouse writes.** Chunked writes with timeout rollback, S3 checkpoints, orphan cleanup, and optional CloudWatch or OpenTelemetry metrics.

| Capability | Out of the box | You provide |
|------------|----------------|-------------|
| Timeout rollback between chunks | Yes | — |
| Checkpoint resume (S3) | Yes | S3 bucket |
| S3 path cleanup on rollback | Yes (`track_paths`) | — |
| Iceberg / Delta / **Hudi** adapters | Yes (S3 fallback) | Catalog / commit client for metadata |
| Orphan scan CLI | Yes | IAM on table path |
| A single blocking `df.write.save()` inside `protect()` alone | **No** | Use `write_dataframe` |

## Install

```bash
pip install iceguard

# With optional extras
pip install "iceguard[spark,iceberg,hudi,otel]==1.0.0"

# From source (specific tag)
pip install "git+https://github.com/vaquarkhan/IceGuard.git@v1.0.0"
```

Extras: `[spark]`, `[iceberg]`, `[delta]`, `[hudi]`, `[otel]`, `[dev]`, `[all]`

## Quick start

```python
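import iceguard

# `context`, `write_chunk`, and `new_paths` come from your own code;
# this snippet does not define them.
with iceguard.protect(context, s3_bucket="my-checkpoints") as writer: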
    writer.write(
        path="s3://lake/db/table",
        total_records=10_000,
        batch_writer=lambda s, e: write_chunk(s, e),
        track_paths=lambda s, e: new_paths(s, e),
    )
```

Spark: `iceguard.write_dataframe(writer, df, path, write_format="iceberg")`

CLI: `iceguard orphans scan s3://lake/db/table --json`
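
To illustrate the idea behind "timeout rollback between chunks", here is a minimal standalone sketch. It is not IceGuard's implementation and does not call the library. `chunk_bounds`, `write_in_chunks`, `delete_path`, `safety_margin_ms`, and `FakeContext` are hypothetical names introduced for this example; the only real API assumed is `get_remaining_time_in_millis()` on the AWS Lambda context object.

```python
from typing import Callable, Iterator, List, Tuple


def chunk_bounds(total_records: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) record ranges covering total_records."""
    for start in range(0, total_records, chunk_size):
        yield start, min(start + chunk_size, total_records)


def write_in_chunks(
    context,
    total_records: int,
    chunk_size: int,
    batch_writer: Callable[[int, int], None],
    track_paths: Callable[[int, int], List[str]],
    delete_path: Callable[[str], None],
    safety_margin_ms: int = 10_000,
) -> bool:
    """Write chunk by chunk; undo tracked paths if time is about to run out.

    Returns True when every chunk was written, False after a rollback.
    """
    written: List[str] = []
    for start, end in chunk_bounds(total_records, chunk_size):
        # Check the remaining Lambda time *between* chunks, never mid-chunk.
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            # Roll back in reverse order of creation.
            for path in reversed(written):
                delete_path(path)
            return False
        batch_writer(start, end)
        written.extend(track_paths(start, end))
    return True


class FakeContext:
    """Hypothetical stand-in for the Lambda context, for local experiments.

    Each poll reports the current budget, then subtracts a fixed cost.
    """

    def __init__(self, remaining_ms: int, cost_per_poll_ms: int) -> None:
        self.remaining_ms = remaining_ms
        self.cost_per_poll_ms = cost_per_poll_ms

    def get_remaining_time_in_millis(self) -> int:
        current = self.remaining_ms
        self.remaining_ms -= self.cost_per_poll_ms
        return current
```

Because the deadline is only checked between chunks, a single long blocking write cannot be interrupted this way, which is why the capability table points to `write_dataframe` for Spark DataFrames.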

## Repository layout

| Path | Purpose |
|------|---------|
| [docs/](docs/) | Full documentation |
| [examples/](examples/) | Python, SAM, CDK samples |
| [terraform/](terraform/) | Modular production IaC |
| [infra/cloudwatch/](infra/cloudwatch/) | Dashboard JSON |
| [src/iceguard/](src/iceguard/) | Library source |

## Development

```bash
pip install -e ".[dev]"
pytest tests --cov=iceguard
python validation/run_all.py
```

## Release status

See [docs/STATUS.md](docs/STATUS.md) for the capability checklist. **v1.0.0** is published via GitHub Actions trusted publishing (OIDC) on release tags.

## Documentation

- [API reference (complete)](docs/API.md)
- [Installation](docs/installation.md)
- [Architecture](docs/architecture.md)
- [Terraform](docs/terraform.md)
- [Formal verification](docs/formal-verification.md)
- [Publishing](docs/publishing.md)

## License

MIT — see [LICENSE](LICENSE).
