Metadata-Version: 2.4
Name: cozip
Version: 2026.5.10
Summary: Python bindings to libcozip, the reference writer for the Cloud-Optimized ZIP (cozip) format.
Project-URL: Homepage, https://asterisk.coop/taco/cozip
Project-URL: Repository, https://github.com/asterisk-labs/taco
Project-URL: Issues, https://github.com/asterisk-labs/taco/issues
Author-email: Cesar Aybar <cesar@asterisk.coop>, Julio Contreras <julio.contreras@uv.es>, Roy Yali <ryali93@gmail.com>
License: MIT License
        
        Copyright (c) 2026 Asterisk Labs
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Keywords: archive,cloud-optimized,zip
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: C
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: System :: Archiving :: Compression
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: cffi>=1.16
Requires-Dist: pyarrow>=14
Provides-Extra: test
Requires-Dist: geopandas>=1.0; extra == 'test'
Requires-Dist: pytest>=8; extra == 'test'
Requires-Dist: shapely>=2.0; extra == 'test'
Description-Content-Type: text/markdown

<div align="center">
  <img src="images/banner.svg" alt="cozip — Cloud Optimized ZIP" width="700"/>

  <p>
    <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-EAB308?style=flat-square" alt="License MIT"/></a>
    <a href="https://pypi.org/project/cozip"><img src="https://img.shields.io/pypi/v/cozip?label=python&logo=python&logoColor=white&color=3776AB&style=flat-square" alt="PyPI"/></a>
    <a href="https://asterisk-labs.r-universe.dev/cozip"><img src="https://img.shields.io/badge/r--universe-cozip-276DC3?logo=r&logoColor=white&style=flat-square" alt="R"/></a>
    <a href="https://juliahub.com/ui/Packages/Cozip"><img src="https://img.shields.io/badge/julia-1.10%2B-9558B2?logo=julia&logoColor=white&style=flat-square" alt="Julia"/></a>
    <a href="https://www.npmjs.com/package/cozip"><img src="https://img.shields.io/npm/v/cozip?label=npm&logo=npm&logoColor=white&color=CB3837&style=flat-square" alt="npm"/></a>
    <a href="#wasm"><img src="https://img.shields.io/badge/wasm-browser--ready-654FF0?logo=webassembly&logoColor=white&style=flat-square" alt="WASM"/></a>
    <a href="https://duckdb.org/community_extensions"><img src="https://img.shields.io/badge/duckdb-extension-FFF000?logo=duckdb&logoColor=black&style=flat-square" alt="DuckDB"/></a>
    <a href="#core"><img src="https://img.shields.io/badge/C11-core-A8B9CC?logo=c&logoColor=white&style=flat-square" alt="C11"/></a>
  </p>
</div>

---

## What is cozip?

A ZIP file you can open like a table — over the network, without downloading it.

cozip puts a Parquet manifest called `__metadata__` at **byte 0** — one row per entry with name, offset, size, plus any columns you add (`split`, `label`, `class`...). DuckDB, Arrow, and Polars query it directly. Range requests fetch only the bytes you actually need.

A 20 GB archive becomes a queryable table.

**It's still a ZIP.** `unzip`, `zipfile.ZipFile`, your OS's preview window — all unchanged.
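
As a quick check of that claim, the standard library alone can list the entries. This is a minimal sketch: `list_entries` is a hypothetical helper, not part of cozip, and it assumes `dataset.zip` was produced by `cozip.write`. Whether `__metadata__` appears as an ordinary entry in this listing is not stated here, so the sketch does not rely on it.

```python
import zipfile


def list_entries(archive_path):
    """Return (name, size in bytes) for every entry, using only the stdlib."""
    with zipfile.ZipFile(archive_path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]
```

Calling `list_entries("dataset.zip")` should behave the same as it would on any other ZIP file.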

## Install

```bash
pip install cozip
```

## Usage

Two functions: `write` and `read`.

### Write

```python
import cozip
import polars as pl

df = pl.DataFrame({
    "path":  ["local/tile_001.tif", "local/tile_002.tif", "local/tile_003.tif"],
    "name":  ["tile_001.tif", "tile_002.tif", "tile_003.tif"],
    "split": ["train", "val", "train"],
    "label": ["cloud", "water", "forest"],
})

cozip.write("dataset.zip", df)
```

Two columns are reserved. `path` is where the file lives on disk; it is consumed at write time and dropped. `name` is how the entry is stored inside the archive, and it becomes part of `__metadata__`. Every other column rides along and becomes queryable on read.
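
Because `path` is read from disk at write time, it can help to validate the table before calling `cozip.write`. The sketch below is a hypothetical pre-flight check, not part of cozip: `check_manifest` and `RESERVED` are names invented for illustration, and the uniqueness rule for `name` is an assumption rather than something the format is stated to require.

```python
from pathlib import Path

RESERVED = ("path", "name")  # the two reserved columns described above


def check_manifest(columns):
    """Validate a dict of column name -> list of values before writing.

    Returns the names of the extra (user-defined) columns.
    """
    missing = [c for c in RESERVED if c not in columns]
    if missing:
        raise ValueError(f"missing reserved column(s): {missing}")

    # Every `path` must point at an existing file on disk.
    absent = [p for p in columns["path"] if not Path(p).is_file()]
    if absent:
        raise FileNotFoundError(f"not found on disk: {absent}")

    # Assumption: entry names should be unique inside one archive.
    names = columns["name"]
    if len(set(names)) != len(names):
        raise ValueError("duplicate entry names")

    return [c for c in columns if c not in RESERVED]
```

With the Polars frame from the example, `check_manifest(df.to_dict(as_series=False))` would return `["split", "label"]` once the three `.tif` files exist on disk.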

### Read

```python
df = cozip.read("dataset.zip")
```

Local file or remote URL — same call. You get a DataFrame back with one row per entry, including `offset` and `size` resolved against the archive.

```python
import cozip
import polars as pl

df = cozip.read("https://example.com/dataset.zip")

# query the manifest like any DataFrame
# (sample(32) needs at least 32 matching rows)
batch = df.filter(pl.col("split") == "train").sample(32)

# batch.select(["name", "offset", "size"]) is everything you need
# to range-request the payloads
```
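
The range-request step can be sketched with the standard library alone. The helpers below (`byte_range`, `fetch_payload`, `read_payload`) are hypothetical and not part of cozip. They assume that `offset` points at the first byte of the entry's payload, that `size` is its stored length, that entries are stored uncompressed, and that the server honors `Range` headers. Check SPEC.md before relying on any of these.

```python
import urllib.request


def byte_range(offset, size):
    """HTTP Range header value covering `size` bytes starting at `offset`."""
    # Range ends are inclusive, hence the -1.
    return f"bytes={offset}-{offset + size - 1}"


def fetch_payload(url, offset, size):
    """Fetch one entry's bytes from a remote archive with one range request."""
    req = urllib.request.Request(url, headers={"Range": byte_range(offset, size)})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server honored the Range header.
        if resp.status != 206:
            raise RuntimeError(f"server ignored Range (status {resp.status})")
        return resp.read()


def read_payload(path, offset, size):
    """Same idea for a local archive: seek to the offset, read `size` bytes."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

For each row of `batch`, `fetch_payload(url, row["offset"], row["size"])` would then return that entry's bytes without downloading the rest of the archive.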

## Bindings

| Language     | Install                                                                       |
|--------------|-------------------------------------------------------------------------------|
| Python       | `pip install cozip`                                                           |
| R            | `install.packages("cozip", repos = "https://asterisk-labs.r-universe.dev")`   |
| Julia        | `Pkg.add("Cozip")`                                                            |
| JavaScript   | `npm install cozip`                                                           |
| WASM         | browser bundle, no Node required                                              |
| DuckDB       | `INSTALL cozip FROM community; LOAD cozip;`                                   |
| C            | vendored single-header `cozip.h`                                              |

All bindings call into the same C11 core, so behavior is byte-exact across runtimes.

## Specification

The on-disk format is defined in [SPEC.md](cozip/SPEC.md). Any conforming implementation reads any cozip ever written.

## License

MIT

<div align="center">
  <br>
  Developed with ❤️ by
  <br><br>
  <a href="https://asterisk.coop">
    <img src="images/asterisk_logo.svg" alt="Asterisk Labs" width="400"/>
  </a>
</div>