Metadata-Version: 2.4
Name: compressbench
Version: 0.1.0
Summary: CLI tool to benchmark compression algorithms on Parquet datasets
Author: Konstantinas Mamonas
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer
Requires-Dist: pyarrow
Requires-Dist: duckdb
Requires-Dist: python-snappy
Requires-Dist: lz4
Requires-Dist: zstandard
Requires-Dist: seaborn
Provides-Extra: gzip
Provides-Extra: snappy
Requires-Dist: python-snappy; extra == "snappy"
Provides-Extra: lz4
Requires-Dist: lz4; extra == "lz4"
Provides-Extra: zstd
Requires-Dist: zstandard; extra == "zstd"
Provides-Extra: all
Requires-Dist: python-snappy; extra == "all"
Requires-Dist: lz4; extra == "all"
Requires-Dist: zstandard; extra == "all"
Provides-Extra: viz
Requires-Dist: matplotlib; extra == "viz"
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Dynamic: license-file

# compressbench

**Benchmark compression algorithms on Parquet datasets.**

## Why

Compression settings affect performance, storage cost, and latency, but most data engineers inherit the defaults without testing them.  
`compressbench` lets you benchmark compression ratio, compression speed, and decompression speed across algorithms using your own Parquet files.

## Features

- Accepts local Parquet files as input.
- Supports gzip and snappy.
- Outputs:
    - Compression ratio.
    - Compression time.
    - Decompression time.
- CLI built with Typer.
- Unit tests with pytest.

## Installation

```bash
pip install compressbench
```

## Usage

```bash
compressbench input.parquet --algorithms gzip snappy
```

If `--algorithms` is omitted, `compressbench` runs benchmarks for all available algorithms.

Example output:

```text
Algorithm: gzip
Compression ratio: 2.91
Compression time: 0.43s
Decompression time: 0.12s

Algorithm: snappy
Compression ratio: 1.67
Compression time: 0.12s
Decompression time: 0.05s
```
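
To make the three reported metrics concrete, here is a minimal Python sketch of how they could be measured for gzip using only the standard library. It is not `compressbench`'s actual implementation: the function name `benchmark_gzip` is hypothetical, it works on a raw in-memory byte buffer rather than a Parquet file, and it assumes the ratio is defined as uncompressed size divided by compressed size, which this README does not state explicitly.

```python
import gzip
import time


def benchmark_gzip(data: bytes) -> dict:
    """Measure gzip ratio and timings for an in-memory buffer (illustrative only)."""
    # Time the compression step.
    start = time.perf_counter()
    compressed = gzip.compress(data)
    compress_time = time.perf_counter() - start

    # Time the decompression step.
    start = time.perf_counter()
    restored = gzip.decompress(compressed)
    decompress_time = time.perf_counter() - start

    # Sanity check: the round trip must reproduce the input exactly.
    if restored != data:
        raise ValueError("round trip changed the data")

    return {
        # Assumed definition: uncompressed bytes / compressed bytes.
        "ratio": len(data) / len(compressed),
        "compress_time": compress_time,
        "decompress_time": decompress_time,
    }


if __name__ == "__main__":
    sample = b"compressbench " * 100_000  # highly repetitive sample data
    result = benchmark_gzip(sample)
    print("Algorithm: gzip")
    print(f"Compression ratio: {result['ratio']:.2f}")
    print(f"Compression time: {result['compress_time']:.2f}s")
    print(f"Decompression time: {result['decompress_time']:.2f}s")
```

Timings from a single run like this are noisy; repeating the measurement and taking the median would give steadier numbers.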

## CLI Options

- `input.parquet`: Path to the Parquet file to benchmark.
- `--algorithms`: List of algorithms to test (`gzip`, `snappy`).
- `--level`: Not supported in v0.1.0. Reserved for future versions.

## Roadmap
See ROADMAP.md for planned features.

## License
MIT
