pqfilt

Generic Parquet predicate-pushdown filter tool – CLI and Python API.

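pqfilt wraps pyarrow.dataset so you can filter Parquet files without reading them fully into memory. It uses row-group-level predicate pushdown: row groups whose column statistics rule out any match are skipped, and only rows that satisfy the filter are returned.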
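The idea behind row-group pruning can be sketched in plain Python. This is a conceptual illustration only, not pqfilt's or pyarrow's implementation. It assumes each row group carries min/max statistics for the filtered column; `RowGroupStats`, `may_match` and `row_groups_to_read` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical min/max statistics for one column in one row group."""
    min: float
    max: float


def may_match(stats: RowGroupStats, op: str, value: float) -> bool:
    """Return False only when no row in the group can satisfy `column <op> value`."""
    if op == "<":
        return stats.min < value
    if op == "<=":
        return stats.min <= value
    if op == ">":
        return stats.max > value
    if op == ">=":
        return stats.max >= value
    if op == "==":
        return stats.min <= value <= stats.max
    if op == "!=":
        # Skippable only when every value in the group equals `value`.
        return not (stats.min == stats.max == value)
    raise ValueError(f"unsupported operator: {op}")


def row_groups_to_read(groups, op, value):
    """Indices of row groups that must be read; all others are skipped."""
    return [i for i, s in enumerate(groups) if may_match(s, op, value)]


# Three row groups of a hypothetical 'vmag' column
groups = [RowGroupStats(10.0, 18.5), RowGroupStats(19.0, 25.0), RowGroupStats(21.0, 30.0)]
print(row_groups_to_read(groups, "<", 20))  # [0, 1]
```

For `vmag < 20`, the third group is never read because its smallest value is 21. The second group still has to be read and filtered row by row, since it mixes matching and non-matching values.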

Quick Start

Python API:

import pqfilt

# Simple filter
df = pqfilt.read("data.parquet", filters="vmag < 20")

# AND + OR with expression syntax
df = pqfilt.read("data.parquet", filters="(a < 30 & b > 50) | c == 1")

# Tuple syntax (flat AND)
df = pqfilt.read("data.parquet", filters=[("a", "<", 30), ("b", ">", 50)])
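A small plain-Python sketch of how the two filter forms in the Python API example can be read. It does not reproduce pqfilt's internals. It treats the tuple list as a flat AND and the expression `(a < 30 & b > 50) | c == 1` as an OR of AND-groups (disjunctive normal form), the same convention used by the `filters` argument of `pyarrow.parquet.read_table`. The operator set and the helpers `matches_all` and `matches_any_group` are assumptions made for illustration.

```python
import operator

# Comparison operators assumed to be valid in the tuple form
OPS = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
    ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
}


def matches_all(row, filters):
    """Flat AND: a row passes only if every (column, op, value) tuple holds."""
    return all(OPS[op](row[col], value) for col, op, value in filters)


def matches_any_group(row, groups):
    """OR of AND-groups: a row passes if any group matches in full."""
    return any(matches_all(row, group) for group in groups)


rows = [
    {"a": 10, "b": 60, "c": 0},  # satisfies (a < 30 & b > 50)
    {"a": 40, "b": 60, "c": 1},  # satisfies c == 1 only
    {"a": 40, "b": 10, "c": 0},  # satisfies neither branch
]

# Same conditions as filters=[("a", "<", 30), ("b", ">", 50)]
flat_and = [("a", "<", 30), ("b", ">", 50)]
print([matches_all(r, flat_and) for r in rows])  # [True, False, False]

# "(a < 30 & b > 50) | c == 1" written as two AND-groups joined by OR
dnf = [[("a", "<", 30), ("b", ">", 50)], [("c", "==", 1)]]
print([matches_any_group(r, dnf) for r in rows])  # [True, True, False]
```

The second row shows the difference: it fails the flat AND but passes the OR expression through the `c == 1` branch.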

CLI:

pqfilt data/*.parquet -f "vmag < 20" -o filtered.parquet
pqfilt data/*.parquet -f "(a < 30 & b > 50) | c == 1" -o out.parquet