Usage Guide
Installation
From PyPI (once published):
pip install pqfilt
From source:
git clone https://github.com/ysBach/pqfilt.git
cd pqfilt
pip install -e .
Python API
Basic Filtering
The main entry point is pqfilt.read():
import pqfilt
# Simple comparison
df = pqfilt.read("data.parquet", filters="vmag < 20")
# Equality
df = pqfilt.read("data.parquet", filters="flag == 1")
Expression Syntax
Expressions support & (AND), | (OR), and parentheses for grouping.
& binds more tightly than | (standard boolean precedence), so a & b | c means (a & b) | c:
# AND: both conditions must hold
df = pqfilt.read("data.parquet", filters="a > 5 & b < 10")
# OR: either condition holds
df = pqfilt.read("data.parquet", filters="a < 3 | a > 8")
# Mixed with parentheses
df = pqfilt.read("data.parquet", filters="(a < 3 & b > 50) | c == 1")
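To make the precedence rule concrete, here is a minimal, hypothetical parser that turns an expression string into the list-of-lists DNF form described under Tuple Syntax. It is not pqfilt's actual implementation. It assumes simple "column op value" comparisons (no backtick-quoted names, no in / not in), and the helper names tokenize, parse_comparison and to_dnf are invented for this sketch.

```python
import re

# Matches one "column op value" comparison, e.g. "vmag < 20".
COMPARISON = re.compile(r"(\w+)\s*(==|!=|<=|>=|<|>)\s*(\S+)")


def parse_value(text):
    # Assumption: numeric-looking values become int/float, others stay strings.
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def tokenize(expr):
    # Split on the structural characters, keeping them as separate tokens.
    parts = re.split(r"([()&|])", expr)
    return [p.strip() for p in parts if p.strip()]


def parse_comparison(text):
    m = COMPARISON.fullmatch(text)
    if m is None:
        raise ValueError(f"cannot parse comparison: {text!r}")
    column, op, value = m.groups()
    return (column, op, parse_value(value))


def to_dnf(expr):
    """Return a list of AND-groups (lists of 3-tuples); the groups are OR-ed."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        # Lowest precedence: "|" concatenates the AND-groups of each side.
        groups = parse_and()
        while peek() == "|":
            take()
            groups = groups + parse_and()
        return groups

    def parse_and():
        # "&" binds tighter: combine every left group with every right group.
        groups = parse_atom()
        while peek() == "&":
            take()
            right = parse_atom()
            groups = [left + r for left in groups for r in right]
        return groups

    def parse_atom():
        tok = take()
        if tok == "(":
            groups = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return groups
        return [[parse_comparison(tok)]]

    groups = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return groups


print(to_dnf("(a < 3 & b > 50) | c == 1"))
# [[('a', '<', 3), ('b', '>', 50)], [('c', '==', 1)]]
```

Because AND is parsed one level below OR, a == 1 | b == 2 & c == 3 yields the two groups [a == 1] and [b == 2, c == 3], as the precedence rule requires; AND-ing two parenthesized groups distributes into their cross product.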
Membership Filters
Use the in and not in operators followed by a comma-separated list of values:
df = pqfilt.read("data.parquet", filters="desig in 1,2,3")
df = pqfilt.read("data.parquet", filters="name not in foo,bar")
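A membership filter can be read as a single (column, op, list-of-values) condition. The hypothetical parse_membership helper below sketches one way to split such a string; converting numeric-looking values to int or float is an assumption of this sketch, and values that themselves contain commas are not handled.

```python
import re

# "column in v1,v2,..." or "column not in v1,v2,...".
MEMBERSHIP = re.compile(r"(\w+)\s+(not\s+in|in)\s+(.+)")


def parse_value(text):
    # Assumption: numeric-looking values become int/float, others stay strings.
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_membership(text):
    m = MEMBERSHIP.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a membership filter: {text!r}")
    column, op, raw = m.groups()
    op = " ".join(op.split())  # normalise "not   in" to "not in"
    values = [parse_value(v.strip()) for v in raw.split(",") if v.strip()]
    return (column, op, values)


print(parse_membership("desig in 1,2,3"))       # ('desig', 'in', [1, 2, 3])
print(parse_membership("name not in foo,bar"))  # ('name', 'not in', ['foo', 'bar'])
```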
Tuple Syntax
For programmatic use, pass filters as a list of (column, operator, value) 3-tuples; all conditions are AND-ed together:
df = pqfilt.read("data.parquet", filters=[("a", ">", 5), ("b", "<", 10)])
Or pass a list of lists in disjunctive normal form (DNF): each inner list is an AND-group, and a row is kept if any group matches:
# Equivalent to filters="a < 3 | a > 8"
df = pqfilt.read("data.parquet", filters=[
    [("a", "<", 3)],
    [("a", ">", 8)],
])
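The meaning of both tuple forms can be spelled out in plain Python. The sketch below is hypothetical and only illustrates the semantics: it evaluates filters against rows held as dicts, which says nothing about how pqfilt applies them to Parquet data. The OPS table and the normalize and matches helpers are invented for this example.

```python
import operator

# Operator strings from this guide mapped to Python predicates.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda x, values: x in values,
    "not in": lambda x, values: x not in values,
}


def normalize(filters):
    # A flat list of 3-tuples is one AND-group; wrap it so both forms are DNF.
    if isinstance(filters[0], tuple):
        return [list(filters)]
    return [list(group) for group in filters]


def matches(row, filters):
    if not filters:
        return True  # no conditions: keep every row
    # DNF: keep the row if ANY group has ALL of its conditions true.
    return any(
        all(OPS[op](row[column], value) for column, op, value in group)
        for group in normalize(filters)
    )


rows = [{"a": 1, "b": 20}, {"a": 6, "b": 4}, {"a": 9, "b": 12}]
flat = [("a", ">", 5), ("b", "<", 10)]
dnf = [[("a", "<", 3)], [("a", ">", 8)]]
print([r["a"] for r in rows if matches(r, flat)])  # [6]
print([r["a"] for r in rows if matches(r, dnf)])   # [1, 9]
```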
Column Selection
Use columns for projection pushdown (only listed columns are read):
df = pqfilt.read("data.parquet", filters="a > 5", columns=["a", "b"])
Special Column Names
Columns with spaces, hyphens, or operator characters can be backtick-quoted:
df = pqfilt.read("data.parquet", filters="`alpha*360` > 100")
df = pqfilt.read("data.parquet", filters="`my column` <= 50")
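Backticks let a parser treat everything between them as a column name, so characters like * or a space are not mistaken for operators or separators. A hypothetical regex for a single comparison might look like this; split_comparison is an invented name, and the value is returned as a raw string.

```python
import re

# Column is either `anything but a backtick` or a bare identifier.
COMPARISON = re.compile(
    r"\s*(?:`(?P<quoted>[^`]+)`|(?P<bare>\w+))"  # column name
    r"\s*(?P<op>==|!=|<=|>=|<|>)"                # comparison operator
    r"\s*(?P<value>.+?)\s*"                      # value, trimmed
)


def split_comparison(text):
    m = COMPARISON.fullmatch(text)
    if m is None:
        raise ValueError(f"cannot parse comparison: {text!r}")
    column = m.group("quoted") or m.group("bare")
    return column, m.group("op"), m.group("value")


print(split_comparison("`alpha*360` > 100"))  # ('alpha*360', '>', '100')
print(split_comparison("`my column` <= 50"))  # ('my column', '<=', '50')
```

Without the backticks, alpha*360 > 100 fails to parse here, because the bare-identifier branch stops at the * character.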
Multi-file and Glob
Pass a glob pattern or a list of files:
df = pqfilt.read("data/*.parquet", filters="vmag < 20")
df = pqfilt.read(["file1.parquet", "file2.parquet"], filters="a > 5")
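Accepting either a pattern or a list usually comes down to normalizing the input into one list of concrete files. The expand_paths helper below is a hypothetical sketch using the standard glob module; sorting the result and raising when a pattern matches nothing are choices made for this sketch, not documented pqfilt behavior.

```python
import glob
import os


def expand_paths(source):
    # Accept one path/pattern or a list of them.
    patterns = [source] if isinstance(source, (str, os.PathLike)) else list(source)
    files = set()
    for pattern in patterns:
        matched = glob.glob(os.fspath(pattern))
        if not matched:
            raise FileNotFoundError(f"no files match {pattern!r}")
        files.update(matched)  # a set drops files matched by several patterns
    return sorted(files)
```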
Output
Pass output to save the filtered result directly, here as Parquet and as CSV:
df = pqfilt.read("data.parquet", filters="a > 5", output="out.parquet")
df = pqfilt.read("data.parquet", filters="a > 5", output="out.csv")
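The two calls above differ only in the file extension, which suggests the output format is chosen from it. A hypothetical sketch of that dispatch, covering just the two formats shown (output_format and FORMATS are invented names):

```python
from pathlib import Path

# Suffix-to-format table for the two formats used in this guide.
FORMATS = {".parquet": "parquet", ".csv": "csv"}


def output_format(path):
    suffix = Path(path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unsupported output extension: {suffix!r}")
    return FORMATS[suffix]


print(output_format("out.parquet"))  # parquet
print(output_format("out.csv"))      # csv
```

For a pandas DataFrame, these two results would map onto DataFrame.to_parquet and DataFrame.to_csv; which writer pqfilt actually uses is not stated in this guide.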
CLI Usage
Basic usage:
pqfilt data/*.parquet -f "vmag < 20" -o filtered.parquet
Multiple -f flags are AND-ed together:
pqfilt data/*.parquet -f "vmag < 20" -f "dec > 30" -o filtered.parquet
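AND-ing several -f flags can be pictured as wrapping each expression in parentheses and joining them with &. The parentheses matter because | binds more loosely than &. The combine_filters helper is hypothetical; pqfilt may instead combine already-parsed filters rather than strings.

```python
def combine_filters(exprs):
    # Without parentheses, "a < 1 | b > 2" AND "c == 3" would become
    # "a < 1 | b > 2 & c == 3", i.e. "a < 1 | (b > 2 & c == 3)".
    if not exprs:
        return None
    if len(exprs) == 1:
        return exprs[0]
    return " & ".join(f"({e})" for e in exprs)


print(combine_filters(["vmag < 20", "dec > 30"]))  # (vmag < 20) & (dec > 30)
```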
A single -f can also hold a full boolean expression with &, | and parentheses:
pqfilt data/*.parquet -f "(a < 30 & b > 50) | c == 1" -o out.parquet
Column selection:
pqfilt data/*.parquet -f "vmag < 20" --columns vmag,ra,dec -o out.parquet
Use --overwrite to replace an existing output file:
pqfilt data/*.parquet -f "vmag < 20" -o out.parquet --overwrite
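The flags above map naturally onto Python's argparse. Below is a hypothetical re-creation of only the options shown in this guide, plus the kind of existence check --overwrite implies; the real pqfilt CLI may parse its arguments differently, and build_parser, split_columns and check_output are invented names. Note that an unquoted data/*.parquet is normally expanded by a POSIX shell before the program starts, so the program receives a list of file names.

```python
import argparse
import os


def split_columns(text):
    # "vmag,ra,dec" -> ["vmag", "ra", "dec"]
    return [c.strip() for c in text.split(",") if c.strip()]


def build_parser():
    p = argparse.ArgumentParser(prog="pqfilt")
    p.add_argument("files", nargs="+", help="input Parquet files")
    p.add_argument("-f", dest="filters", action="append", default=None,
                   help="filter expression; repeat to AND several")
    p.add_argument("-o", dest="output", help="output file path")
    p.add_argument("--columns", type=split_columns, default=None,
                   help="comma-separated list of columns to read")
    p.add_argument("--overwrite", action="store_true",
                   help="replace an existing output file")
    return p


def check_output(path, overwrite):
    # Refuse to clobber an existing file unless --overwrite was given.
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} exists; pass --overwrite to replace it")
```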