Metadata-Version: 2.4
Name: parquet-lf
Version: 1.0.0
Summary: A lingua franca utility for converting between data formats (JSON, CSV) and Parquet.
Project-URL: Homepage, https://github.com/mattjmcnaughton/parquet-lf
Project-URL: Repository, https://github.com/mattjmcnaughton/parquet-lf
Project-URL: Issues, https://github.com/mattjmcnaughton/parquet-lf/issues
Author: Matt McNaughton
License-Expression: MIT
License-File: LICENSE
Keywords: cli,conversion,csv,data,json,parquet
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: File Formats
Classifier: Topic :: Utilities
Requires-Python: >=3.13
Requires-Dist: polars>=1.36.1
Requires-Dist: structlog>=25.4.0
Requires-Dist: typer>=0.17.4
Description-Content-Type: text/markdown

# parquet-lf

A lingua franca utility for converting between data formats (NDJSON, CSV) and Parquet.

## Installation

```bash
uv tool install parquet-lf
```

## Usage

### Convert to Parquet

```bash
# Convert CSV to Parquet
parquet-lf to-parquet csv input.csv -o output.parquet

# Convert NDJSON to Parquet
parquet-lf to-parquet ndjson input.ndjson -o output.parquet

# jsonl is an alias for ndjson
parquet-lf to-parquet jsonl input.jsonl -o output.parquet
```

### Convert from Parquet

```bash
# Convert Parquet to CSV
parquet-lf from-parquet csv input.parquet -o output.csv

# Convert Parquet to NDJSON
parquet-lf from-parquet ndjson input.parquet -o output.ndjson

# jsonl is an alias for ndjson
parquet-lf from-parquet jsonl input.parquet -o output.jsonl
```

### Output to stdout

When the `-o/--output` flag is omitted, output is written to stdout:

```bash
# Output CSV to stdout
parquet-lf from-parquet csv input.parquet

# Pipe to another command
parquet-lf from-parquet csv input.parquet | head -10

# Output Parquet to stdout (binary) and redirect to file
parquet-lf to-parquet csv input.csv > output.parquet
```

Note: Logs are written to stderr, so they won't interfere with piped data.

### Inspect Files

Use the `info` command to view file metadata and schema without loading the entire dataset:

```bash
# Show file info (schema, row count, size)
parquet-lf info examples/sample.parquet

# Show file info with preview of first N rows
parquet-lf info --head 5 examples/sample.parquet
parquet-lf info -n 5 examples/sample.csv
```

The `info` command supports all formats (Parquet, CSV, NDJSON) and auto-detects the format from the file extension.

### Help

```bash
parquet-lf --help
parquet-lf to-parquet --help
parquet-lf from-parquet --help
parquet-lf info --help
```

## Supported Formats

### NDJSON (Newline Delimited JSON)

NDJSON is a format where each line is a valid JSON object. It's a true tabular peer to CSV, making it ideal for data interchange.

Example NDJSON file:
```json
{"name": "alice", "value": 10}
{"name": "bob", "value": 20}
{"name": "charlie", "value": 30}
```

Both `ndjson` and `jsonl` commands are supported as synonyms.

### CSV

Standard comma-separated values format with a header row.

Example CSV file:
```csv
name,value
alice,10
bob,20
charlie,30
```

## Example Files

The `examples/` directory contains sample data files for experimenting with the CLI:

- `examples/sample.parquet` - Parquet format
- `examples/sample.csv` - CSV format
- `examples/sample.ndjson` - NDJSON format

These files contain the same 5-row dataset with columns: id, name, age, city, score.

## Development

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.

## License

MIT License - see [LICENSE](LICENSE) for details.
