Metadata-Version: 2.4
Name: purejq
Version: 0.2.0
Summary: A pure Python implementation of jq
Author: adam2go
License: MIT
Project-URL: Homepage, https://github.com/adam2go/purejq
Keywords: jq,json,query,pure-python
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Text Processing :: Filters
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: speed
Requires-Dist: orjson; extra == "speed"
Dynamic: license-file

# purejq

[![CI](https://github.com/adam2go/purejq/actions/workflows/ci.yml/badge.svg)](https://github.com/adam2go/purejq/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/purejq)](https://pypi.org/project/purejq/)
[![Python](https://img.shields.io/badge/python-3.9%E2%80%933.14%20%7C%20PyPy-blue)](.github/workflows/ci.yml)
[![Conformance](https://img.shields.io/badge/jq%20test%20suite-96.2%25-brightgreen)](tests/conformance/expected_failures.txt)
[![License: MIT](https://img.shields.io/badge/license-MIT-green)](LICENSE)

**[jq](https://jqlang.github.io/jq/), as a pure Python library.** No C extension, no
binary: if Python runs, purejq runs — Pyodide/WASM, sandboxes, Lambda,
anywhere `pip install` is all you get.

```sh
pip install purejq
```

```python
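import purejq

data = {"users": [{"name": "ada", "age": 36}, {"name": "bob", "age": 25}]}   # sample input
purejq.first(".users[] | select(.age > 26) | .name", data)   # work on your dicts directly

batch = [{"team": "a"}, {"team": "b"}, {"team": "a"}]         # sample input
prog = purejq.compile("group_by(.team) | map(length)")        # compile once, run many
prog.first(batch)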
```

```sh
echo '{"a":[1,2,3]}' | purejq '.a | map(. * 2)'               # familiar CLI, same flags
```

## Why purejq

- **Embedding jq in Python? purejq is 6–40x faster than the C bindings.**
  The [`jq` PyPI package](https://pypi.org/project/jq/) serializes your data
  to JSON text and back on every call; purejq evaluates directly on Python
  objects.
- **On big files, the CLI beats the C jq binary end-to-end.** Large-file runs
  are dominated by JSON parsing, and CPython's C-backed parser is faster than
  jq's.
- **It's real jq**: 751/781 cases (96.2%) of jq's own test suite pass —
  the suite is vendored in this repo and run in CI on every commit.

Where C jq still wins: raw filter throughput on already-parsed streams in
shell pipelines. If you can install binaries and that's your workload, use jq.

## Benchmarks

Measured with [tools/bench.py](tools/bench.py) (M-series MacBook, CPython
3.13, jq 1.8.1, best of 3). Reproduce: `python3 tools/bench.py 1000000`.

**Embedded in Python** — 100k-object array, already parsed, in-process:

| workload | purejq | `jq` PyPI (C bindings) |
|---|---:|---:|
| field-access stream | 9 ms | 410 ms |
| filter + count | 56 ms | 485 ms |
| map + aggregate | 18 ms | 483 ms |
| group_by | 114 ms | 765 ms |
| transform + sort | 141 ms | 943 ms |
| regex filter | 130 ms | 789 ms |

**Command line, end to end** — 93 MB file (1M objects), parse + filter + output:

| workload | purejq | jq 1.8 (C binary) |
|---|---:|---:|
| single lookup | 0.5 s | 1.6 s |
| filter + count | 1.1 s | 2.0 s |
| group_by | 2.3 s | 4.0 s |

*purejq CLI measured with the optional [orjson](https://github.com/ijl/orjson)
extra (`pip install 'purejq[speed]'`); with stdlib json alone it is ~25–35%
slower and still ahead on these workloads.*

**Loading large JSON into Python**: the 93 MB file parses in 0.73 s with
stdlib json (128 MB/s) or 0.43 s with orjson (219 MB/s) — input loading is
C-speed either way and scales linearly.
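
To sanity-check parse throughput on your own machine, a rough standard-library sketch follows. The helper name `parse_throughput_mb_s`, the synthetic payload, and the 1 MB = 1,000,000 bytes convention are assumptions for illustration; this is not what `tools/bench.py` does internally.

```python
import json
import time


def parse_throughput_mb_s(text: str, repeats: int = 3) -> float:
    """Best-of-N json.loads throughput in MB/s (assuming 1 MB = 1,000,000 bytes)."""
    size_mb = len(text.encode("utf-8")) / 1_000_000
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        json.loads(text)
        best = min(best, time.perf_counter() - start)
    return size_mb / best


# Synthetic payload (hypothetical shape, far smaller than the 93 MB benchmark file).
payload = json.dumps(
    [{"id": i, "name": f"user{i}", "age": i % 90} for i in range(50_000)]
)
rate = parse_throughput_mb_s(payload)
print(f"stdlib json: {rate:.0f} MB/s")
```

Numbers from a small in-memory payload will differ from the large-file figures above; treat the result as a ballpark only.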

**PyPy** (100k objects, same code, no changes): filter + count 13 ms,
map + aggregate 2 ms, group_by 33 ms, transform + sort 70 ms — roughly
another 2–9x over CPython for heavy workloads.

How it's fast, in one line: programs compile once into Python closures with
static binding and single-output fast paths — evaluation never re-walks the
AST, and common shapes skip generator machinery entirely.
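
The idea of compiling to closures can be illustrated with a toy sketch. The tuple-based AST and the `compile_node` function below are hypothetical and cover only a tiny single-output subset; they are not purejq's actual internals.

```python
from typing import Any, Callable

# Toy AST for a tiny jq-like subset (hypothetical, for illustration only):
#   ("identity",)          -> .
#   ("field", name)        -> .name
#   ("literal", value)     -> a constant
#   ("pipe", left, right)  -> left | right
Filter = Callable[[Any], Any]


def compile_node(node: tuple) -> Filter:
    """Walk the AST once and return a closure; running it never revisits the AST."""
    kind = node[0]
    if kind == "identity":
        return lambda value: value
    if kind == "field":
        name = node[1]  # bound once, at compile time
        return lambda value: None if value is None else value.get(name)
    if kind == "literal":
        const = node[1]
        return lambda value: const
    if kind == "pipe":
        left = compile_node(node[1])  # children are compiled up front
        right = compile_node(node[2])
        return lambda value: right(left(value))
    raise ValueError(f"unknown node kind: {kind!r}")


# Equivalent of `.user | .name`, compiled once and run on many inputs.
program = compile_node(("pipe", ("field", "user"), ("field", "name")))
rows = [{"user": {"name": "ada"}}, {"user": {"name": "lin"}}]
names = [program(row) for row in rows]
print(names)
```

Each closure here returns a single value directly rather than a generator, which is the shape a single-output fast path would take; filters with multiple outputs would need generators or a similar mechanism.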

## jq compatibility

purejq passes 751 of the 781 cases in jq's official test suite. Every
remaining difference is listed in
[expected_failures.txt](tests/conformance/expected_failures.txt); they fall
into three buckets:

- the **module system** (`import`/`include`) is not implemented yet
- **integers are exact** (arbitrary precision, like gojq) instead of rounding
  to doubles — deliberate
- a few **error-message wordings** differ
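
The exact-integer trade-off can be seen in plain Python, with no purejq involved. A double has a 53-bit mantissa, so integers above 2**53 cannot all be represented; the snippet below only demonstrates that underlying number behavior, not the output of any particular jq implementation.

```python
big = 2**53 + 1                      # first integer a double cannot represent

as_double = float(big)               # rounds to the nearest representable double
exact_sum = big + 1                  # Python ints are arbitrary precision

print(big)                           # 9007199254740993
print(int(as_double))                # 9007199254740992
print(exact_sum)                     # 9007199254740994
```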

Everything else is there: paths and all assignment operators,
`reduce`/`foreach`, `try`/`catch`, `label`/`break`, `?//` destructuring,
string interpolation, `@formats`, regex builtins, streaming
(`tostream`/`fromstream`), dates, and jq 1.8 additions.

Supported CLI flags: `-n -r -j -c -s -e -f --arg --argjson`.

In the library, outputs are lazy iterators:
`purejq.compile("repeat(. * 2)").run(1)` happily yields forever.
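
Because an endless stream can never be collected with `list()`, it is usually consumed with `itertools.islice` or a loop with an explicit `break`. The sketch below uses a stand-in generator, `repeat_doubling`, which is an assumption made so the example runs without purejq installed; the iterator returned by `.run(1)` above should be sliceable the same way.

```python
from itertools import islice
from typing import Iterator


def repeat_doubling(start: int) -> Iterator[int]:
    """Stand-in for an endless lazy output stream: start, start*2, start*4, ..."""
    value = start
    while True:
        yield value
        value *= 2


# Take only the first five outputs; the generator is never run to exhaustion.
first_five = list(islice(repeat_doubling(1), 5))
print(first_five)  # [1, 2, 4, 8, 16]
```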

## Python compatibility

CPython 3.9–3.14 and PyPy, zero runtime dependencies, enforced by
[CI](.github/workflows/ci.yml) on every push.

## Contributing & internals

See [CONTRIBUTING.md](CONTRIBUTING.md) — the conformance suite is the
scoreboard, `tools/bench.py` is the speedometer.

## License

[MIT](LICENSE)
