Metadata-Version: 2.4
Name: pyacid
Version: 0.5.0a1
Summary: Astronomical Catalog Inference Driver: XMATCH SQL over HATS-partitioned Parquet via native Polars
Author-email: Mario Juric <mjuric@uw.edu>
License-Expression: BSD-3-Clause
Project-URL: Homepage, https://acid.juriclab.org/
Project-URL: Issues, https://github.com/mjuric/acid-issues/issues
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Astronomy
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: sqlglot<31,>=27
Requires-Dist: pyarrow>=14
Requires-Dist: numpy>=1.24
Requires-Dist: scipy>=1.10
Requires-Dist: pyyaml>=6
Requires-Dist: rich>=13
Requires-Dist: polars>=1.41.0
Requires-Dist: cdshealpix>=0.7
Requires-Dist: fsspec[http]>=2023.1
Requires-Dist: astropy>=5
Requires-Dist: cloudpickle>=3
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Dynamic: license-file

# ACID — Astronomical Catalog Inference Driver

Copyright (c) 2026, Mario Juric. BSD 3-Clause License.

Cross-match and query HEALPix-partitioned astronomical catalogs from
Python. ACID gives you a fluent `Catalog` surface (`crossmatch`,
`where`, `select`, `group_by` / `aggregate`, `save`) for the common
shapes, and a SQL escape hatch (`db.sql(...)`) with one astronomy
extension — `XMATCH(radius_arcsec => ...)` — for everything else. Each
anchor partition runs independently against a boundary-safe margin
cache.

Reads and writes the [HATS](https://hats.readthedocs.io) format used by
LINCC Frameworks (LSDB, hats-import) and by published catalogs such as
Gaia DR3 and Rubin DP1.

---

## Quick start

### Run a query

`acid` is module-level and singleton-by-default (like Ray / DuckDB /
Polars): `acid.open(...)` returns a catalog handle, `acid.sql.query(...)` is the
escape hatch, and the worker pool is built once and reused. `acid.init(...)`
is optional — call it to pin the source / worker count; otherwise the first
`acid.open()` lazy-inits with defaults.

```python
import acid
import astropy.units as u

acid.init("catalogs.yaml", workers=8)       # optional — pins config

gaia = acid.open("gaia_dr3")
twomass = acid.open("twomass_psc")

# Fluent: composable verbs, lazy until materialized.
matches = (gaia
           .crossmatch(twomass, radius=1*u.arcsec)
           .where("phot_g_mean_mag < 16")
           .select("source_id, designation"))

matches.head(10).show()         # pretty-print to stdout
df = matches.to_polars()        # Catalog converters: .to_polars(),
                                #   .to_astropy(), .to_arrow(), .to_pandas()

# SQL escape hatch for aggregates / HAVING / windows / DISTINCT.
r = acid.sql.query("""
    SELECT g.source_id, t.designation, d
    FROM   gaia_dr3 AS g
    JOIN   twomass_psc AS t ON XMATCH(radius_arcsec => 1.0, dist_col => 'd')
    ORDER BY d
    LIMIT  20
""")
print(r)
```

Need two simultaneous connections, or full isolation (e.g. in a library or
a test)? Construct `acid.Connection(...)` explicitly and use it as a context
manager — it bypasses the module-level default entirely:

```python
with acid.Connection("catalogs.yaml", workers=8) as db:
    df = db.open("gaia_dr3").head(10).to_polars()
```

`Result` is a thin wrapper around an Arrow table that comes back from
every materialization call. `.show()` prints; `.to_astropy()` /
`.to_polars()` / `.to_arrow()` / `.to_pandas()` / `.to_pylist()`
convert (the same converter names as `Catalog`); and
`.export("results.parquet")` writes one flat file (csv/parquet/fits,
by extension or `format=`) while `.save(path)` writes a HATS catalog
directory — the same stays-in-the-system / leaves-the-system pair as
`Catalog.save` / `Catalog.export` (minus the name registration, which
only makes sense on a connection).

### Restrict to a region while you iterate

`acid.in_cone(...)` (or `db.in_cone(...)`) is a context manager that scopes
a spatial cone to every query **executed** inside the `with` block, whether
it was built with the fluent surface or with `db.sql(...)`. The cone is
applied at execution time, not when the
query was built, so you can build a query once and run it scoped inside the
block and full-sky outside it. Use it for a "debug small, run big"
workflow: keep the block in while you iterate, remove it for the production
run.

```python
gaia = acid.open("gaia_dr3")

with acid.in_cone((180, 0), radius=1*u.deg):
    small = gaia.where("phot_g_mean_mag < 16").to_polars()

# Same query, full sky, no edits:
big = gaia.where("phot_g_mean_mag < 16").to_polars()
```

Cones do not nest; one `in_cone` block at a time.
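
One way to get execution-time scoping with a no-nesting rule is a context variable that queries read when they run. The sketch below only illustrates the idea and is not ACID's implementation; `in_cone`, `run_query`, and `_ACTIVE_CONE` are hypothetical stand-ins.

```python
import contextvars
from contextlib import contextmanager

# Holds the active cone as (ra_deg, dec_deg, radius_deg), or None for full sky.
_ACTIVE_CONE = contextvars.ContextVar("active_cone", default=None)


@contextmanager
def in_cone(center, radius_deg):
    """Scope a cone to every query executed inside the `with` block."""
    if _ACTIVE_CONE.get() is not None:
        # Mirrors the documented rule: cones do not nest.
        raise RuntimeError("in_cone blocks do not nest")
    token = _ACTIVE_CONE.set((center[0], center[1], radius_deg))
    try:
        yield
    finally:
        _ACTIVE_CONE.reset(token)


def run_query(description):
    """Stand-in for execution: the cone is looked up when the query runs."""
    cone = _ACTIVE_CONE.get()
    scope = "full sky" if cone is None else f"cone {cone}"
    return f"{description} [{scope}]"


query = "phot_g_mean_mag < 16"           # built once, outside any block
with in_cone((180.0, 0.0), radius_deg=1.0):
    small = run_query(query)             # scoped to the cone
big = run_query(query)                   # full sky, same query
```

Because the lookup happens in `run_query`, the same query description runs scoped or unscoped depending only on where it is executed.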

### Materialize an intermediate

`Catalog.save(...)` writes a query result as a HATS catalog *and*
registers it on the connection so later queries can reference it by
name. This is the canonical EDA pattern: run a heavy crossmatch once,
save it, iterate cheaply on the cached output.

```python
nearby = (acid.open("gaia_dr3")
              .crossmatch(acid.open("twomass_psc"), radius=1*u.arcsec)
              .save("./out/gaia_x_2mass", name="nearby"))

# `nearby` is a normal Catalog; "nearby" is also resolvable by name.
r = acid.sql.query("SELECT COUNT(*) FROM nearby")
print(r)
```

### CLI

```bash
# Query execution. The SQL query is required (use '-' to read stdin, or -f).
# --db is a ':'-separated list of HATS dirs / registry YAMLs; it's optional,
# falling back to $ACID_PATH, the acid.conf 'path' setting, then ~/datasets.
acid query "SELECT COUNT(*) FROM object" --db datasets/ --output /tmp/result
acid query -f query.sql --db catalogs.yaml --output results/ --workers 32
# --ram-budget bounds the RAM the planner sizes work tuples for
# (default: 25% of available RAM); bytes or 64GB / 512MiB forms.
acid query "SELECT ..." --db datasets/ --ram-budget 64GB
echo "SELECT ..." | acid query - --db datasets/ --output results/   # '-' reads stdin
# --format is optional: it's inferred from the --output extension
# (.parquet/.pq, .csv, .fits/.fit, .hats); no extension → HATS tree.
acid query "SELECT ..." --db datasets/ --output results.csv
# --open uses a raw file (parquet/csv/fits/arrow/…) as a table, alongside the
# --db catalogs. The ra/dec column names are required. Two forms: positional
# 'PATH,RA,DEC' (table name = file basename) or named 'NAME=PATH,ra=RA,dec=DEC'.
acid query "SELECT t.id, g.source_id FROM t JOIN gaia ON XMATCH(radius_arcsec => 1.0)" \
    --db datasets/ --open t=candidates.csv,ra=RA,dec=DEC
acid query "SELECT * FROM candidates" --db datasets/ --open candidates.csv,RA,DEC
acid validate "SELECT ..." --db datasets/

# List what's already in your catalog path (ACID_PATH) — names you can query
acid list                                             # every catalog acid can open by name
acid list gaia                                        # filter by name (substring)
# → same one-line-per-catalog format as `acid search` (margin radii, shadowing),
#   but over ACID_PATH (what you already have) rather than the download path.

# Discover what's available to download (across the download path)
acid search                                           # list every downloadable catalog
acid search gaia                                      # filter by name (substring)
# → one line per catalog with its margin-cache radii (arcsec); names like
#   `wise/allwise` come from namespace dirs on the mirror. Remote listings are
#   cached ~1h; --cache refresh re-crawls. Piped, it emits TSV for scripting.

# Download catalogs (HTTP, SSH, or local; full or spatial subset)
acid download two_mass                                # resolve name + dest (see below)
acid download wise/allwise                            # nested name (as shown by `acid search`)
acid download https://data.lsdb.io/hats/two_mass/two_mass /data/two_mass
acid download https://data.lsdb.io/hats/two_mass/two_mass /data/two_mass --cone 50,-50,10
acid download user@server:/hats/gaia /data/gaia --columns ra,dec,mag --cone 180,0,5

# Inspect catalogs (local or remote)
acid inspect two_mass                                # bare name → resolved on ACID_PATH
acid inspect /data/two_mass                          # summary
acid inspect schema /data/two_mass                   # column schema
acid inspect https://data.lsdb.io/hats/two_mass/two_mass  # remote

# Build margin caches locally (--margin-arcsec defaults to 10.0)
acid hats build-margin /data/two_mass --margin-arcsec 10.0 --workers 16
```

`results/` is itself a valid HATS catalog (`lsdb.open_catalog(...)` and
`hats.read_hats(...)` will read it). Downloaded subsets are also valid
HATS catalogs with rebuilt `_metadata`.
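
The `--output` extension rule from the CLI comments can be sketched as a small lookup. This is an illustration, not ACID's code: `infer_output_format` is a hypothetical helper, and both the case-insensitive match and the error on unknown extensions are assumptions.

```python
from pathlib import Path

# Extension -> format, as listed for --output above.
_EXT_TO_FORMAT = {
    ".parquet": "parquet",
    ".pq": "parquet",
    ".csv": "csv",
    ".fits": "fits",
    ".fit": "fits",
    ".hats": "hats",
}


def infer_output_format(output, fmt=None):
    """Return the output format: an explicit `fmt` wins, else the extension decides."""
    if fmt is not None:
        return fmt
    suffix = Path(output).suffix.lower()
    if suffix == "":
        return "hats"                      # no extension -> HATS tree
    try:
        return _EXT_TO_FORMAT[suffix]
    except KeyError:
        # Assumed behavior; the README does not say what happens here.
        raise ValueError(f"cannot infer format from extension {suffix!r}") from None
```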

**`acid download` name resolution.** Give a bare catalog name and both the
source and destination are resolved for you:

```bash
acid download two_mass
# source ← first ACID_DOWNLOAD_PATH root that has it (collection-aware),
#          e.g. https://data.lsdb.io/hats/two_mass/two_mass
# dest   ← <first writable local ACID_PATH root>/two_mass (created if needed)
```

The source search path is `ACID_DOWNLOAD_PATH` → the `download_path` config
setting → the built-in defaults (`https://data.lsdb.io/hats/`, then the SLAC
`ssh://slacd/sdf/home/m/mjuric/datasets` dir), searched in order. Each candidate
`<root>/<name>` is probed; a directory holding `collection.properties` is a
*collection*, so its `hats_primary_table_url` child is downloaded. The name may
be **nested** (`wise/allwise`) to reach a catalog under a namespace directory —
exactly the names `acid search` prints — and lands locally under its leaf
(`<ACID_PATH>/allwise`). An explicit source (a leading `./` / `/` / `~` path, or
a URL) skips the lookup and is used verbatim; give a local relative dir a leading
`./` to copy from it. Use `acid search` to see which names resolve.

The destination follows the **same bare-vs-path rule**: omit it and the
catalog lands in `<first writable local ACID_PATH root>/<catalog name>`; pass a
bare name (`acid download two_mass tm`) and it resolves to `<ACID_PATH root>/tm`;
pass a path with a `/` (`./tm`, `/data/tm`) and it's used verbatim. The
`ACID_PATH` root is the same search path `acid query` uses (URL entries are
skipped), created with a notice if it doesn't exist. An explicit source with an
omitted destination is an error (there's no name to resolve a destination from).
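
The bare-vs-path rule for the destination is compact enough to sketch. The helpers below are hypothetical and simplified: they take a single, already-chosen `ACID_PATH` root, do not expand `~`, and do not recognize the scp-style `user@server:/path` source form.

```python
from pathlib import Path


def is_explicit_source(source):
    """True for a URL or a path with a leading ./, / or ~ (used verbatim)."""
    return "://" in source or source.startswith(("./", "/", "~"))


def resolve_destination(source, dest, acid_root):
    """Apply the bare-vs-path rule to pick the local destination directory."""
    root = Path(acid_root)
    if dest is None:
        if is_explicit_source(source):
            # No catalog name to derive a destination from.
            raise ValueError("an explicit source needs an explicit destination")
        # Nested names land under their leaf: wise/allwise -> <root>/allwise
        return root / source.rsplit("/", 1)[-1]
    if "/" in dest:
        return Path(dest)                  # a path: used verbatim
    return root / dest                     # a bare name: resolved under the root
```

For example, `resolve_destination("two_mass", "tm", "/data/hats")` gives `/data/hats/tm`, while a destination of `/data/tm` is returned unchanged.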

### Catalog registry

The simplest way: point `--db` (or `acid.init(...)`) at a directory
of HATS catalogs. Each subdirectory with a `properties` file becomes a
table named after the directory. Margin caches
(`dataproduct_type=margin`) are auto-skipped.
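
A minimal sketch of that directory scan, assuming `properties` is a plain `key=value` file. `read_properties` and `discover_catalogs` are hypothetical names, not part of the `acid` API.

```python
import tempfile
from pathlib import Path


def read_properties(path):
    """Parse a simple key=value properties file, skipping blanks and # comments."""
    props = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def discover_catalogs(root):
    """Map table name -> catalog dir for each subdirectory holding `properties`."""
    tables = {}
    for sub in sorted(Path(root).iterdir()):
        props_file = sub / "properties"
        if not sub.is_dir() or not props_file.is_file():
            continue
        if read_properties(props_file).get("dataproduct_type") == "margin":
            continue                       # margin caches are not tables
        tables[sub.name] = sub
    return tables


# Tiny demo tree: one object catalog and its margin cache.
with tempfile.TemporaryDirectory() as tmp:
    for name, kind in [("gaia", "object"), ("gaia_margin", "margin")]:
        d = Path(tmp) / name
        d.mkdir()
        (d / "properties").write_text(f"dataproduct_type={kind}\n")
    found = sorted(discover_catalogs(tmp))
```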

For more control, use a YAML file:

```yaml
catalogs:
  dia_source:
    path: /data/dia_source      # HATS root, or CatalogCollection root
    # Auto-detected from <path>/properties when present:
    #   ra_col            (from hats_col_ra)
    #   dec_col           (from hats_col_dec)
    #   hpix_order        (from <path>/partition_info.csv)
    #   neighbor_path     (from collection.properties or sibling '_margin' dir)
    #   neighbor_margin_arcsec  (from hats_margin_threshold)
    #   npix_suffix       (from hats_npix_suffix; default '.parquet')
    # Any auto-detected value can be overridden here.

  object:
    path: /data/object_collection    # a CatalogCollection root works too

  lightcurve:
    path: /data/lightcurve
    hpix_order: 5                    # explicit when partition_info.csv is absent

# Named MOC footprints for IN_MOC() filtering.
# Each entry is a path to a FITS file (HEALPix image or MOC FITS).
mocs:
  des_dr2: /data/mocs/des_dr2.fits
  known_artifacts: /data/mocs/artifacts.fits
  # If a catalog has a point_map.fits at its root, IN_MOC(<alias>, '<catalog_name>')
  # auto-loads it — no explicit entry needed.
```

### Configuration (`acid.conf`)

So you don't re-type `--db`/`--workers` on every invocation, `acid` reads
an INI config. The first existing file wins, searched highest-priority
first: `--config FILE` / `$ACID_CONFIG`, then
`~/.config/acid/acid.conf` (`$XDG_CONFIG_HOME`),
`/sdf/data/rubin/user/mjuric/etc/acid.conf`,
`/sdf/home/m/mjuric/etc/acid.conf`, `$XDG_CONFIG_DIRS`, `/etc/acid/acid.conf`.

```ini
# ~/.config/acid/acid.conf
[acid]
path = /data/hats:~/datasets        # ':'-separated HATS dirs / registry YAMLs
download_path = https://data.lsdb.io/hats/   # 'acid download' name search path
workers = 32                        # query worker pool ("auto" = cgroup-aware)
mem_per_worker_gb = 4               # RAM per worker; caps the "auto" worker count (CPU- and memory-aware)
tmpdir = /scratch/$USER             # base temp dir (a per-run subdir is made + cleaned)
inmem_row_limit = 50_000_000        # spill threshold
```

Each setting resolves **explicit flag/arg → env var → config → built-in**.
Env overrides: `ACID_PATH`, `ACID_DOWNLOAD_PATH`, `ACID_WORKERS`,
`ACID_MEM_PER_WORKER_GB`, `ACID_TMPDIR`, `ACID_INMEM_ROW_LIMIT`. Inspect and
edit with `acid config`:

```bash
acid config show                 # values set in the file (--effective: resolved + provenance)
acid config get workers          # file value; exits 1 (prints nothing) if unset
acid config set path /data/hats:~/datasets
```

With a config in place, `--db` becomes optional (falls back to the `path`
setting, then `~/datasets`). See `docs/archive/CONFIG-SYSTEM.md` for the full design.
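
The resolution order for a single setting (explicit flag/arg, then env var, then config file, then built-in) can be sketched with the standard library. `resolve_setting` and `BUILTINS` are hypothetical; the only default taken from this README is the 50M row limit.

```python
import configparser

# Illustrative built-ins; only the 50M row limit is stated in this README.
BUILTINS = {"inmem_row_limit": "50_000_000"}


def resolve_setting(name, explicit, env, config):
    """Return (value, provenance) using explicit -> env -> config -> built-in."""
    if explicit is not None:
        return str(explicit), "explicit"
    env_key = "ACID_" + name.upper()       # e.g. workers -> ACID_WORKERS
    if env_key in env:
        return env[env_key], "env"
    if config.has_option("acid", name):
        return config.get("acid", name), "config"
    if name in BUILTINS:
        return BUILTINS[name], "built-in"
    raise KeyError(name)


# Inline '#' comments, as in the sample acid.conf, need to be enabled explicitly.
config = configparser.ConfigParser(inline_comment_prefixes=("#",))
config.read_string("""
[acid]
workers = 32        # query worker pool
""")

env = {"ACID_WORKERS": "8"}                # pass os.environ in real use
```

With these inputs, `resolve_setting("workers", None, env, config)` returns `("8", "env")`; dropping the env entry falls through to the config value `"32"`.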

---

## What `XMATCH` does

```sql
JOIN  b ON XMATCH(radius_arcsec => 1.0)                   -- nearest, inner
JOIN  b ON XMATCH(r => 1.0)                               -- 'r' is an alias
JOIN  b ON XMATCH(r => 1.0, mode => 'all')                -- every match within r
LEFT JOIN b ON XMATCH(r => 1.0)                           -- keep unmatched anchors

-- Distance is exposed as a named column via dist_col on the XMATCH call.
SELECT a.id, d FROM a JOIN b ON XMATCH(r => 1.0, dist_col => 'd')
WHERE  d < 0.5

-- Ordinary joins, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT/OFFSET,
-- DISTINCT all work; cross-partition reduction is handled internally.
SELECT a.id, COUNT(*) AS n, AVG(d) AS avg_d
FROM a
JOIN  b ON XMATCH(r => 1.0, dist_col => 'd')
JOIN  lightcurve AS lc ON a.id = lc.object_id
GROUP BY a.id
ORDER BY n DESC LIMIT 100

-- Footprint filtering via MOC (Multi-Order Coverage maps):
-- Restrict rows to a survey footprint or sky region.
SELECT a.id, a.ra, a.dec
FROM a JOIN b ON XMATCH(r => 1.0)
WHERE IN_MOC(a, 'des_dr2')              -- anchor inside DES footprint
  AND NOT IN_MOC(b, 'known_artifacts')  -- exclude artifact regions

-- IN_MOC also works in SELECT projections (per-row boolean):
SELECT a.id, IN_MOC(a, 'des_dr2') AS in_des FROM a
```

The fluent equivalent of the simple shapes:

```python
a.crossmatch(b, radius=1*u.arcsec)                          # nearest, inner
a.crossmatch(b, radius=1*u.arcsec, how="all")               # every match within r
a.crossmatch(b, radius=1*u.arcsec, how="left")              # LEFT XMATCH
a.in_region("des_dr2")                                      # IN_MOC mask, per-receiver
```
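
To make the `nearest` / `all` / LEFT distinction concrete, here is a brute-force reference matcher in pure Python. It is a sketch for intuition only: ACID's matcher is vectorized and partition-aware, the function names are hypothetical, and treating the radius as inclusive is an assumption.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine) between two points, in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def crossmatch(anchor, other, radius_arcsec, mode="nearest", left=False):
    """Brute-force match of (id, ra, dec) rows; returns (a_id, b_id, dist) tuples."""
    out = []
    for a_id, a_ra, a_dec in anchor:
        hits = sorted(
            (separation_arcsec(a_ra, a_dec, b_ra, b_dec), b_id)
            for b_id, b_ra, b_dec in other
        )
        hits = [(d, b_id) for d, b_id in hits if d <= radius_arcsec]
        if mode == "nearest":
            hits = hits[:1]                  # closest match only
        if not hits and left:
            out.append((a_id, None, None))   # LEFT keeps unmatched anchors
        out.extend((a_id, b_id, d) for d, b_id in hits)
    return out


a = [("a1", 10.0, 0.0), ("a2", 50.0, 20.0)]
b = [("b1", 10.0, 0.0002), ("b2", 10.0, -0.0001), ("b3", 200.0, 5.0)]
nearest = crossmatch(a, b, 1.0)               # a1 -> b2 only
every = crossmatch(a, b, 1.0, mode="all")     # a1 -> b2 and a1 -> b1
kept = crossmatch(a, b, 1.0, left=True)       # also keeps a2 with no match
```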

Semantics, in short:

- All XMATCHes in a query use the **anchor** (first FROM) table's
  coordinates, even after a `mode => 'all'` expansion.
- A right-table radius **must be ≤** that catalog's
  `neighbor_margin_arcsec`. A larger radius would silently miss pairs
  that straddle a partition boundary, so the analyzer rejects the query.
- `ORDER BY ... LIMIT K` pushes the top-K to each partition first;
  the reducer re-sorts the union and applies the global LIMIT/OFFSET.
- Aggregates / GROUP BY / DISTINCT / HAVING run in a phase-2 reducer
  over the per-partition Parquet output.
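
The top-K pushdown in the list above can be sketched in a few lines. This is an illustration under one stated assumption: each partition keeps `limit + offset` rows so the global OFFSET still sees every candidate. `partition_top_k` and `reduce_top_k` are hypothetical names.

```python
import heapq


def partition_top_k(rows, key, limit, offset=0):
    """Per-partition step: keep only rows that could reach the global window."""
    return heapq.nsmallest(limit + offset, rows, key=key)


def reduce_top_k(partials, key, limit, offset=0):
    """Reducer: re-sort the union of partials, then apply global OFFSET/LIMIT."""
    merged = sorted((row for part in partials for row in part), key=key)
    return merged[offset:offset + limit]


def by_dist(row):
    """Sort key: the match distance in the second column."""
    return row[1]


partitions = [
    [("g1", 0.9), ("g2", 0.1), ("g3", 0.5)],
    [("g4", 0.3), ("g5", 0.2)],
    [("g6", 0.7)],
]
partials = [partition_top_k(p, by_dist, limit=2, offset=1) for p in partitions]
result = reduce_top_k(partials, by_dist, limit=2, offset=1)
```

The result matches sorting all rows globally and slicing, but each partition ships at most `limit + offset` rows to the reducer.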

---

## Python API surface

```python
# Module-level API (singleton-by-default). The first call lazy-inits;
# acid.init(...) pins config; acid.shutdown() tears down.
acid.init(source=None, *, workers=None, threads=None,
          inmem_row_limit=50_000_000) -> None   # progress: set via acid.configure(progress=...)
acid.shutdown() / acid.is_initialized() / acid.configure(progress=...)
acid.open(name_or_path, *, alias=None, columns=None) -> Catalog
acid.register_catalog(name, **spec_kwargs) / acid.register_file(name, src, *, ra=, dec=)
acid.list_catalogs() / acid.register_moc(...)
acid.in_cone(center, *, radius) / acid.status()

# SQL escape hatch — the acid.sql submodule
acid.sql.query(query, *, output=None)                -> Result
acid.sql.validate(query) / acid.sql.explain(query)

# Explicit, isolated Connection (escape hatch — two connections / two configs)
db = acid.Connection(source, *, workers=None, threads=None,
                     inmem_row_limit=50_000_000, progress="auto")
# ...then db.open(...) / db.sql(...) / etc. — the same methods, on `db`.
db.in_cone(center, *, radius)                       # ctx manager
db.status() / db.validate(q) / db.explain(q)
db.close()    # or use as a context manager

# Catalog (composable, lazy) — composition verbs return Catalog;
# materialization verbs execute and return Result (or, for to_*, the
# converted type).
cat.where(predicate)        -> Catalog
cat.select(*cols)           -> Catalog
cat.limit(n)                -> Catalog
cat.in_region(moc_or_cat)   -> Catalog
cat.crossmatch(other, *, radius, how="nearest"|"all"|"left",
               dist_col=None, suffix=None)                   -> Catalog
cat.join(other, *, on, how="inner"|"left")                   -> Catalog
# Fluent aggregation — decomposable-only (acid.agg constructors).
cat.group_by(*keys)                        -> Catalog
cat.aggregate(**named_aggs)                -> Catalog
# After .aggregate(), verbs compose over the aggregate output in written
# order: a post-aggregate .where(...) is the old HAVING (and .limit(5).where(...)
# filters the top-5 — fluent composes by position, no separate .having()).
cat.sort(*keys, descending=False, nulls_last=False) -> Catalog
# Reduction shortcuts — one aggregate, no agg.* ceremony. Global (no
# group_by) materializes and returns a bare Python scalar; grouped returns a
# lazy Catalog (column `count` / `mean_<col>` / …, so a following .where() is
# HAVING). For mixed stats / named outputs use .aggregate(...).
cat.count(col=None)                         -> int | Catalog
cat.sum/mean/min/max/std/var(col)           -> scalar | Catalog
# Decomposable aggregate constructors (acid.agg):
#   agg.count(col=None), agg.sum, agg.mean, agg.min, agg.max,
#   agg.std, agg.var, agg.all, agg.any, agg.list.
# (No agg.median / agg.mode — non-decomposable; rejected with
# ValidationError. Drop into Polars after .to_polars() if you need them.)

cat.columns / cat.alias / cat.describe() / cat.explain()
cat.head(n=10)              -> Result
cat.execute()               -> Result
# These convert and return the target type directly (no Result detour):
cat.to_pandas() / cat.to_astropy() / cat.to_polars() / cat.to_arrow()
cat.save(path, *, name=None, overwrite=False) -> Catalog

# Result — comes back from Catalog.head / .execute and from db.sql.
# A thin wrapper around an in-memory pa.Table or a partitioned dir;
# same converter / terminal names as Catalog.
r.num_rows, r.column_names, r.schema
r.column(name)         -> pa.ChunkedArray
r.show(n=20)           # pretty-print first n rows (CLI renderer)
print(r)               # renders the result as a Polars DataFrame (__str__)
r.to_arrow()           -> pa.Table
r.to_polars()          -> polars.DataFrame
r.to_astropy()         -> astropy.table.Table
r.to_pandas()          -> pandas.DataFrame
r.to_pylist()          -> list[dict]
r.batches(batch_size=None) -> Iterator[pa.RecordBatch]
r.head(n=10)           -> Result
r.export(path, format=None) -> Path  # one flat file; format from extension
                                     # (a Result has left the system — no .save();
                                     #  write HATS from a Catalog or Connection.sql(output=))
len(r), for batch in r: ...

# Errors (all inherit from acid.AcidError)
acid.RegistryError           # catalog config (missing path, mixed Norder, ...)
acid.ParseError              # SQL parse failures
acid.ValidationError         # unsupported XMATCH constructs
acid.ExecutionError          # per-partition execution failures
acid.ConnectionClosedError   # method called on a closed Connection
```
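
"Decomposable" in the aggregate list above means an aggregate can be computed from small per-partition partial states that merge associatively. The sketch below shows this for mean and standard deviation; it is not ACID's reducer, the helper names are hypothetical, and the choice of sample (n − 1) standard deviation is an assumption.

```python
import math


def partial_stats(values):
    """Per-partition partial state: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))


def merge(a, b):
    """Combine two partial states; associative, so partition order is irrelevant."""
    return tuple(x + y for x, y in zip(a, b))


def finalize(state):
    """Turn the merged state into (mean, sample standard deviation)."""
    n, s, ss = state
    mean = s / n
    var = (ss - n * mean * mean) / (n - 1) if n > 1 else float("nan")
    return mean, math.sqrt(max(var, 0.0))


partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
state = (0, 0.0, 0.0)
for values in partitions:
    state = merge(state, partial_stats(values))
mean, std = finalize(state)
```

A median has no such fixed-size partial state, which is why it is left to Polars after `.to_polars()`. The sum-of-squares form is the simplest to show but is numerically naive for large or offset values.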

---

## Layout assumptions

- Catalogs follow the **HATS** layout:
  `<root>/dataset/Norder=N/Dir=D/Npix=P.parquet` (or
  `Npix=P/*.parquet` when `hats_npix_suffix='/'`).
- Margin caches live as **sibling catalogs** (HATS canonical), at
  `<root>/margin_cache/...`, or any sibling dir matching
  `<name>_margin*`. `collection.properties` is consumed if present.
- Adaptive (per-pixel) Norder is supported: a catalog's
  `partition_info.csv` may list pixels at any orders, and XMATCH/ordinary
  joins across mixed-Norder catalogs are run via a refinement-tree
  enumeration that emits one work unit per coarsest cursor pixel where
  every joined catalog has ≤ 1 partition. Output is itself a valid
  HATS catalog whose `partition_info.csv` reflects the refinement.
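
A small sketch of how a partition's leaf path follows from that layout. `partition_path` is a hypothetical helper, and the `Dir` value (pixels grouped in blocks of 10,000) reflects the usual HATS convention as an assumption rather than something stated in this README.

```python
from pathlib import Path


def partition_path(root, norder, npix, npix_suffix=".parquet"):
    """Build the leaf path for one HEALPix pixel under a HATS catalog root."""
    # Assumed HATS convention: pixels are grouped into directories of 10,000.
    dir_number = (npix // 10_000) * 10_000
    base = Path(root) / "dataset" / f"Norder={norder}" / f"Dir={dir_number}"
    if npix_suffix == "/":
        return base / f"Npix={npix}"         # a directory of *.parquet files
    return base / f"Npix={npix}{npix_suffix}"
```

For example, `partition_path("/data/gaia", 6, 12345)` yields `/data/gaia/dataset/Norder=6/Dir=10000/Npix=12345.parquet`.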

---

## What's the speed story?

- Each partition is independent → embarrassingly parallel across
  HEALPix pixels.
- Top-K queries push the LIMIT to each partition. Aggregates write
  partial data to disk and reduce centrally.
- Column pruning: the anchor and right relations are lazy Polars
  `LazyFrame`s over `scan_parquet()`, so the final projection only pulls
  referenced columns from disk. Wide catalogs (150+ columns) don't slow
  down narrow SELECTs.
- Auto-spill: when `output` is unset and the running result exceeds
  `inmem_row_limit` (default 50M rows), `acid` spills to a tempdir
  rather than OOM-ing the parent.
- Allocator tuning: `acid` ships a jemalloc default that avoids
  page-purge contention at high worker counts (~2× faster wall-clock
  time, ~20% more RSS). It's a single overridable env var; see
  [`MEMORY-TUNING.md`](MEMORY-TUNING.md) if you're memory-constrained or
  scaling `workers` on a large machine.
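
The auto-spill idea from the list above, reduced to its control flow: buffer rows in memory and switch to disk once a row threshold is crossed. This is a toy sketch with a hypothetical `SpillingCollector` class, and CSV stands in for whatever on-disk format ACID actually spills to.

```python
import csv
import tempfile
from pathlib import Path


class SpillingCollector:
    """Accumulate rows in memory, switching to a temp file past `row_limit`."""

    def __init__(self, row_limit):
        self.row_limit = row_limit
        self.rows = []            # in-memory buffer, used until the limit is hit
        self.spill_path = None    # set once we have spilled
        self.num_rows = 0

    def add_batch(self, batch):
        self.num_rows += len(batch)
        if self.spill_path is None and self.num_rows > self.row_limit:
            # Crossed the threshold: move everything buffered so far to disk.
            tmpdir = tempfile.mkdtemp(prefix="spill-")
            self.spill_path = Path(tmpdir) / "result.csv"
            self._append(self.rows)
            self.rows = []
        if self.spill_path is None:
            self.rows.extend(batch)
        else:
            self._append(batch)

    def _append(self, rows):
        with open(self.spill_path, "a", newline="") as fh:
            csv.writer(fh).writerows(rows)


collector = SpillingCollector(row_limit=3)
collector.add_batch([(1, "a"), (2, "b")])      # stays in memory
collector.add_batch([(3, "c"), (4, "d")])      # crosses the limit, spills all rows
```

The caller is left to remove the temp directory; the README notes that ACID cleans up its own per-run subdirectory.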

See `bench/match_all.py` and `bench/session_vs_oneshot.py` for
microbenchmarks.

---

## Install

### With uv (recommended for development)

```bash
uv sync --dev          # creates .venv, installs all deps + test + hats
uv run pytest          # run tests
```

### With pip

```bash
pip install -e .
# extras (only `dev` is declared in the package metadata):
#   pip install -e ".[dev]"
```

Requires Python 3.10+. Runtime dependencies (installed automatically):
Polars ≥ 1.41, SQLGlot ≥ 27 (< 31), PyArrow ≥ 14, NumPy ≥ 1.24,
SciPy ≥ 1.10, cdshealpix ≥ 0.7, fsspec ≥ 2023.1 (with the `http` extra),
Astropy ≥ 5, PyYAML ≥ 6, rich ≥ 13, cloudpickle ≥ 3. `mocpy` is **not** a
runtime dependency; ACID ships a dependency-light MOC implementation.

---

## Status

- **v0 (correctness):** XMATCH inner/left, mode 'nearest'/'all', chains,
  ordinary joins, distance via `XMATCH(..., dist_col => '<name>')`.
- **v1 (scale):** views + narrow side-tables, vectorized matcher,
  worker initializer, auto-spill, top-K pushdown, manifest.
- **v1.1 (HATS spec):** writes valid HATS catalogs, reads canonical
  property keys, supports `hats_npix_suffix='/'`, auto-discovers
  margin siblings via `collection.properties`.
- **v2 (EDA):** persistent `Connection`, per-worker Polars engine,
  `Result` wrapper, `Catalog.save()` for materialization.
- **v3 (adaptive Norder):** per-catalog `PartitionIndex`, refinement-tree
  tuple enumeration, integer `_healpix_29` range filtering for per-pixel
  row pruning, LEFT-XMATCH/JOIN over partitions without coverage.
- **v4 (Polars-native):** single native-Polars engine; DuckDB, the SQL
  rewriter/reducer, the engine abstraction, and the `QueryPlan` IR
  removed (see `CHANGELOG.md` / `ARCHITECTURE.md`).
- **v4 (MOC footprint filtering):** `IN_MOC(<alias>, '<name>')` in WHERE
  restricts rows to a named sky region (Multi-Order Coverage map). Supports
  `NOT IN_MOC`, multiple predicates (AND-combined via mocpy set ops), and
  catalog auto-resolution from `point_map.fits`. `IN_MOC` is a footprint
  restriction only — it must sit in conjunctive `WHERE` position (top-level
  AND-chain, optionally negated); use in `SELECT`/`ORDER BY`/`CASE`/`JOIN ON`
  or inside a disjunction is rejected (see Known limitations). Three-level
  optimization: catalog-footprint scoping, cursor-pixel intersection, and
  partition-level pruning — all via the existing `_healpix_29` row-group
  pushdown fast path.
- **v5 (catalog ops):** `acid hats build-margin` builds HATS margin caches
  locally (validated against hats-import). `acid download` generates
  `point_map.fits`, auto-includes HATS RA/Dec/healpix columns. `acid query`
  accepts `--db <directory>` for zero-config usage, fails fast on errors,
  shows tqdm progress, shuffles work for load balancing. Bare column
  resolution via schema introspection. `LocalFetcher` for local I/O.
- **v6 (fluent Catalog API):** `acid.init()` builds a process-wide
  default `Connection` (`acid.Connection()` builds an explicit one);
  `db.open(name)` returns a lazy `Catalog`; verbs
  (`where`, `select`, `crossmatch`, `join`, `in_region`, `save`)
  compose without writing SQL. `db.in_cone(...)` scopes a cone to
  every query in a `with` block. `db.sql(...)` remains the escape
  hatch for decomposable aggregates, `HAVING`, and top-K
  (`ORDER BY ... LIMIT`). Window functions, `DISTINCT`,
  `COUNT(DISTINCT)`, bare `GROUP BY`, and unbounded `ORDER BY` are
  rejected with a `ValidationError`.

Tests: ~545 passing (~60s parallel via pytest-xdist) on the native
Polars engine. Fixtures cached across runs.

### Known limitations

- **XMATCH must be the entire `ON` predicate.** Compound predicates
  like `XMATCH(...) AND b.mag < 20` are rejected.
- **No CTEs / subqueries in the anchor position.**
- **RIGHT / FULL / CROSS JOIN XMATCH** not supported.
- **`IN_MOC` must be in conjunctive WHERE position** (top-level AND-chain,
  optionally negated). Disjunctive use (`IN_MOC(...) OR ...`) and
  `IN_MOC` in `JOIN ON` are rejected.
- **No nested `db.in_cone(...)` blocks.** The true intersection of
  two non-concentric cones is not a cone; we refuse rather than
  silently approximate.
