Metadata-Version: 2.4
Name: hipscatalog-gen
Version: 0.3.0
Summary: Python package for building HiPS-compliant catalog hierarchies from large astronomical tables using Dask and LSDB.
Author-email: Luigi Lucas de Carvalho Silva <luigi.silva@linea.org.br>
License-Expression: MIT
Project-URL: Homepage, https://github.com/linea-it/hipscatalog_gen
Project-URL: Documentation, https://github.com/linea-it/hipscatalog_gen
Project-URL: Bug Tracker, https://github.com/linea-it/hipscatalog_gen/issues
Project-URL: Source Code, https://github.com/linea-it/hipscatalog-gen
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<3.0,>=2.0
Requires-Dist: pandas<4.0,>=2.1
Requires-Dist: pyarrow<24,>=14
Requires-Dist: dask<2027.0,>=2026.1.0
Requires-Dist: distributed<2027.0,>=2026.1.0
Requires-Dist: dask-jobqueue<1.0,>=0.8
Requires-Dist: astropy<8,>=6
Requires-Dist: healpy<2.0,>=1.17
Requires-Dist: mocpy<1.0,>=0.12
Requires-Dist: hats<1.0,>=0.6
Requires-Dist: lsdb<1.0,>=0.6
Requires-Dist: pyyaml<7,>=6
Requires-Dist: scipy>=1.13
Provides-Extra: slurm
Requires-Dist: dask-jobqueue<1.0,>=0.8; extra == "slurm"
Provides-Extra: s3
Requires-Dist: s3fs<2026.0,>=2024.6; extra == "s3"
Provides-Extra: dev
Requires-Dist: asv[virtualenv]==0.6.5; extra == "dev"
Requires-Dist: jupyter; extra == "dev"
Requires-Dist: nbconvert<8,>=7.16.6; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx<10,>=7.2; extra == "docs"
Requires-Dist: sphinx-rtd-theme<4,>=3.0; extra == "docs"
Requires-Dist: myst-parser<6,>=2.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints<4,>=2.0; extra == "docs"
Dynamic: license-file

# hipscatalog-gen

[![Template](https://img.shields.io/badge/Template-LINCC%20Frameworks%20Python%20Project%20Template-brightgreen)](https://lincc-ppt.readthedocs.io/en/latest/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python Versions](https://img.shields.io/badge/python-3.11+-blue.svg)]()
[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/linea-it/hipscatalog_gen/smoke-test.yml)](https://github.com/linea-it/hipscatalog_gen/actions/workflows/smoke-test.yml)
[![Codecov](https://codecov.io/gh/linea-it/hipscatalog_gen/branch/main/graph/badge.svg)](https://codecov.io/gh/linea-it/hipscatalog_gen)

This project was created following the LINCC Frameworks Python Project Template (https://lincc-ppt.readthedocs.io/en/latest/).

-------------------------------------------------------------------------------

## Overview

hipscatalog-gen is a Python package for building HiPS-compliant catalog hierarchies from large astronomical tables using Dask and LSDB. It is inspired by and extends the logic of the CDS *Hipsgen-cat.jar* tool, providing a scalable and parallelized Python implementation suitable for large-scale workflows. Documentation: https://linea-it.github.io/hipscatalog_gen/


The pipeline supports three selection modes, configured in the YAML file under `algorithm.selection_mode`:

- **mag_global**   — global magnitude-complete selection.
- **score_global** — global selection driven by an arbitrary score/expression.
- **score_density_hybrid** — density-driven depths 1..`density_up_to_depth` (default 4) with score-based distribution afterwards.

-------------------------------------------------------------------------------

## Quick Start (PyPI)

Install from PyPI into a fresh environment and run with a config file:

    conda create -n hipscatalog-gen "python>=3.11"
    conda activate hipscatalog-gen
    pip install hipscatalog-gen

If you do not have Conda yet, install it first using the official docs:
- Conda install guide: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html
- Miniconda install guide: https://www.anaconda.com/docs/getting-started/miniconda/install

Fetch the example template and adapt it to your catalog:

    curl -O https://raw.githubusercontent.com/linea-it/hipscatalog_gen/main/examples/configs/config.template.yaml
    cp config.template.yaml config.yaml

Run the pipeline:

    hipscatalog-gen --config config.yaml
    # or: python -m hipscatalog_gen.cli --config config.yaml

-------------------------------------------------------------------------------

## Developer Install

For local development (editable install + tooling):

    git clone https://github.com/linea-it/hipscatalog_gen.git
    cd hipscatalog_gen
    conda create -n hipscatalog-gen-dev "python>=3.11"
    conda activate hipscatalog-gen-dev
    pip install -e ".[dev]"

Optionally expose the env as a Jupyter kernel:

    python -m ipykernel install --user --name hipscatalog-gen --display-name "hipscatalog-gen"

-------------------------------------------------------------------------------

## Configuration

The pipeline is fully configured through a YAML file.

A complete annotated template is provided in the ./examples/configs folder as:

- config.template.yaml

When installed from PyPI, download the template directly:

    curl -O https://raw.githubusercontent.com/linea-it/hipscatalog_gen/main/examples/configs/config.template.yaml

To create your own configuration:

    cp config.template.yaml config.yaml

Then edit config.yaml to match your input catalog and selection preferences.
Additional examples are available under ./examples/configs/.

Selection modes live under ``algorithm.selection_mode``:

- ``mag_global``, ``score_global``, ``score_density_hybrid``.

Mode-specific parameters live inside the blocks ``algorithm.mag_global``, ``algorithm.score_global``, and
``algorithm.score_density_hybrid`` (with optional shared defaults in ``algorithm.selection_defaults``).
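
As a rough illustration of this layout, the sketch below mirrors the ``algorithm`` section as a plain Python dictionary and resolves the parameters for the selected mode. The helper ``resolve_mode_params`` is hypothetical and not part of hipscatalog-gen. Two assumptions are made: that mode-specific values override ``selection_defaults``, and that ``density_up_to_depth`` sits inside the ``score_density_hybrid`` block. The annotated template remains the reference for real parameter names.

```python
# Illustrative only: mirrors the YAML layout of the `algorithm` section.
SELECTION_MODES = ("mag_global", "score_global", "score_density_hybrid")


def resolve_mode_params(algorithm: dict) -> tuple[str, dict]:
    """Return (mode, params) for the selected mode (hypothetical helper)."""
    mode = algorithm.get("selection_mode")
    if mode not in SELECTION_MODES:
        raise ValueError(f"unknown selection_mode: {mode!r}")
    # Assumption: shared defaults apply first, the mode block overrides them.
    params = dict(algorithm.get("selection_defaults") or {})
    params.update(algorithm.get(mode) or {})
    return mode, params


algorithm = {
    "selection_mode": "score_density_hybrid",
    "selection_defaults": {"keep_invalid_values": False},
    # Assumption: this key is placed inside the mode-specific block.
    "score_density_hybrid": {"density_up_to_depth": 4},
}
print(resolve_mode_params(algorithm))
```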

Cluster memory policy (current behavior):

- The pipeline now uses fixed defaults optimized for large catalogs:
  - no persistence of large intermediate DataFrames
  - avoid early large compute materializations whenever possible
- ``cluster.low_memory_mode`` is deprecated (accepted with a warning, but has no effect).
- ``cluster.persist_ddfs`` and ``cluster.avoid_computes_wherever_possible`` are deprecated and ignored.
- For streamed stage-2 writes (deeper depths), an active ``dask.distributed`` client is required.
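
When migrating an older configuration, it can help to find these deprecated keys before running. The following is a minimal sketch using only the standard library; ``strip_deprecated_cluster_keys`` is a hypothetical helper, not a function of hipscatalog-gen, and it operates on a ``cluster`` section that has already been loaded into a dictionary.

```python
import warnings

# Keys listed above as deprecated under `cluster`.
DEPRECATED_CLUSTER_KEYS = (
    "low_memory_mode",
    "persist_ddfs",
    "avoid_computes_wherever_possible",
)


def strip_deprecated_cluster_keys(cluster: dict) -> dict:
    """Return a copy of `cluster` without deprecated keys, warning once per key found."""
    cleaned = dict(cluster)
    for key in DEPRECATED_CLUSTER_KEYS:
        if key in cleaned:
            warnings.warn(
                f"cluster.{key} is deprecated and has no effect",
                DeprecationWarning,
                stacklevel=2,
            )
            del cleaned[key]
    return cleaned


cluster = {"mode": "slurm", "low_memory_mode": True}
print(strip_deprecated_cluster_keys(cluster))
```

The input dictionary is left untouched, so the original configuration can still be written back or compared against the cleaned copy.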

-------------------------------------------------------------------------------

## Running

The pipeline can be executed either as a Python library or from the command line.

### Run as a library

    from hipscatalog_gen.config import load_config, load_config_from_dict, display_available_configs
    from hipscatalog_gen.pipeline.main import run_pipeline

    cfg = load_config("config.yaml")
    run_pipeline(cfg)

### Run from the command line

List available selection modes:

    hipscatalog-gen --list-modes

Run with a config file:

    hipscatalog-gen --config config.yaml
    # or: python -m hipscatalog_gen.cli --config config.yaml

No dedicated ``sbatch`` wrapper script is required. For HPC usage, set
``cluster.mode: slurm`` in the YAML and run the same command above.

Validate a config without running:

    hipscatalog-gen --check-config config.yaml

Enable JSON logs (process.jsonl) with a CLI flag when running the pipeline:

    hipscatalog-gen --config config.yaml --json-logs

Summarize an existing telemetry.json:

    hipscatalog-gen --telemetry /path/to/telemetry.json

-------------------------------------------------------------------------------

## Output Structure

Each run generates a HiPS-compliant directory structure under output.out_dir:

- Norder*/Dir*/Npix*.tsv  → Per-depth tiles.
- Norder*/Allsky.tsv      → Optional all-sky tables.
- densmap_o<depth>.fits   → Density maps for all depths up to level_limit.
- Moc.fits / Moc.json     → Multi-Order Coverage maps.
- properties / metadata.xml → HiPS metadata descriptors.
- process.log / arguments  → Run logs and configuration snapshot (optional process.jsonl when `--json-logs`).
- telemetry.json          → Run summary with per-stage durations and input/output counts.

If ``output.out_dir`` already exists, the run fails with an error; set ``output.overwrite: true`` to clear it before writing.
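
To locate a given tile inside the output directory, the path can be derived from the order and pixel index. The sketch below assumes the standard HiPS grouping, where each ``Dir`` folder covers 10,000 pixel indices; ``tile_relpath`` is a hypothetical helper, not part of the package.

```python
from pathlib import PurePosixPath


def tile_relpath(order: int, npix: int, ext: str = "tsv") -> PurePosixPath:
    """Relative path of one tile under the output directory (illustrative)."""
    # Standard HiPS convention: tiles are grouped in directories of 10,000 pixels.
    dir_index = (npix // 10_000) * 10_000
    return PurePosixPath(f"Norder{order}") / f"Dir{dir_index}" / f"Npix{npix}.{ext}"


print(tile_relpath(3, 42))       # Norder3/Dir0/Npix42.tsv
print(tile_relpath(7, 123456))   # Norder7/Dir120000/Npix123456.tsv
```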

-------------------------------------------------------------------------------

## Mode Summary

- **mag_global**: magnitude-complete slices across all depths.
  - Default bounds for `hist_peak`: when `adaptive_range=hist_peak` and `mag_min`/`mag_max` are not provided, the histogram range clips the global min/max to [-2, 40]. `mag_min` is clipped to >= -2, and `mag_max` is taken from the histogram peak within [-2, min(global_max, 40)].
- **score_global**: score-based slices across all depths.
- **score_density_hybrid**: density-driven tiles for depths 1..`density_up_to_depth` (default 4), then score slices for deeper levels.
- For deeper streamed depths, bucket processing runs on Dask workers (`Client.submit`) and keeps the driver lightweight (orchestration only).
- Stream merge uses bounded fan-in (auto-tuned from worker concurrency + `RLIMIT_NOFILE`) to reduce `EMFILE` (`Too many open files`) risk.
- Ordering and ties: `order_desc` controls ascending/descending (default ascending); optional `tie_column` breaks ties before falling back to RA/DEC.
- Invalids: `keep_invalid_values` (per mode or in `selection_defaults`) can map NaN/Inf to a sentinel when `adaptive_range=complete`, sending them to the last slice; rejected for `hist_peak`.
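
The default `hist_peak` bounds described above can be sketched in plain Python. This is only an approximation of the idea, not the package's implementation: the function name `hist_peak_bounds`, the `bin_width` parameter, the use of the peak bin's center as `mag_max`, and the tie-breaking toward the brighter bin are all assumptions made for illustration.

```python
import math
from collections import Counter

MAG_FLOOR, MAG_CEIL = -2.0, 40.0  # default clipping bounds quoted above


def hist_peak_bounds(mags, bin_width=0.5):
    """Sketch of default (mag_min, mag_max) for hist_peak; binning is an assumption."""
    finite = [m for m in mags if math.isfinite(m)]
    mag_min = max(min(finite), MAG_FLOOR)   # global min clipped to >= -2
    upper = min(max(finite), MAG_CEIL)      # histogram upper edge: min(global_max, 40)
    # Count values per bin inside [MAG_FLOOR, upper].
    counts = Counter(
        math.floor((m - MAG_FLOOR) / bin_width)
        for m in finite
        if MAG_FLOOR <= m <= upper
    )
    # Most populated bin; ties resolved toward the lower (brighter) bin.
    peak_bin = max(counts, key=lambda b: (counts[b], -b))
    mag_max = MAG_FLOOR + (peak_bin + 0.5) * bin_width  # center of the peak bin
    return mag_min, mag_max


mags = [-5.0, 18.2, 21.1, 21.2, 21.3, 24.0, 99.0, float("nan")]
print(hist_peak_bounds(mags))
```

In this toy input, the out-of-range values -5.0 and 99.0 and the NaN do not contribute to the histogram, so the peak is driven by the cluster of values near 21.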

-------------------------------------------------------------------------------

## Development and Contributing

This project follows the LINCC Frameworks Python Project Template.

To set up a development environment:

    pip install -e ".[dev]"
    pre-commit install
    pytest

Contributions, bug reports, and pull requests are welcome via GitHub Issues: https://github.com/linea-it/hipscatalog_gen/issues

-------------------------------------------------------------------------------

## Acknowledgments

This project acknowledges the foundational work of the **CDS HiPS Catalog Tool** (Hipsgen-cat.jar) developed by the Strasbourg Astronomical Data Center (Unistra/CNRS, 2016), which inspired aspects of the software design.
More information: https://aladin.cds.unistra.fr/hips/HipsCat.gml

The `mag_global` mode builds on an idea originally suggested by **Julia Gschwend**.

-------------------------------------------------------------------------------

## Citation

If you use this package in your research, please cite:

Silva, L. L. C., et al. (2026). *hipscatalog-gen: A Python HiPS Catalog Pipeline*.
LIneA – Laboratório Interinstitucional de e-Astronomia.
Available at: https://github.com/linea-it/hipscatalog_gen

-------------------------------------------------------------------------------

## License

This project is licensed under the MIT License. See the LICENSE file for details.
