Metadata-Version: 2.4
Name: punter-hep
Version: 0.3.0
Summary: Theory Nuisance Parameter estimates of perturbative QCD missing higher-order uncertainties
Author: Matthew A. Lim, Rene Poncelet
License: MIT License
        
        Copyright (c) 2025 RenePoncelet
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Keywords: hep-ph,qcd,uncertainty,theory nuisance parameters,phenomenology
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Physics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: scipy
Provides-Extra: plot
Requires-Dist: matplotlib; extra == "plot"
Provides-Extra: yaml
Requires-Dist: PyYAML; extra == "yaml"
Provides-Extra: yoda
Requires-Dist: babyyoda>=0.0.8; extra == "yoda"
Provides-Extra: qcd
Requires-Dist: qcdevol; extra == "qcd"
Provides-Extra: full
Requires-Dist: matplotlib; extra == "full"
Requires-Dist: PyYAML; extra == "full"
Requires-Dist: qcdevol; extra == "full"
Requires-Dist: babyyoda>=0.0.8; extra == "full"
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: babyyoda>=0.0.8; extra == "dev"
Requires-Dist: matplotlib; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: PyYAML; extra == "dev"
Requires-Dist: qcdevol; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: types-PyYAML; extra == "dev"
Dynamic: license-file

# PUNTer

PUNTer is a small analysis tool for estimating missing higher-order uncertainties (MHOU) in fixed-order perturbative QCD using theory nuisance parameters (TNPs).

The implementation is based on:

- Matthew A. Lim and Rene Poncelet, *Robust estimates of theoretical uncertainties at fixed-order in perturbation theory*, arXiv:2412.14910

## Motivation

The code provides estimates of missing higher-order QCD uncertainties for fixed-order calculations, following the TNP approach described in arXiv:2412.14910. In practical terms:

- a fixed-order prediction is written as a perturbative expansion in `alpha_s`
- the unknown next term is modelled using nuisance parameters
- the nuisance parameters multiply lower-order information with a simple polynomial dependence on the kinematics
- varying those nuisance parameters gives an uncertainty band and a covariance matrix
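
The steps above can be illustrated with a toy calculation. This is a minimal sketch, not PUNTer's implementation: the input numbers are made up, the normalisation of the missing term is a simplification, and the prior (independent nuisance parameters, uniform in `[-theta_range, theta_range]`) is an assumption made only for this example. The name `theta_range` is borrowed from `RunConfig`, but its exact meaning there is not implied.

```python
import numpy as np
from numpy.polynomial import chebyshev

# Toy inputs (illustrative numbers, not real predictions).
edges = np.array([0.0, 50.0, 100.0, 200.0, 400.0])   # bin edges
centers = 0.5 * (edges[:-1] + edges[1:])
last_term = np.array([12.0, 6.0, 2.0, 0.4])           # highest known-order term per bin
alpha_s = 0.118
degree = 2

# Map bin centers to [-1, 1], the natural domain of Chebyshev polynomials.
x = 2.0 * (centers - edges[0]) / (edges[-1] - edges[0]) - 1.0

# Linear model: missing_term[i] = sum_k theta[k] * design[i, k].
basis = chebyshev.chebvander(x, degree)               # shape (n_bins, degree + 1)
design = alpha_s * last_term[:, None] * basis

# Assumed prior: independent theta_k, uniform in [-theta_range, theta_range],
# so each theta_k has variance theta_range**2 / 3.
theta_range = 1.0
theta_var = theta_range**2 / 3.0
covariance = theta_var * design @ design.T            # linear error propagation

# Replicas: draw theta vectors and push them through the same linear model.
rng = np.random.default_rng(7)
thetas = rng.uniform(-theta_range, theta_range, size=(1000, degree + 1))
replicas = thetas @ design.T                          # shape (1000, n_bins)
```

Because the toy model is linear in the nuisance parameters, the covariance follows directly from the design matrix, and its rank is at most the number of nuisance parameters.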

PUNTer takes higher-order calculations produced by external codes as input and returns a TNP-based MHOU estimate.

## What The Code Does

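The main entry point is the `punter` console command (also available as `python -m punter`); [`punter.py`](punter.py) remains as a compatibility entry point. It can: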

- read fixed-order predictions from several external formats
- build the perturbative expansion order by order
- evolve `alpha_s` from `M_Z` to bin scales, or keep it fixed
- model the next missing term using Chebyshev, Legendre, or Bernstein bases
- save central values, uncertainty bands, coefficient matrices, covariance matrices, and replicas
- plot the result with bin-edge aware step plots
- run from CLI flags or from a YAML config file
- process one observable or a batch of observables in one run

## Supported Input Formats

The current implementation supports:

- `yoda`
- `stripper`
- `n3loxs`
- `matrix`
- `nnlojet`
- `mcfm`
- `geneva`

Format detection is automatic in many common cases, but can be forced with `--format`.

## Format Conventions

PUNTer normalizes several external fixed-order formats into a common internal representation, but the way observables are identified differs by provider.

- `yoda`: `target` is the YODA object path, for example `/MY_ANALYSIS/PT_JET`
- `stripper`: `target` is the `<observable><description>` text in the XML file
- `matrix`: `target` is matched against the observable name embedded in the filename, and `run_mode` must match the MATRIX run suffix such as `LORUN`, `NLORUN`, or `NNLORUN`
- `n3loxs`: files are binned by `qmin` and no `target` is required
- `nnlojet`: each file corresponds to one histogram; the observable name and perturbative order are read from the filename itself
- `mcfm`: `target` is matched against the TopDrawer plot title
- `geneva`: `target` is matched against the distribution name inside the XML/CDATA payload

For `nnlojet` in particular, the filename convention matters. The loader expects the perturbative order and observable name to come from the filename, for example:

- `proc.LO.ptj1_a.s1.dat`
- `proc.NLO.LO.ptj1_a.s1.dat`
- `proc.NNLO.NLO.LO.ptj1_a.s1.dat`

In that scheme, `ptj1_a` is the observable name and must match `target` exactly. This avoids accidental matches such as `ptj1_a` versus `ptj1_2plus_a`.
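
The exact-match rule can be sketched as follows. This is an illustration under stated assumptions, not the actual loader: the helper names `parse_nnlojet_name` and `matches_target` are hypothetical, and the sketch assumes the filename is a dot-separated sequence of a process token, one or more order tokens, and then the observable name, as in the examples above.

```python
from pathlib import Path

# Order tokens seen in the example filenames (assumed to be the full set here).
ORDER_TOKENS = {"LO", "NLO", "NNLO"}


def parse_nnlojet_name(filename: str) -> tuple[list[str], str]:
    """Split 'proc.NNLO.NLO.LO.ptj1_a.s1.dat' into (order tokens, observable)."""
    tokens = Path(filename).name.split(".")
    orders: list[str] = []
    index = 1  # skip the leading process token
    while index < len(tokens) and tokens[index] in ORDER_TOKENS:
        orders.append(tokens[index])
        index += 1
    if not orders or index >= len(tokens):
        raise ValueError(f"unrecognised NNLOJET-style filename: {filename!r}")
    return orders, tokens[index]


def matches_target(filename: str, target: str) -> bool:
    """Compare the whole observable token, so 'ptj1_a' != 'ptj1_2plus_a'."""
    _, observable = parse_nnlojet_name(filename)
    return observable == target
```

Comparing the full token rather than searching for a substring or prefix is what rules out the accidental `ptj1_a` versus `ptj1_2plus_a` match.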

## Installation

The project now has a standard `pyproject.toml` and can be installed as a package.

Core dependencies:

- `numpy`
- `scipy`

Optional dependencies:

- `matplotlib` for plotting
- `PyYAML` for YAML config files
- `qcdevol` for `alpha_s` running
- `babyyoda` for pip-installable YODA-compatible parsing

A minimal editable install is:

```bash
python -m pip install -e .
```

For optional plotting, YAML config support, and external parser/running helpers:

```bash
python -m pip install -e ".[full]"
```

For YODA support specifically:

- `python -m pip install -e ".[yoda]"` installs `babyyoda`, which provides a pip-installable YODA-compatible reader
- if the official HEP YODA Python bindings are already installed in your environment, PUNTer will use them directly
- PUNTer intentionally does not depend on the PyPI package named `yoda`, because that import name can resolve to an unrelated project

On PyPI, the distribution name is `punter-hep`:

```bash
python -m pip install punter-hep
```

This installs the `punter` console command and the `punter` Python package. Legacy usage with `python punter.py` still works through a compatibility shim.

## Package Layout

The repository is now organized as a package:

- [`punter/core.py`](punter/core.py): `Punter` class and core perturbative computation
- [`punter/api.py`](punter/api.py): public library entrypoints
- [`punter/formats.py`](punter/formats.py): normalized format loaders
- [`punter/io/parsers.py`](punter/io/parsers.py): raw file-format parsers
- [`punter/cli.py`](punter/cli.py): command-line and YAML-driven execution
- [`punter/types.py`](punter/types.py): typed public data objects
- [`punter/__main__.py`](punter/__main__.py): `python -m punter`
- [`punter.py`](punter.py): compatibility entrypoint
- [`pyproject.toml`](pyproject.toml): package metadata and console script definition

## Public Python API

The package now exposes a small typed library API intended to be more stable than the internal implementation details.

Main public objects:

- `FitResult`
- `ObservableInput`
- `RunConfig`
- `PredictionResult`
- `ReplicaResult`

Main public functions:

- `create_punter(...)`
- `fit_tnp(...)`
- `generate_replicas(...)`
- `load_yoda_observable(...)`
- `load_observable(...)`
- `run_tnp(...)`

Example:

```python
from punter import RunConfig, load_yoda_observable, run_tnp

observable = load_yoda_observable(
    ["results/lo.yoda", "results/nlo.yoda", "results/nnlo.yoda"],
    "/MY_ANALYSIS/PT_JET",
    cumulative=True,
)

result = run_tnp(
    observable,
    RunConfig(
        coupling=0.118,
        model="Bernstein",
        degree=2,
        theta_range=1.0,
    ),
)

print(result.central)
print(result.covariance)
```

Library-side fit example:

```python
from punter import RunConfig, fit_tnp, generate_replicas, load_observable

observable = load_observable(
    ["results/lo.yoda", "results/nlo.yoda"],
    "yoda",
    target="/MY_ANALYSIS/PT_JET",
    cumulative=True,
)

fit = fit_tnp(
    observable,
    truth_values=[...],  # one value per bin
    truth_errors=[...],  # one positive uncertainty per bin
    config=RunConfig(coupling=0.118, model="Chebyshev", degree=1),
)

replicas = generate_replicas(
    observable,
    RunConfig(coupling=0.118, n_replicas=50, seed=7),
)
```

The normalized loader path currently supports:

- `yoda`
- `stripper`
- `matrix`
- `n3loxs`
- `nnlojet`
- `mcfm`
- `geneva`

The CLI now uses this normalized loader layer for all supported formats before constructing the computation engine.

## CI/CD

The repository includes GitHub Actions workflows for validation and releases:

- `.github/workflows/ci.yml` runs linting, mypy, the unit test suite on Python 3.10 to 3.12, and package build validation on pushes and pull requests
- `.github/workflows/release.yml` builds release artifacts for version tags matching `v*`, publishes a GitHub Release with the built wheel and source tarball, and publishes to PyPI when the `PYPI_API_TOKEN` repository secret is configured

Recommended release flow:

1. Bump the package version.
2. Push a tag such as `v0.3.0`.
3. Let the release workflow build and attach the artifacts.
4. Configure `PYPI_API_TOKEN` if you want the same tagged release to publish automatically to PyPI.

## Basic Usage

### Command-line

Single observable:

```bash
punter results/*.yoda \
  --format yoda \
  --target /MY_ANALYSIS/PT_JET \
  --model Bernstein \
  --degree 2 \
  --save \
  --plot
```

Process multiple observables in one call:

```bash
punter results/*.yoda \
  --format yoda \
  --targets /MY_ANALYSIS/PT_JET /MY_ANALYSIS/M_JJ \
  --model Chebyshev \
  --save \
  --plot
```

Save seeded replicas:

```bash
punter input_qmin20.txt \
  --format n3loxs \
  --save-replicas \
  --n-replicas 200 \
  --seed 12345
```

Show all CLI options:

```bash
punter --help
```

### YAML Config

You can also drive runs from YAML.

Single run:

```yaml
files:
  - data/result.yoda
format: yoda
target: /MY_ANALYSIS/PT_JET
coupling: 0.118
model: Bernstein
degree: 2
save: true
plot: true
save_cov: true
save_replicas: true
n_replicas: 100
seed: 42
```

Run with:

```bash
punter --config run.yaml
```

YAML configs are validated before any files are loaded. In particular:

- unknown keys are rejected, so typos like `sav` fail immediately
- `defaults` must be a mapping and `runs` must be a list of mappings
- `files` and `targets` must be non-empty lists of strings
- scalar fields such as `coupling`, `mZ`, `theta_range`, `degree`, `n_replicas`, and `seed` are type-checked
- `format`, `model`, and `order` are checked against the supported choices
- `target` and `targets` cannot both be present in the same merged run

Supported YAML run keys are:

```text
files, format, coupling, mZ, target, targets, cumulative, run_mode,
model, degree, order, alphas_run_order, fix_scale, fit, save, save_coeffs, save_cov,
save_replicas, plot, show_plot, log_y, n_replicas, theta_range,
seed, output_dir, plot_title
```
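
A few of the validation rules above can be sketched in plain Python. This is an illustrative sketch, not the validator used by the CLI: the function name `validate_run` and the error messages are hypothetical, and only the key check, the `target`/`targets` exclusivity, and the list checks are covered.

```python
# Run keys copied from the list of supported YAML run keys.
ALLOWED_KEYS = {
    "files", "format", "coupling", "mZ", "target", "targets", "cumulative",
    "run_mode", "model", "degree", "order", "alphas_run_order", "fix_scale",
    "fit", "save", "save_coeffs", "save_cov", "save_replicas", "plot",
    "show_plot", "log_y", "n_replicas", "theta_range", "seed", "output_dir",
    "plot_title",
}


def validate_run(run: dict) -> None:
    """Raise ValueError for a few of the documented config errors."""
    # Unknown keys are rejected, so a typo like 'sav' fails immediately.
    unknown = sorted(set(run) - ALLOWED_KEYS)
    if unknown:
        raise ValueError(f"unknown config keys: {', '.join(unknown)}")
    # 'target' and 'targets' are mutually exclusive within one run.
    if "target" in run and "targets" in run:
        raise ValueError("'target' and 'targets' cannot both be set")
    # 'files' and 'targets' must be non-empty lists of strings.
    for key in ("files", "targets"):
        if key in run:
            value = run[key]
            is_valid = (
                isinstance(value, list)
                and len(value) > 0
                and all(isinstance(item, str) for item in value)
            )
            if not is_valid:
                raise ValueError(f"'{key}' must be a non-empty list of strings")
```

Running such checks before any file is opened means a misspelled key costs nothing more than an immediate error message.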

Batch config:

```yaml
defaults:
  files:
    - data/result.yoda
  format: yoda
  coupling: 0.118
  model: Chebyshev
  degree: 2
  save: true

runs:
  - target: /MY_ANALYSIS/PT_JET
    plot: true

  - targets:
      - /MY_ANALYSIS/M_JJ
      - /MY_ANALYSIS/DELTA_Y
    save_cov: true
    save_replicas: true
    n_replicas: 50
    seed: 7
```

## Main Options

Physics and model setup:

- `--coupling`: value of `alpha_s(M_Z)`
- `--mZ`: reference scale for the input coupling
- `--fix-scale`: disable running and use a fixed coupling
- `--order {NLO,NNLO,N3LO}`: truncate to a requested perturbative order
- `--alphas-run-order {0,1,2,3}`: loop order for `alpha_s` running
- `--model {Chebyshev,Chebyshev0,Legendre,Legendre0,Bernstein}`
- `--degree`: polynomial degree

Outputs:

- `--save [FILE]`: save central values and TNP uncertainty
- `--save-coeffs [FILE]`: save coefficient matrix
- `--save-cov [FILE]`: save covariance matrix
- `--save-replicas [FILE]`: save replicas
- `--plot [FILE]`: save a plot
- `--show-plot`: display the plot interactively
- `--output-dir DIR`: directory for auto-generated filenames

Replica generation:

- `--n-replicas`: number of replicas
- `--seed`: random seed for reproducibility

Batch/config:

- `--targets`: run multiple observables in one CLI invocation
- `--config`: read one or more runs from YAML

## Output Files

By default, output filenames are generated automatically with the pattern:

```text
punter_<format>_<target>_<suffix><extension>
```

where:

- `<format>` is the sanitized input format name
- `<target>` is the sanitized observable name, omitted when no target exists
- `<suffix>` is one of `results`, `coeffs`, `covariance`, `replicas`, or `plot`
- `<extension>` is `.csv` for tabular outputs and `.png` for plots

Sanitization rules for `<format>` and `<target>` are:

- non-alphanumeric characters are replaced with `_`
- leading and trailing `_` are stripped
- the result is lower-cased

Examples:

- `punter_yoda_my_analysis_pt_jet_results.csv`
- `punter_yoda_my_analysis_pt_jet_covariance.csv`
- `punter_yoda_my_analysis_pt_jet_replicas.csv`
- `punter_yoda_my_analysis_pt_jet_plot.png`

If an output flag is given an explicit filename, that filename is used as-is.

For multi-target runs:

- automatic filenames are generated independently per target
- if a user supplies a single explicit filename, PUNTer appends `_<sanitized-target>` before the file extension
- if a user supplies a template containing `{target}`, `{format}`, or `{suffix}`, those placeholders are substituted directly

Examples:

- `--save` with target `/REF/HISTO_A` becomes `punter_yoda_ref_histo_a_results.csv`
- `--save output.csv` with two targets becomes `output_ref_histo_a.csv` and `output_ref_histo_b.csv`
- `--save results_{target}_{suffix}.csv` becomes `results_ref_histo_a_results.csv`

CSV output schemas are part of the supported interface.

`results` CSV:

- columns: `BinLow,BinHigh,Scale,CentralValue,ErrorPlus,ErrorMinus`
- one row per bin

`coeffs` CSV:

- columns: `BinLow,BinHigh,Scale,Theta_0,...,Theta_{N-1}`
- one row per bin
- `N` is the number of nuisance parameters for the chosen model

`covariance` CSV:

- columns: `Bin_0,...,Bin_{n_bins-1}`
- one row per bin
- the file stores the full bin-by-bin covariance matrix

`replicas` CSV:

- columns: `BinLow,BinHigh,Scale,Replica_0,...,Replica_{n_replicas-1}`
- one row per bin

Saved tabular outputs always include bin-edge information for `results`, `coeffs`, and `replicas`. `covariance` is matrix-shaped and therefore uses bin-index headers instead.
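
As an example of consuming these files, the sketch below reads a `covariance` CSV and converts it to a correlation matrix. It assumes the column names appear as a single header row; the matrix entries are made-up numbers standing in for a real output file.

```python
import io

import numpy as np

# Stand-in for the contents of a covariance CSV (illustrative values only).
csv_text = """Bin_0,Bin_1,Bin_2
4.0,1.0,0.5
1.0,9.0,1.5
0.5,1.5,1.0
"""

# Skip the assumed header row and load the square matrix.
covariance = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# Per-bin standard deviations and the bin-by-bin correlation matrix.
sigma = np.sqrt(np.diag(covariance))
correlation = covariance / np.outer(sigma, sigma)
```

For a file on disk, the `io.StringIO(...)` argument would be replaced by the path to the CSV.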

## Internal Model Choices

The code currently supports several polynomial bases for the kinematic dependence of the missing term:

- `Bernstein`
- `Chebyshev`
- `Chebyshev0`
- `Legendre`
- `Legendre0`

The `0` variants include an additional order-independent polynomial component. In the paper, Bernstein and Chebyshev parameterisations are used in different examples, with `k = 2` being the default choice studied there. The default CLI setup in this repository also uses polynomial degree `2`.
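
For orientation, the three polynomial families can be evaluated with NumPy and SciPy as sketched below. This only illustrates the bases themselves; how PUNTer maps the kinematic variable onto the polynomial domain and how the `0` variants are built are not shown, and the helper `bernstein_vander` is a hypothetical name.

```python
import numpy as np
from numpy.polynomial import chebyshev, legendre
from scipy.special import comb


def bernstein_vander(t, degree: int) -> np.ndarray:
    """Bernstein basis B_{k,degree}(t) on [0, 1]; shape (len(t), degree + 1)."""
    t = np.asarray(t, dtype=float)[:, None]
    k = np.arange(degree + 1)[None, :]
    return comb(degree, k) * t**k * (1.0 - t) ** (degree - k)


degree = 2
t = np.linspace(0.0, 1.0, 5)   # kinematic variable rescaled to [0, 1]
x = 2.0 * t - 1.0              # same points on [-1, 1]

bases = {
    "Bernstein": bernstein_vander(t, degree),        # non-negative, rows sum to 1
    "Chebyshev": chebyshev.chebvander(x, degree),    # columns T_0, T_1, T_2
    "Legendre": legendre.legvander(x, degree),       # columns P_0, P_1, P_2
}

for name, matrix in bases.items():
    print(name, matrix.shape)
```

Bernstein polynomials are non-negative and sum to one on `[0, 1]`, whereas Chebyshev and Legendre polynomials are orthogonal families on `[-1, 1]`, so the same degree gives differently shaped kinematic variations in each basis.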

## Notes On Coupling Running

If bin scales are available and `--fix-scale` is not used, `punter` evolves `alpha_s(M_Z)` to the bin scales.

- If `qcdevol` is installed, it is used
- otherwise the internal running-coupling implementation is used

For observables that look dimensionless from their name, the CLI applies a fixed-scale heuristic automatically unless `--fix-scale` was already requested explicitly.
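
To make the effect of running concrete, here is the textbook one-loop evolution of the coupling with a fixed number of active flavours. It is a sketch for illustration only: it is neither the internal implementation nor the `qcdevol` interface, and the function name and default values are assumptions.

```python
import math


def alpha_s_one_loop(mu: float, alpha_s_mz: float = 0.118,
                     m_z: float = 91.1876, n_f: int = 5) -> float:
    """One-loop running of alpha_s from m_z to the scale mu (both in GeV)."""
    # One-loop beta-function coefficient for n_f active quark flavours.
    b0 = (33.0 - 2.0 * n_f) / (12.0 * math.pi)
    return alpha_s_mz / (1.0 + alpha_s_mz * b0 * math.log(mu**2 / m_z**2))


# The coupling decreases with the scale (asymptotic freedom).
for scale in (20.0, 91.1876, 1000.0):
    print(f"alpha_s({scale:g} GeV) = {alpha_s_one_loop(scale):.4f}")
```

With a fixed scale, every bin uses the same coupling value; with running enabled, high-scale bins use a smaller coupling than low-scale bins.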

## Development And Tests

Run the unit tests with:

```bash
python -m unittest -q
```

The repository also ships real parser fixtures under [`example-outputs`](example-outputs). These are used in regression tests to verify that parsing works against representative outputs from external fixed-order codes, not just synthetic minimal examples.

For local development tooling:

```bash
python -m pip install -e ".[dev]"
```

Configured tools:

- `ruff` for linting and import ordering on the stabilized public API surface
- `mypy` for gradual type checking of the stabilized public API surface
- `build` for packaging validation

Convenience commands are provided in [`Makefile`](Makefile):

```bash
make test
make lint
make typecheck
make check
make build
make validate-package
```

The current tool configuration lives in [`pyproject.toml`](pyproject.toml).
At the moment, type checking covers the public API, computation, CLI, and raw parser layers. Linting is still intentionally focused on the stabilized public surface rather than the full legacy codebase.

`make validate-package` builds both sdist and wheel artifacts, installs the wheel into a clean local virtual environment, and smoke-tests `import punter` plus the `punter --help` entrypoint.

## Versioning

The package version is exposed as `punter.__version__` and comes from [`punter/_version.py`](punter/_version.py). Release history is tracked in [`CHANGELOG.md`](CHANGELOG.md).

## Citation

If you use this code in scientific work, please cite the paper it is based on:

```bibtex
@article{Lim:2024nsk,
    author = "Lim, Matthew A. and Poncelet, Rene",
    title = "{Robust estimates of theoretical uncertainties at fixed-order in perturbation theory}",
    eprint = "2412.14910",
    archivePrefix = "arXiv",
    primaryClass = "hep-ph",
    reportNumber = "IFJPAN-IV-2024-15",
    doi = "10.1103/7g5k-4y3v",
    journal = "Phys. Rev. D",
    volume = "112",
    number = "11",
    pages = "L111901",
    year = "2025"
}
```

## Repository Status

This repository is now installable as a package and has a stabilized public API surface, typed core/CLI/parser layers, documented output contracts, and regression tests against both synthetic inputs and real example outputs. It is still best viewed as an actively evolving research codebase rather than a mature library with long-term stability guarantees.

What is relatively stable today:

- the public Python entrypoints in [`punter/api.py`](punter/api.py)
- the normalized loaders in [`punter/formats.py`](punter/formats.py)
- the command-line interface in [`punter/cli.py`](punter/cli.py)
- the typed data objects in [`punter/types.py`](punter/types.py)
- the documented output filename rules and CSV schemas in this README
- the format conventions documented above, including NNLOJET filename-based observable matching

What should still be treated as implementation detail:

- internal numerical organization inside [`punter/core.py`](punter/core.py)
- parser details beyond the documented input conventions and tested fixture coverage
- undocumented behavior that is not described in this README or covered by tests

The most reliable way to understand exact current behavior is:

- the CLI help (`punter --help`), implemented in [`punter/cli.py`](punter/cli.py)
- the unit tests in [`test_punter.py`](test_punter.py)
- the shipped parser fixtures in [`example-outputs`](example-outputs)
- the methodology paper at arXiv:2412.14910
