Metadata-Version: 2.4
Name: carnopy
Version: 0.1.0a1
Summary: Reproducible ML-ready thermodynamic fluid-property datasets from configurable property backends.
Project-URL: Repository, https://github.com/gcalpay/carnopy
Project-URL: Issues, https://github.com/gcalpay/carnopy/issues
Author: Göran Cem Alpay
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: coolprop>=7.2
Requires-Dist: numpy>=2.0
Requires-Dist: pandas>=2.2
Requires-Dist: pyarrow>=16
Requires-Dist: pydantic>=2.8
Requires-Dist: pyyaml>=6.0
Requires-Dist: typer>=0.12
Provides-Extra: all
Requires-Dist: matplotlib>=3.8; extra == 'all'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.8; extra == 'viz'
Description-Content-Type: text/markdown

# Carnopy

> Alpha software: public interfaces and generated schemas may still change
> before the stable `0.1.0` release.

Carnopy is a CLI-first Python package for generating reproducible,
backend-derived thermophysical datasets for machine-learning, surrogate-model,
and engineering workflows.

Carnopy is not a thermodynamic property model. It orchestrates configured
property backends, validates deterministic sampling, preserves failed states as
diagnostics, and emits stable tabular data with provenance. Generated values are
synthetic backend output, not experimental data or backend-independent ground
truth.

Milestone 1 supports pure fluids through CoolProp and three modes:

- `property_table`: temperature-pressure state tables;
- `saturation_table`: saturated-liquid and saturated-vapor endpoint rows;
- `vapor_mass_fraction_table`: two-phase states over vapor mass fraction.

## Contents

- [Installation](#installation)
- [Quick start](#quick-start)
- [Configuration](#configuration)
- [Properties](#properties)
- [Visualization](#visualization)
- [Generated artifacts and provenance](#generated-artifacts-and-provenance)
- [Python API](#python-api)
- [Scientific limitations](#scientific-limitations)
- [Development and contribution](#development-and-contribution)
- [Alpha release procedure](#alpha-release-procedure)

## Installation

After `0.1.0a1` is published to PyPI:

```bash
python -m pip install "carnopy==0.1.0a1"
```

Install optional plotting support:

```bash
python -m pip install "carnopy[all]==0.1.0a1"
```

For an isolated CLI:

```bash
uv tool install "carnopy==0.1.0a1"
uv tool install "carnopy[all]==0.1.0a1"
```

The base package supports generation and validation. The `viz` and `all` extras
install Matplotlib for manual or configured figure generation. PyArrow remains
a core dependency because Parquet is a supported first-class output format.

For repository development:

```bash
uv sync --locked --extra all --group dev
uv run --locked carnopy --help
```

## Quick start

The normal workflow is:

```text
init → edit → optional validate → generate → inspect → optional plot
```

Create a starter configuration:

```bash
carnopy init property_table my-dataset.yaml
```

`init` reads the selected template packaged inside the installed `carnopy`
module and writes a new file at the path you provide. For example, when the
current directory is `/home/cfd/carnopy/`:

```bash
carnopy init property_table property.yaml
```

creates:

```text
/home/cfd/carnopy/property.yaml
```

from the packaged `property_table.yaml` template. It does not modify or move
the packaged template, and it refuses to overwrite an existing
`property.yaml`. A relative output path is resolved from the current working
directory; an absolute path is written exactly where specified.

Available modes:

```text
property_table
saturation_table
vapor_mass_fraction_table
```

Discover backend fluids and semantic properties:

```bash
carnopy fluids
carnopy properties
```

Edit the YAML, optionally validate it, then generate an immutable run:

```bash
carnopy validate my-dataset.yaml
carnopy generate my-dataset.yaml
```

`generate` validates automatically. The separate `validate` command is useful
for scripts and early feedback, but does not evaluate thermodynamic rows.

After generation, inspect the run before choosing a plot:

```bash
carnopy inspect outputs/<run>
```

The inspection output lists fluids, sampling levels, emitted properties,
compatible plot kinds, and copyable commands.

To choose a different output root:

```bash
carnopy generate \
  configs/cyclopentane_vapor_fraction_pressure.yaml \
  --out outputs/manual-test
```

The run is created directly under that root. Copy the exact path printed after
`Output directory:`; do not prepend the output root again:

```bash
# Example only; replace this with the exact path printed by your run.
RUN_DIR="outputs/manual-test/20260621T172006Z_vapor_fraction_c8e28e9f"
```

Run names use UTC creation time, a short mode label, and the first eight
hexadecimal characters of the unique `run_id`. Full identities and hashes
remain in `metadata.json`.
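
The naming rule can be sketched as follows. This is a minimal illustration, not Carnopy's implementation: the helper `run_directory_name` is hypothetical, the timestamp format is inferred from the example path above, and the UUID is made up so that its first eight hexadecimal characters match that example.

```python
from datetime import datetime, timezone
from uuid import UUID


def run_directory_name(created_at: datetime, mode_label: str, run_id: UUID) -> str:
    """Compose '<UTC timestamp>_<mode label>_<first 8 hex chars of run_id>'."""
    # Normalize to UTC and use the compact form seen in the example path.
    stamp = created_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}_{mode_label}_{run_id.hex[:8]}"


created = datetime(2026, 6, 21, 17, 20, 6, tzinfo=timezone.utc)
run_id = UUID("c8e28e9f-0000-4000-8000-000000000000")  # made-up UUID for illustration
name = run_directory_name(created, "vapor_fraction", run_id)
print(name)
```

Because only eight characters of the `run_id` appear in the name, the directory name is a convenience label; the full identity stays in `metadata.json`.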

Use command-specific help for the complete current interface:

```bash
carnopy --help
carnopy generate --help
carnopy plot --help
```

## Configuration

Schema version 1 requires:

```yaml
schema_version: 1
backend: coolprop
mode: property_table
fluids: [Propane]

grid:
  temperature:
    kind: linspace
    start: 20
    stop: 100
    num: 5
    unit: degC
  pressure:
    kind: linspace
    start: 1
    stop: 20
    num: 5
    unit: bar

properties:
  - specific_enthalpy
  - mass_density

outputs:
  # Omit this section to keep the same default.
  dataset_formats: [csv, parquet]
```

### Modes

`property_table` requires temperature and pressure and generates their Cartesian
product for every selected fluid.
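
As a rough sketch of what "Cartesian product for every selected fluid" implies for row counts, the following uses only the standard library. The dictionary keys and the row ordering are illustrative assumptions, not Carnopy's emitted schema.

```python
from itertools import product

fluids = ["Propane", "Cyclopentane"]
temperatures_K = [293.15, 333.15, 373.15]
pressures_Pa = [1.0e5, 1.0e6]

# One candidate row per (fluid, temperature, pressure) combination.
rows = [
    {"fluid": fluid, "temperature_K": t, "pressure_Pa": p}
    for fluid, t, p in product(fluids, temperatures_K, pressures_Pa)
]

# The projected row count is the product of the three axis lengths.
projected = len(fluids) * len(temperatures_K) * len(pressures_Pa)
print(projected, len(rows))
```

Row counts grow multiplicatively, so adding a fluid or refining one axis scales the whole table.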

`saturation_table` requires exactly one of temperature or pressure. It computes
the missing saturation coordinate and emits separate saturated-liquid and
saturated-vapor rows.

`vapor_mass_fraction_table` requires vapor mass fraction plus exactly one of
temperature or pressure. Vapor mass fraction is vapor mass divided by total
vapor-plus-liquid mass. Carnopy denotes it by $x_{\mathrm{vap}}$ in figures
and scientific equations while keeping the explicit public field name
`vapor_mass_fraction`. CoolProp's `Q` name remains internal to the adapter.

For a pure fluid at fixed saturation temperature or pressure:

- $x_{\mathrm{vap}}=0$ is the saturated-liquid boundary;
- $x_{\mathrm{vap}}=1$ is the saturated-vapor boundary;
- $0<x_{\mathrm{vap}}<1$ is an equilibrium two-phase mixture state.

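The endpoint states have definite backend properties. Near-endpoint values such
as `0.01` and `0.99` are interior mixture states; they supplement rather than
replace the boundaries. For specific enthalpy and specific volume, the
two-phase value is the mass-weighted average of the saturated-liquid ($f$) and
saturated-vapor ($g$) endpoint values: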

```math
h(x_{\mathrm{vap}})
=(1-x_{\mathrm{vap}})h_f+x_{\mathrm{vap}}h_g
```

```math
\frac{1}{\rho(x_{\mathrm{vap}})}
=\frac{1-x_{\mathrm{vap}}}{\rho_f}
+\frac{x_{\mathrm{vap}}}{\rho_g}
```

See the
[CoolProp high-level saturation documentation](https://coolprop.org/coolprop/HighLevelAPI.html#vapor-liquid-and-saturation-states)
for the backend definition of the endpoint states.
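
The two mixing relations above can be checked numerically with a short sketch. The function names are hypothetical and the endpoint values are round illustrative numbers, not CoolProp output for any fluid.

```python
def two_phase_enthalpy(x_vap: float, h_f: float, h_g: float) -> float:
    """Mass-weighted specific enthalpy of an equilibrium two-phase state."""
    return (1.0 - x_vap) * h_f + x_vap * h_g


def two_phase_density(x_vap: float, rho_f: float, rho_g: float) -> float:
    """Density from the mass-weighted specific volume (1 / rho)."""
    specific_volume = (1.0 - x_vap) / rho_f + x_vap / rho_g
    return 1.0 / specific_volume


# Illustrative endpoint values only (J/kg and kg/m^3).
h_f, h_g = 200_000.0, 600_000.0
rho_f, rho_g = 500.0, 20.0

h_mix = two_phase_enthalpy(0.25, h_f, h_g)
rho_mix = two_phase_density(0.25, rho_f, rho_g)
print(h_mix, rho_mix)
```

Note that density is not linear in $x_{\mathrm{vap}}$: the specific volume is averaged, so a small vapor mass fraction already lowers the mixture density substantially.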

### Samplers

| Sampler | Parameters | Behavior |
|---|---|---|
| `explicit` | `values` | Preserves declared order; values must be finite and unique after SI conversion. |
| `linspace` | `start`, `stop`, `num` | Includes both endpoints; supports ascending and descending ranges. |
| `stepspace` | `start`, `stop`, `step` | Includes both endpoints; the endpoint must be reachable. |
| `geomspace` | `start`, `stop`, `num` | Positive physical endpoints; supports either direction. |
| `logspace` | `start_exp`, `stop_exp`, `num`, optional `base` | Samples exponent space; `base` must exceed one. |

Equal sampler bounds are rejected; use `explicit` for one value. Geometric and
logarithmic sampling is not supported for offset Celsius values or vapor mass
fraction. Use Kelvin for geometric temperature grids.

`linspace` uses uniform increments. For example, `start: 1`, `stop: 5`, and
`num: 5` produce `1, 2, 3, 4, 5`. `geomspace` uses uniform ratios and produces
approximately `1, 1.495, 2.236, 3.344, 5` for the same bounds.
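
The difference between uniform increments and uniform ratios can be reproduced with a standard-library sketch. These helpers are illustrative stand-ins that assume `num >= 2`; they are not Carnopy's sampler code.

```python
def linspace(start: float, stop: float, num: int) -> list[float]:
    """Uniform increments, both endpoints included."""
    step = (stop - start) / (num - 1)
    # Append the stop value explicitly so the endpoint is exact.
    return [start + i * step for i in range(num - 1)] + [float(stop)]


def geomspace(start: float, stop: float, num: int) -> list[float]:
    """Uniform ratios between neighbors, both endpoints included."""
    ratio = (stop / start) ** (1.0 / (num - 1))
    return [start * ratio**i for i in range(num - 1)] + [float(stop)]


linear = linspace(1, 5, 5)
geometric = [round(value, 3) for value in geomspace(1, 5, 5)]
print(linear)
print(geometric)
```

The ratio form also shows why geometric sampling needs positive physical endpoints: a zero or negative bound has no real constant ratio to the other bound.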

### Dataset formats

Select generated table formats independently of the scientific specification:

```yaml
outputs:
  dataset_formats: [csv]
```

Supported values are `csv` and `parquet`. At least one is required. Omitting
`outputs` preserves the default `[csv, parquet]`. Format selection changes the
artifact-generation context and `output_request_id`, but not `spec_id` or
`config.normalized.json`.

### Units

Supported input units:

```text
temperature: K, degC
pressure: Pa, kPa, MPa, bar
vapor_mass_fraction: "1"
```

All backend calls and generated numeric columns use SI. Original units and
sampler definitions remain recorded in metadata.
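
A minimal sketch of the SI conversion for the supported input units is shown below. The function names and error messages are hypothetical; only the unit labels come from the list above.

```python
# Multiplicative factors to pascal for the supported pressure units.
_PRESSURE_TO_PA = {"Pa": 1.0, "kPa": 1.0e3, "MPa": 1.0e6, "bar": 1.0e5}


def temperature_to_kelvin(value: float, unit: str) -> float:
    """Convert a temperature in K or degC to kelvin."""
    if unit == "K":
        return float(value)
    if unit == "degC":
        return float(value) + 273.15  # Celsius is an offset scale
    raise ValueError(f"unsupported temperature unit: {unit!r}")


def pressure_to_pascal(value: float, unit: str) -> float:
    """Convert a pressure in Pa, kPa, MPa, or bar to pascal."""
    if unit not in _PRESSURE_TO_PA:
        raise ValueError(f"unsupported pressure unit: {unit!r}")
    return float(value) * _PRESSURE_TO_PA[unit]


print(temperature_to_kelvin(20, "degC"), pressure_to_pascal(20, "bar"))
```

The Celsius offset is the reason geometric and logarithmic sampling are restricted to Kelvin: constant ratios are only meaningful on a scale with a true zero.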

Validation rejects non-finite values, non-positive pressure, temperatures at or
below absolute zero, vapor mass fractions outside `[0, 1]`, incompatible units,
duplicate canonical fluids, and projected runs above 1,000,000 rows.

Validation proves that a configuration is structurally executable. It does not
promise that every fluid, state, phase, and requested property will be valid.
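
The range rules listed above can be approximated by a sketch like the following. The function names and messages are hypothetical and are not Carnopy failure codes; the checks operate on values already converted to SI.

```python
import math

MAX_PROJECTED_ROWS = 1_000_000  # limit stated in the validation rules


def check_state(temperature_K, pressure_Pa, vapor_mass_fraction=None):
    """Return a list of problems; an empty list means the values pass."""
    problems = []
    if not math.isfinite(temperature_K) or temperature_K <= 0.0:
        problems.append("temperature must be finite and above absolute zero")
    if not math.isfinite(pressure_Pa) or pressure_Pa <= 0.0:
        problems.append("pressure must be finite and positive")
    if vapor_mass_fraction is not None:
        in_range = math.isfinite(vapor_mass_fraction) and 0.0 <= vapor_mass_fraction <= 1.0
        if not in_range:
            problems.append("vapor mass fraction must lie in [0, 1]")
    return problems


def within_row_limit(n_fluids: int, n_temperatures: int, n_pressures: int) -> bool:
    """True when the projected property_table size does not exceed the limit."""
    return n_fluids * n_temperatures * n_pressures <= MAX_PROJECTED_ROWS


print(check_state(0.0, -1.0, 1.5))
```

Checks of this kind are purely structural, which matches the caveat above: a state can pass every range rule and still fail inside the backend.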

## Properties

Use `carnopy properties` for the authoritative installed registry.

| Semantic name | Dataset column | Classification |
|---|---|---|
| `specific_enthalpy` | `specific_enthalpy_J_kg` | backend-provided, reference-dependent |
| `specific_entropy` | `specific_entropy_J_kgK` | backend-provided, reference-dependent |
| `specific_internal_energy` | `specific_internal_energy_J_kg` | backend-provided, reference-dependent |
| `mass_density` | `mass_density_kg_m3` | backend-provided |
| `isobaric_specific_heat_capacity` | `isobaric_specific_heat_capacity_J_kgK` | backend-provided |
| `isochoric_specific_heat_capacity` | `isochoric_specific_heat_capacity_J_kgK` | backend-provided |
| `dynamic_viscosity` | `dynamic_viscosity_Pa_s` | backend-provided |
| `kinematic_viscosity` | `kinematic_viscosity_m2_s` | derived from viscosity and density |
| `thermal_conductivity` | `thermal_conductivity_W_mK` | backend-provided |
| `prandtl_number` | `prandtl_number` | backend-provided |
| `speed_of_sound` | `speed_of_sound_m_s` | backend-provided |
| `molar_mass` | `molar_mass_kg_mol` | fluid constant |
| `critical_temperature` | `critical_temperature_K` | fluid constant |
| `critical_pressure` | `critical_pressure_Pa` | fluid constant |
| `triple_point_temperature` | `triple_point_temperature_K` | fluid constant |
| `surface_tension` | `surface_tension_N_m` | mode/region limited |

Derived dependencies may be evaluated internally without being emitted unless
explicitly requested. Fluid constants may be repeated in rows and are also
summarized in metadata.

Milestone 1 uses strict row validity: failure of any required coordinate, phase,
or requested property makes the row invalid. Successfully evaluated values may
remain populated while failed values remain null. Requesting a mode-limited
property such as `surface_tension` over a broad state grid can therefore
invalidate otherwise usable rows.
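
Strict row validity can be pictured with the sketch below. The row dictionaries and the `row_is_valid` helper are illustrative assumptions; only the two column names are taken from the property table above.

```python
def row_is_valid(row: dict, required_columns: list[str]) -> bool:
    """A row is valid only if every required value was evaluated."""
    return all(row.get(column) is not None for column in required_columns)


requested = ["mass_density_kg_m3", "surface_tension_N_m"]

rows = [
    # Both values evaluated: the row is valid.
    {"mass_density_kg_m3": 500.0, "surface_tension_N_m": 0.007},
    # Surface tension failed: density stays populated, but the row is invalid.
    {"mass_density_kg_m3": 20.0, "surface_tension_N_m": None},
]

validity = [row_is_valid(row, requested) for row in rows]
print(validity)
```

Dropping `surface_tension_N_m` from the requested list would make the second row valid again, which is why mode-limited properties are best requested only where they apply.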

## Visualization

Visualization is a reproducible view of emitted dataset columns:

- it never calls CoolProp or another thermodynamic backend;
- it never smooths, interpolates, extrapolates, or invents states;
- it preserves invalid and missing gaps;
- it retains markers at emitted samples;
- its identity is separate from scientific dataset identity.

Install `carnopy[all]` or `carnopy[viz]` before plotting.

### Manual plotting

Supported plot kinds:

```text
property-curves
property-heatmap
xy
pv
ts
```

Property curves use discrete, colorblind-safe series colors and markers.
For `property_table`, choose the x-axis explicitly:

```bash
carnopy plot outputs/<property-run> \
  --kind property-curves \
  --property mass_density \
  --x temperature
```

Carnopy connects adjacent valid emitted samples with straight line segments as
visual guides. It does not smooth or evaluate intermediate states. A sparse
series advisory is emitted for connected series with five or fewer samples.
Generate a denser source grid for finer thermodynamic resolution. Use SVG or
PDF for zoom-independent rendering:

```bash
carnopy plot outputs/<run> ... --output figures/plot.svg
carnopy plot outputs/<run> ... --output figures/plot.pdf
```

For `vapor_mass_fraction_table`, vapor mass fraction is the x-axis and the
sampled saturation pressure or temperature defines the series:

```bash
carnopy plot "$RUN_DIR" \
  --kind property-curves \
  --property mass_density \
  --value-scale linear \
  --show
```

Sampled heatmaps use flat, non-interpolated cells and require at least two
unique values on each axis:

```bash
carnopy plot "$RUN_DIR" \
  --kind property-heatmap \
  --property specific_enthalpy \
  --color-scale linear
```

`saturation_table` does not support property heatmaps because it contains only
the two endpoint branches.

Generic x-y plots use numeric semantic fields from emitted columns:

```bash
carnopy plot outputs/<property-run> \
  --kind xy \
  --x specific_enthalpy \
  --y specific_entropy \
  --group-by pressure
```

If more than one independent sampling coordinate remains, `--group-by` must
resolve the ambiguity. Carnopy does not apply hidden grouping precedence.

Conventional thermodynamic diagrams are derived only from emitted columns:

```bash
carnopy plot outputs/<run-with-density> --kind pv
carnopy plot outputs/<run-with-entropy> --kind ts
```

The p-v diagram uses:

```text
specific_volume = 1 / mass_density
```

The T-s diagram uses emitted entropy and temperature and requires recorded
reference-state metadata. Neither command fabricates a saturation dome,
critical point, or missing branch.
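
The p-v conversion amounts to an element-wise reciprocal that leaves gaps untouched. The sketch below assumes missing densities are represented as `None`; the helper name is hypothetical.

```python
def specific_volume(mass_density):
    """Return 1 / rho, or None when the density is missing."""
    if mass_density is None:
        return None  # keep the gap; do not invent a state
    return 1.0 / mass_density


densities_kg_m3 = [500.0, None, 20.0]
volumes_m3_kg = [specific_volume(rho) for rho in densities_kg_m3]
print(volumes_m3_kg)
```

Keeping `None` in place of a computed value mirrors the rule that plots preserve invalid and missing gaps.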

Exact filters use canonical SI values and never select a nearest neighbor:

```bash
carnopy plot "$RUN_DIR" \
  --kind property-curves \
  --property mass_density \
  --filter pressure=200000
```

Repeat `--filter` to combine filters with logical AND. Current filter fields are
temperature, pressure, vapor mass fraction, phase, and saturation endpoint.
Repeat `--fluid` to select multiple fluids; each fluid receives its own facet.
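
Exact, AND-combined filtering can be sketched as follows. The row keys are simplified semantic field names chosen for illustration, and `apply_exact_filters` is a hypothetical helper rather than Carnopy's API.

```python
def apply_exact_filters(rows: list[dict], filters: dict) -> list[dict]:
    """Keep rows that match every filter exactly; never pick a nearest value."""
    return [
        row
        for row in rows
        if all(row.get(field) == value for field, value in filters.items())
    ]


rows = [
    {"pressure": 200000.0, "phase": "gas"},
    {"pressure": 200000.0, "phase": "liquid"},
    {"pressure": 200000.5, "phase": "gas"},
]

both = apply_exact_filters(rows, {"pressure": 200000, "phase": "gas"})
none = apply_exact_filters(rows, {"pressure": 199999})
print(len(both), len(none))
```

A filter value that was never sampled selects nothing instead of snapping to the closest emitted state.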

`SOURCE` may be a run directory, CSV, or Parquet file. Run directories prefer
Parquet and verify it against `metadata.json`. Standalone saturation and
vapor-mass-fraction files may require `--saturation-coordinate pressure` or
`--saturation-coordinate temperature`.

Every export writes an image plus `.plot.json` provenance sidecar under
`figures/` by default. Existing image or sidecar paths are refused.
Finalization uses exclusive same-filesystem hard links: it is no-overwrite-safe,
but the two-file pair is not fully crash-atomic.
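
The no-overwrite property of hard-link finalization can be demonstrated with the standard library: `os.link` fails with `FileExistsError` when the destination already exists. The sketch assumes a filesystem that supports hard links; `finalize_exclusive` and the temporary file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def finalize_exclusive(staged: Path, final: Path) -> None:
    """Publish a staged file under its final name without overwriting."""
    os.link(staged, final)  # raises FileExistsError if `final` exists
    staged.unlink()  # drop the staging name; the data stays under `final`


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = root / "plot.png.staged-1"
    first.write_bytes(b"first")
    finalize_exclusive(first, root / "plot.png")

    second = root / "plot.png.staged-2"
    second.write_bytes(b"second")
    try:
        finalize_exclusive(second, root / "plot.png")
        refused = False
    except FileExistsError:
        refused = True

    surviving = (root / "plot.png").read_bytes()

print(refused, surviving)
```

Each link is exclusive on its own, but an image and its sidecar need two separate links, so a crash between them can leave one file without the other.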

### Configured visualization

An optional top-level `visualization` section generates figures after the
immutable dataset run is finalized:

```yaml
visualization:
  format: png
  fluids: [Propane]

  plots:
    - name: density-vs-temperature
      kind: property_curves
      property: mass_density
      x: temperature
      value_scale: linear

    - name: density-map
      kind: property_heatmap
      property: mass_density
      color_scale: log

    - name: enthalpy-entropy
      kind: xy
      x: specific_enthalpy
      y: specific_entropy
      group_by: pressure

    - name: pressure-specific-volume
      kind: pv

    - name: temperature-entropy
      kind: ts
```

Supported formats are `png`, `pdf`, and `svg`. Per-plot `format` and `fluids`
replace their shared values; scales are selected per plot. Per-plot filters are
AND-merged with shared filters, and conflicting values for the same field are
rejected. Plot names must be unique safe filename slugs. Output paths and
interactive display are intentionally not stored in YAML.

Shared or per-plot exact filters use YAML mappings:

```yaml
visualization:
  filters:
    phase: gas
  plots:
    - name: gas-density
      kind: property_curves
      property: mass_density
      x: temperature
      filters:
        pressure: 100000
```
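
The merge rule for shared and per-plot filters can be sketched like this. `merge_filters` is a hypothetical helper, and accepting an identical repeated value is an assumption; the source only states that conflicting values are rejected.

```python
def merge_filters(shared: dict, per_plot: dict) -> dict:
    """AND-merge two exact-filter mappings, rejecting conflicting values."""
    merged = dict(shared)
    for field, value in per_plot.items():
        if field in merged and merged[field] != value:
            raise ValueError(
                f"conflicting filter for {field!r}: {merged[field]!r} vs {value!r}"
            )
        merged[field] = value
    return merged


merged = merge_filters({"phase": "gas"}, {"pressure": 100000})
print(merged)

try:
    merge_filters({"phase": "gas"}, {"phase": "liquid"})
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
print(conflict_rejected)
```

Applied to the YAML above, the `gas-density` plot would be restricted to rows where the phase is gas and the pressure is exactly 100000 Pa.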

Generate with the default figure root:

```bash
carnopy generate my-dataset.yaml
```

Or select another figure root:

```bash
carnopy generate my-dataset.yaml \
  --out outputs/manual-test \
  --figures-out figures/manual-test
```

Configured figures are written to:

```text
<figures-root>/<run-directory-name>/
├── <plot-name>.<format>
├── <plot-name>.plot.json
└── visualization-report.json
```

The same YAML requests can be applied later to an existing immutable run. The
file may be a full Carnopy configuration or a small file containing only a
top-level `visualization:` section:

```bash
carnopy plot outputs/<run> \
  --config plots.yaml \
  --figures-out figures
```

Batch plotting accepts run directories, not standalone CSV/Parquet files.
Scientific generation fields in a full config are ignored; requests are
validated against the actual emitted run columns. Manual plot options cannot be
combined with `--config`.

Plots execute independently after dataset finalization. A failed plot preserves
the immutable run and any successful figures, records outcomes in the report,
and makes the CLI exit with code `1`. A zero-valid-row dataset retains exit code
`3` and records configured plots as skipped.

Visualization settings do not change `config.normalized.json`, `spec_id`, or
`generation_context_id`. They receive their own
`visualization_request_id = viz-<sha256>`. Exact YAML bytes still affect the raw
configuration hash.
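
The distinction between a normalized request identity and a raw byte hash can be illustrated as follows. Only the `viz-<sha256>` shape comes from this document; the sorted-key JSON canonicalization and both function names are assumptions made for the sketch.

```python
import hashlib
import json


def canonical_request_id(request: dict) -> str:
    """Hash a canonical (sorted-key, compact) JSON form of the request."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return "viz-" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def raw_bytes_hash(data: bytes) -> str:
    """Hash the exact bytes, so any textual change alters the digest."""
    return hashlib.sha256(data).hexdigest()


# Same request content, different key order: identical canonical identity.
id_a = canonical_request_id({"format": "png", "fluids": ["Propane"]})
id_b = canonical_request_id({"fluids": ["Propane"], "format": "png"})

# Same meaning, different bytes (an added comment): different raw hashes.
raw_a = raw_bytes_hash(b"format: png\n")
raw_b = raw_bytes_hash(b"format: png  # comment\n")

print(id_a == id_b, raw_a == raw_b)
```

Under this assumption, reordering keys or adding comments would leave the request identity unchanged while still changing the raw configuration hash.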

## Generated artifacts and provenance

Each immutable run contains the selected dataset files plus mandatory
provenance artifacts:

```text
outputs/<run>/
├── dataset.csv          # when requested
├── dataset.parquet      # when requested
├── config.original.yaml
├── config.normalized.json
├── metadata.json
└── report.json
```

Runs are staged and then finalized atomically as one directory. Existing final
or staging paths are never overwritten.

Identity layers:

- `spec_id`: canonical executable scientific specification;
- `generation_context_id`: specification plus software and artifact context;
- `output_request_id`: canonical dataset serialization request;
- `run_id`: one UUID4 execution attempt;
- artifact hashes: exact emitted bytes;
- `visualization_request_id`: normalized visualization request, independent
  from dataset identity.

Configuration provenance includes SHA-256 hashes of exact source YAML and
canonical materialized SI configuration bytes. Metadata records software
versions, reference-state policy, canonical fluids and properties, sampling,
failure counts, units, fluid constants, and artifact hashes. Carnopy does not
store the host source-config path.

Parquet schema metadata includes the dataset schema version and unit mapping.
Figures are derived artifacts outside the run and are not added to immutable
dataset artifact hashes.

## Python API

```python
from carnopy import generate_dataset, load_config, validate_config

loaded = load_config("my-dataset.yaml")
validation = validate_config("my-dataset.yaml")
result = generate_dataset(
    "my-dataset.yaml",
    output_root="outputs",
    figures_root="figures",
)
```

When configured visualization exists, `result.visualization` contains its
request ID, status, figure directory, report path, and outcome counts.
`result.dataset_formats` and `result.output_request_id` describe the selected
table serialization independently of the scientific `spec_id`.

Manual plotting:

```python
from carnopy.visualization import (
    plot_property_heatmap,
    plot_thermodynamic_diagram,
    plot_xy,
)

heatmap = plot_property_heatmap(
    "outputs/<run>",
    property_name="mass_density",
)

xy = plot_xy(
    "outputs/<run>",
    x="specific_enthalpy",
    y="specific_entropy",
    group_by="pressure",
)

pv = plot_thermodynamic_diagram("outputs/<run>", kind="pv")
```

The returned Matplotlib figure represents an image that has already been
exported. Modifying it does not update the image or provenance sidecar.

## Scientific limitations

- CoolProp is the only backend in Milestone 1.
- Pure fluids only; mixtures are deferred.
- Generated data is backend output, not experimental evidence.
- All backend calls and generated numeric columns use SI.
- Specific enthalpy, entropy, and internal energy depend on reference state.
- Carnopy resets every requested fluid to CoolProp `DEF` before generation and
  records that policy.
- CoolProp reference-state mutation is process-global; concurrent embedded use
  with unrelated CoolProp calculations is unsupported in Milestone 1.
- Release regression tests compare finalized Parquet values with direct
  CoolProp calls for representative states in all three modes.
- Separate sanity checks require the generated normal boiling points of Propane
  and Cyclopentane at `101325 Pa` to remain within the uncertainty intervals
  published by the NIST Chemistry WebBook. These checks do not establish
  universal experimental accuracy.
- Absolute reference-dependent values are not directly comparable across
  different reference conventions.
- Visualization reads emitted columns only and is not a second property
  evaluation layer.
- ORC generation, additional backends, ML training, GUI, web services,
  databases, and mixture models are deferred.

Post-alpha work may add an optional cycle-feasibility subsystem that produces
traceable screening datasets without turning the property generator into a
hidden process simulator. An ORC/TFC contract must explicitly include source
and sink profiles, pinch/approach temperatures, pressure losses, component
efficiencies, subcooling and superheat margins, cavitation/NPSH constraints,
minimum turbine-exhaust quality, and critical/maximum operating limits.
Saturated liquid alone is not a pump cavitation margin, and turbine discharge
need not universally have vapor mass fraction one.

Official backend references:

- https://coolprop.org/coolprop/
- https://coolprop.org/coolprop/HighLevelAPI.html
- https://github.com/CoolProp/CoolProp

## Development and contribution

Carnopy uses a `src/` layout, Hatchling, standalone uv, Ruff, strict mypy, and
pytest. `pyproject.toml` and `uv.lock` are authoritative.

Normal development:

```bash
uv sync --locked --extra all --group dev
```

Release-readiness tooling:

```bash
uv sync --locked --extra all --group dev --group release
```

Quality gate:

```bash
uv lock --check
uv run --locked ruff check .
uv run --locked ruff format --check .
uv run --locked mypy src/carnopy
uv run --locked pytest
uv run --locked python scripts/preflight.py
uv pip check --python .venv/bin/python
```

Keep changes small and explicit. Public configuration names, semantic property
names, SI dataset columns, failure codes, metadata fields, and identity rules
are compatibility contracts. Tests use temporary output directories and do not
commit generated datasets or figures.

The test count is not a quality target. The suite separates configuration,
sampling, three thermodynamic modes, diagnostics, provenance, visualization,
CLI behavior, packaging, and release automation. New tests should protect a
distinct contract or regression and use parametrization instead of duplicating
equivalent cases.

Contributor and coding-agent rules, architecture constraints, commit
conventions, and release-maintainer safeguards are in
[AGENTS.md](https://github.com/gcalpay/carnopy/blob/main/AGENTS.md).

## Alpha release procedure

Carnopy `0.1.0a1` is intended to be a functional alpha, not a placeholder
package. The PyPI name is claimed only after production PyPI accepts the
distribution.

Before release:

1. Make `gcalpay/carnopy` public.
2. Enable GitHub secret scanning and push protection.
3. Create a protected GitHub environment named `pypi` with a required human
   reviewer and a deployment tag rule matching `v*`.
4. Register a pending Trusted Publisher on production PyPI:

```text
Project:      carnopy
Owner:        gcalpay
Repository:   carnopy
Workflow:     publish.yml
Environment:  pypi
```

Pending publishers do not reserve the project name. Confirm production name
availability immediately before tagging.

Release verification:

```bash
uv sync --locked --extra all --group dev --group release
uv lock --check
uv run --locked ruff check .
uv run --locked ruff format --check .
uv run --locked mypy src/carnopy
uv run --locked pytest
uv run --locked python scripts/preflight.py
uv run --locked --group release python -m build
uv run --locked --group release python -m twine check dist/*
uv run --locked python scripts/check_distribution.py dist/*
uv pip check --python .venv/bin/python
```

The build command uses an isolated build environment by default and installs
the declared build backend there. The development environment therefore does
not need to be changed solely to run a build. Use the ignored repository-local
`prerelease/` directory for a non-destructive rehearsal when an existing
`dist/` must be preserved:

```bash
uv run --locked --group release python -m build --outdir prerelease
uv run --locked --group release python -m twine check prerelease/*
uv run --locked python scripts/check_distribution.py prerelease/*
```

Final approved artifacts are built into `dist/` for inspection, hashing, and
publication. Carnopy build artifacts should not be written outside the
repository.

The human creates and pushes the release tag:

```bash
git tag -a v0.1.0a1 -m "Release carnopy 0.1.0a1"
git push origin v0.1.0a1
```

The publishing workflow tests Python 3.10–3.13, builds a single wheel/sdist
pair exactly once, verifies and hashes it, waits for production approval,
publishes the verified files to PyPI, then downloads and smoke-tests the
published release. Only the production publish job receives `id-token: write`;
no long-lived index token or `skip-existing` behavior is used.

Never rebuild or republish changed files under an uploaded version. Any payload
change requires `0.1.0a2` or later. Never move a pushed release tag or delete a
release to reuse its version.

Official publishing references:

- https://docs.pypi.org/trusted-publishers/creating-a-project-through-oidc/
- https://docs.pypi.org/trusted-publishers/using-a-publisher/
- https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/

## License

Carnopy is distributed under the MIT License. See `LICENSE`.
