Metadata-Version: 2.4
Name: nexuscreator
Version: 2.0.0
Summary: Convert data + NeXus definitions to NeXus HDF5, with plugins
Author-email: Hector Perez Ponce <hector.perez_ponce@helmholtz-berlin.de>
License-Expression: Apache-2.0
Project-URL: Repository, https://codebase.helmholtz.cloud/hzb/research_data_management/nexuscreator
Project-URL: Bug Tracker, https://codebase.helmholtz.cloud/hzb/research_data_management/nexuscreator/-/issues
Keywords: NeXus,HDF5,scientific data,neutron,synchrotron,EIS,batteries
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Physics
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: h5py
Requires-Dist: numpy
Requires-Dist: matplotlib
Requires-Dist: pyyaml
Dynamic: license-file

# NeXusCreator

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19600002.svg)](https://doi.org/10.5281/zenodo.19600002)

NeXusCreator converts experimental data and NeXus definition files into valid NeXus (`.nxs`) HDF5 files. It also supports generating NeXus-definition templates (`.nxd`) from input data. A lightweight plugin system powers both generation and parsing for formats like SPEC and DTA/DAT, including a batteries folder workflow.

## Features
- **Conversion** from `.nxd` + input data to `.nxs` (HDF5) files
- **Generation** of `.nxd` from input (e.g., SPEC file or DTA/DAT folder)
- **Per-scan output** for SPEC (`-f`) plus a master file with external links
- **Batteries workflow** (folder of DTA/DAT) with combined parsing
- **Temperature `.dat` variables** written under `/entry/experiments/temperature`
- **Template expansion** for scan groups in `.nxd` via `@scan_template`/placeholders
- **Optional `punx` validation** after `.nxs` generation (`--validate`)
- **CSV export** of dataset values from `.nxs`/HDF5 inputs
- **Extensible plugin system** for generators and parsers
- **Chunked writing and streaming parsers** for large datasets and multi-scan files

## Installation
- Python 3.9+
- Recommended: a virtual environment

Install from this repo:

```
python3 -m venv .venv && source .venv/bin/activate
python3 -m pip install .
```

This installs console scripts:
- `nexuscreator` (preferred)
- `nxc` (alias)

To develop/run from source without installing, you can still invoke the CLI directly:

```
python3 NeXusCreator.py -h
```

## Quick Start

1) Generate a NeXus-definition (`.nxd`) from a SPEC file:

```
nexuscreator -g out.nxd -i data.spec
```

2) Generate a single-scan template from SPEC for later multi-scan conversion:

```
nexuscreator -g template.nxd -i data.spec -t
```

3) Generate a `.nxd` from a batteries folder (DTA/DAT):

```
nexuscreator -g out.nxd -i /path/to/folder -b batteries

# Directory generation modes
# Single combined .nxd for entire folder
nexuscreator -g out.nxd -i /path/to/folder --single-file

# One .nxd per input file (.dta and .dat)
nexuscreator -g out_prefix_ -i /path/to/folder --multi-file
```

4) Convert an operando EIS folder directly to NeXus (no `.nxd` required):

```
# Auto-generate and convert in one step
nexuscreator -i /path/to/eis_folder -b operando_eis -o eis.nxs

# Save the generated definition for reuse
nexuscreator -g eis.nxd -i /path/to/eis_folder -b operando_eis

# Generate a single reusable template (one entry per file_name)
nexuscreator -g eis_template.nxd -i /path/to/eis_folder -b operando_eis -t
```

The folder must contain Gamry `.DTA` / `_Raw.DTA` file pairs following the
naming convention `EIS_{CH|DIS}_#cycle_#file` for operando measurements or
`EIS_{CH|DIS}-static_#file` for static (open-circuit) measurements.
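
As a rough illustration of this naming convention, the sketch below classifies file names with regular expressions. It is not the plugin's actual matcher. It assumes that `#cycle` and `#file` stand for plain integers (for example `EIS_CH_3_12.DTA`) and that the raw partner file carries a `_Raw` suffix before the extension; `classify_eis_name` is a hypothetical helper.

```python
import re

# Assumption: "#cycle" and "#file" are plain integers; "_Raw" marks the raw partner file.
_OPERANDO = re.compile(r"^EIS_(CH|DIS)_(\d+)_(\d+)(_Raw)?\.DTA$", re.IGNORECASE)
_STATIC = re.compile(r"^EIS_(CH|DIS)-static_(\d+)(_Raw)?\.DTA$", re.IGNORECASE)


def classify_eis_name(name):
    """Return a dict describing the file name, or None if it does not match."""
    m = _OPERANDO.match(name)
    if m:
        return {"kind": "operando", "step": m.group(1).upper(),
                "cycle": int(m.group(2)), "file": int(m.group(3)),
                "raw": m.group(4) is not None}
    m = _STATIC.match(name)
    if m:
        # Static (open-circuit) measurements have no cycle number.
        return {"kind": "static", "step": m.group(1).upper(),
                "cycle": None, "file": int(m.group(2)),
                "raw": m.group(3) is not None}
    return None
```

A check like this can be run over a folder listing before conversion to spot files that would not be picked up under the assumed pattern.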

5) Generate YAML instead of `.nxd` (with `-g`):

```
# Single file → YAML
nexuscreator -g out_dir/ -i data.spec --yaml

# Folder (multi-file) → YAML per input
nexuscreator -g out_dir/ -i /path/to/folder --multi-file --yaml

# Folder (single-file) → one YAML file
nexuscreator -g out_dir/ -i /path/to/folder --single-file --yaml
```

6) Generate a `.nxd` from an existing HDF5/NeXus file:

```
nexuscreator -g out.nxd -i data.nxs --hdf5-option links   # external links (default)
nexuscreator -g out.nxd -i data.nxs --hdf5-option extract # placeholders + dictionary
```

7) Convert using a NeXus definition and data input:

```
# Single output
nexuscreator -n def.nxd -i data.spec -o out.nxs

# One file per scan + master file with external links
nexuscreator -n def.nxd -i data.spec -o out.nxs -f

# DTA/DAT single file
nexuscreator -n def.nxd -i data.dta -o out.nxs

# Batteries folder
nexuscreator -n def.nxd -i /path/to/folder -b batteries -o out.nxs
```

Useful options, grouped by when they apply:

- Always relevant:
  `-i/--input`, `-o/--output_path`, `-b/--beamline_name`
- When converting (`-n`):
  `-n/--nexus_definition`, `-f/--file_per_scan`, `-I/--icat_proposal_number`, `--auto-generate-nxd`, `--pair-dta-raw`
- When generating (`-g`):
  `-g/--generate_nexus_definition`, `-t/--template`, `--single-file`, `--multi-file`, `--yaml`, `--hdf5-option`
- For directory/batch processing:
  `-r/--recursive`, `--glob`, `--glob-spec`, `--glob-dta`, `--dry-run`, `--summary-only`, `--no-group-dta-folders`
- For metadata/schema placement:
  `--metadata-csv`, `--jsonld-structure`, `--nxdl-root`, `--app-def`, `--export-vars-csv`, `-d/--dictionary`, `-D/--debug`
- For validation/CSV value export:
  `--validate`, `--export-values-csv`, `--export-values-prefix`, `--csv-delimiter`

`--hdf5-option` modes:
- `links` (or `1`): build definitions with external links to source HDF5/NeXus datasets (default for `-g`).
- `extract` (or `2`): extract datasets into variables/placeholders (default for `-n`).
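
The alias and default rules above can be sketched as a small normalization function. This is only an illustration of the documented behavior, not the CLI's own code; `resolve_hdf5_option` and the flow names `"generate"`/`"convert"` are hypothetical.

```python
# Hypothetical helper: the numeric aliases and per-flow defaults follow the list above.
_ALIASES = {"1": "links", "links": "links", "2": "extract", "extract": "extract"}
_DEFAULTS = {"generate": "links", "convert": "extract"}  # -g flow vs -n flow


def resolve_hdf5_option(value, flow):
    """Normalize a --hdf5-option value; fall back to the default for the flow."""
    if value is None:
        return _DEFAULTS[flow]
    try:
        return _ALIASES[str(value).strip().lower()]
    except KeyError:
        raise ValueError(f"unknown --hdf5-option value: {value!r}") from None
```

For example, `resolve_hdf5_option(None, "generate")` yields `links`, while `resolve_hdf5_option("2", "generate")` yields `extract`.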

See more detailed examples in `docs/source/usage.md` (or build the Sphinx docs for HTML output).

## Citation

If you use NeXusCreator, cite the archived release:

Perez Ponce, H. (2026). NeXusCreator (2.0.0). Zenodo. https://doi.org/10.5281/zenodo.19600002

Machine-readable citation metadata is available in `CITATION.cff`.

## Documentation

- Read the Docs (latest): https://nexuscreator.readthedocs.io/en/latest/
- NeXusCreator Assistant (Custom GPT): https://chatgpt.com/g/g-6979cafeeca48191af6a9027bba4e2d8-nexuscreator-assistant
- Source files live under `docs/source` and are built with Sphinx (MyST Markdown).
- NeXus description syntax reference: `docs/source/nexus-description-syntax.md` (Sphinx source) and `docs/nexus-description-syntax.md` (plain Markdown copy).
- Recent changes: `docs/source/whats-new.md`
- Install doc dependencies: `python -m pip install -r dev-requirements.txt`
- Build HTML: `make docs` (renders into `docs/_build/html`)

If `sphinx-build` is not available locally, install it via the developer requirements above.
After building, open `docs/_build/html/index.html` in a browser to browse the rendered documentation.

## Command Reference (CLI)

Run `python3 NeXusCreator.py help` or `python3 NeXusCreator.py -h` for the latest usage.

Always relevant
- `help`, `-h, --help`: show help
- `-v, --version`: print version
- `--license`: print full Apache-2.0 license text
- `--notice`: print third-party notices
- `--list-beamlines`: list accepted values for `-b/--beamline_name`
- `-i, --input PATH`: input file or directory
- `-o, --output_path PATH`: output file or directory
- `-b, --beamline_name NAME`: beamline context (for example `ikft`, `batteries`, `peaxis`)

Relevant when converting to `.nxs` (`-n`)
- `-n, --nexus_definition FILE`: convert using a NeXus definition
- `-f, --file_per_scan`: for SPEC, write one `.nxs` per scan and a master file with external links
- `-I, --icat_proposal_number NUM`: ICAT proposal subfolder in output
- `--auto-generate-nxd`: with directory input, auto-generate `.nxd` per file then convert
- `--pair-dta-raw`: for single `.dta`, combine with sibling `*_raw.dta` (or vice versa)

Relevant when generating definitions (`-g`)
- `-g, --generate_nexus_definition FILE`: build a `.nxd` from input
- `-t, --template`: emit a single-scan template
- `--single-file`: with directory input, generate one combined definition
- `--multi-file`: with directory input, generate one definition per input file
- `--yaml`: write YAML (`.yaml`) instead of `.nxd`
- `--hdf5-option MODE`: `links` (default for `-g`) or `extract` (default for `-n`), accepts `1/2`

Relevant for directory/batch processing
- `-r, --recursive`: scan directories recursively
- `--glob PATTERN`: filter scanned files (for example `*.spec`)
- `--glob-spec PATTERN`: SPEC-only filter (overrides `--glob` for SPEC)
- `--glob-dta PATTERN`: DTA/DAT-only filter (overrides `--glob` for DTA/DAT)
- `--dry-run`: list matches and planned outputs without writing
- `--summary-only`: with `--dry-run`, print only the summary line
- `--no-group-dta-folders`: process DTA/DAT individually instead of per-folder grouping
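
The filter precedence described above (`--glob-spec` and `--glob-dta` override `--glob` for their own formats) can be sketched with the standard library. This is an illustration rather than the tool's scanner: `select_files` is a hypothetical helper, and which suffixes count as SPEC or DTA/DAT is an assumption made for the example.

```python
from fnmatch import fnmatch
from pathlib import PurePath

# Assumption: the suffix-to-format mapping below is a guess for illustration.
SPEC_SUFFIXES = {".spec"}
DTA_SUFFIXES = {".dta", ".dat"}


def select_files(names, glob=None, glob_spec=None, glob_dta=None):
    """Keep names that pass the filter applying to their format."""
    selected = []
    for name in names:
        suffix = PurePath(name).suffix.lower()
        if suffix in SPEC_SUFFIXES:
            pattern = glob_spec or glob  # format-specific filter wins
        elif suffix in DTA_SUFFIXES:
            pattern = glob_dta or glob
        else:
            pattern = glob
        # No applicable pattern means no filtering for this file.
        if pattern is None or fnmatch(PurePath(name).name, pattern):
            selected.append(name)
    return selected
```

With `glob="*"` and `glob_spec="a*"`, only SPEC files starting with `a` survive, while DTA/DAT files still fall back to the general pattern.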

Relevant for metadata/schema-guided placement
- `--metadata-csv FILE`: enrich variables (`variable_name, variable_description, units`)
- `--jsonld-structure FILE`: JSON-LD descriptor for parser/generator behavior
- `--nxdl-root PATH`: path to NXDL definitions for placement guidance
- `--app-def NAME`: preferred NXDL application (for example `NXxas`)
- `--export-vars-csv FILE`: export parsed variables with descriptions/units to CSV
- `-d, --dictionary`: print the parsed variable dictionary
- `-D, --debug`: print the current `.nxd` line being processed
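
To make the `--metadata-csv` column layout concrete, here is a minimal sketch that reads such a file into a lookup table. It assumes the CSV has a header row with exactly the three column names listed above; the sample rows and the `load_metadata` helper are invented for illustration and are not part of the package.

```python
import csv
import io

# Assumption: the CSV has a header row with exactly these column names.
SAMPLE = """variable_name,variable_description,units
temp_value,Sample temperature,K
pressure_value,Sample pressure,Pa
"""


def load_metadata(text):
    """Map each variable name to its description and units."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return {
        row["variable_name"]: {
            "description": row["variable_description"],
            "units": row["units"],
        }
        for row in reader
    }


metadata = load_metadata(SAMPLE)
print(metadata["temp_value"])
```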

Relevant for validation and value export
- `--validate`: run `punx` validation after writing `.nxs`
- `--export-values-csv FILE`: export dataset values from `.nxs`/HDF5 to CSV
- `--export-values-prefix PATH`: restrict exported datasets to a prefix (must start with `/`)
- `--csv-delimiter CHAR`: override CSV delimiter (default `,`)

You can also run the script directly if you haven’t installed the package:

```
python3 NeXusCreator.py -h
```

## Internal Links in .nxd and YAML

You can declare internal links in NeXus Description files to point to an existing
dataset elsewhere in the same file. Use the arrow syntax in `.nxd`:

```
scopeP: --> /entry/instrument/11_detector_chamber/scope/scopeP
```

When converting to HDF5, this becomes an internal soft link at `scopeP` pointing
to the target dataset.

YAML representation uses a `link` field for the same construct:

```
scopeP:
  link: /entry/instrument/11_detector_chamber/scope/scopeP
```

Both forms are supported by the readers and the HDF5 writer.

## External Links in .nxd and YAML

You can link to datasets in another NeXus/HDF5 file using the external link
syntax in `.nxd`:

```
calibration_2:  --> ../calibration/rixsCucalcold_R0001.nxs | /entry/
```

This creates an HDF5 external link named `calibration_2` pointing to the
group `/entry/` inside the target file.

YAML representation uses an `external` mapping with `file` and `path` keys:

```
calibration_2:
  external:
    file: ../calibration/rixsCucalcold_R0001.nxs
    path: /entry/
```

Both forms are supported by the YAML and .nxd readers and are written as
HDF5 external links during conversion.
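
The correspondence between the `.nxd` arrow syntax and the YAML keys can be sketched as follows. This is not the project's reader: `parse_link_line` is a hypothetical helper that only handles the two single-line forms shown in this README.

```python
def parse_link_line(line):
    """Turn 'name: --> target' into (name, mapping) using the YAML-style keys."""
    name, sep, rest = line.partition(":")
    rest = rest.strip()
    if not sep or not rest.startswith("-->"):
        raise ValueError(f"not a link line: {line!r}")
    target = rest[len("-->"):].strip()
    if "|" in target:
        # External link: '<file> | <path inside that file>'
        file_part, _, path_part = target.partition("|")
        return name.strip(), {"external": {"file": file_part.strip(),
                                           "path": path_part.strip()}}
    # Internal link: a path within the same file
    return name.strip(), {"link": target}
```

Applied to the two examples above, it returns a `link` mapping for `scopeP` and an `external` mapping with `file` and `path` for `calibration_2`.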

## Plugins

NeXusCreator auto-discovers plugins under `Plugins/`:
- Definition generators build `.nxd` objects for the `-g` flow
- Data parsers create the variable library used for injection during conversion

Built-in plugins:
- SPEC (`plugins/spec_plugin.py`): generator + parser
- DTA/DAT (`plugins/dta_plugin.py`): generator + parser; supports RAW/temp/non-RAW and batteries folder
- HDF5/NeXus (`plugins/hdf5_plugin.py`): generator + parser; generate .nxd by linking datasets in existing .nxs/.h5
- TIF (`plugins/tiff_plugin.py`): generator + parser; reads `.tif/.tiff` and creates an image-centric definition

Write your own plugin by adding a module to `plugins/` that subclasses the base interfaces; no registration is required. See `docs/source/plugins.md` for details and examples.

You can also load external plugin files/directories without editing this repository:

```
export NEXUSCREATOR_PLUGIN_PATHS="/path/to/my_plugins:/path/to/custom_plugin.py"
```

Set `NEXUSCREATOR_PLUGIN_DEBUG=1` to print plugin import diagnostics.
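
For orientation, a path-list variable like this is typically split as sketched below. How NeXusCreator itself parses the variable is not shown in this README, so treat the details as assumptions: the separator is taken to be `os.pathsep` (`:` on POSIX, matching the example), and entries ending in `.py` are treated as single plugin files. `split_plugin_paths` is a hypothetical helper.

```python
import os
from pathlib import Path


def split_plugin_paths(raw, sep=os.pathsep):
    """Split a path-list string into (directories, files), skipping empty items."""
    dirs, files = [], []
    for item in raw.split(sep):
        item = item.strip()
        if not item:
            continue
        path = Path(item).expanduser()
        # Assumption: entries ending in .py are single plugin files.
        (files if path.suffix == ".py" else dirs).append(path)
    return dirs, files


dirs, files = split_plugin_paths(os.environ.get("NEXUSCREATOR_PLUGIN_PATHS", ""))
```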

### Notes on directory generation with `-g`
- Default for directory input is multi-file (one `.nxd` per supported input).
- If `-b peaxis` is set for a directory, the default switches to single-file (combined) generation.
- Use `--single-file` to produce one combined `.nxd` from a folder (batteries-style structure for DTA/DAT folders).
- `--single-file` and `--multi-file` are mutually exclusive. The CLI will error if both are passed.
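
The mode selection rules above can be sketched with `argparse`. This is an illustration of the documented defaults, not the project's actual argument parser; `build_parser` and `directory_mode` are hypothetical names.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="sketch")
    parser.add_argument("-b", "--beamline_name")
    # argparse rejects the command line if both flags of the group are given.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--single-file", action="store_true")
    mode.add_argument("--multi-file", action="store_true")
    return parser


def directory_mode(args):
    """Pick 'single' or 'multi' following the defaults listed above."""
    if args.single_file:
        return "single"
    if args.multi_file:
        return "multi"
    # No explicit flag: peaxis defaults to combined output, everything else to per-file.
    return "single" if args.beamline_name == "peaxis" else "multi"
```

An explicit flag always wins in this sketch, so `-b peaxis --multi-file` resolves to per-file generation.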

Temperature data location:
- For both single `.dat` inputs and batteries folders, temperature datasets are placed under `/entry/experiments/temperature`.

Further reading: `docs/architecture.md`, `docs/usage.md`.

## Python API

Programmatic conversion using the package API:

```python
from nexuscreator import NeXusCreator, create_nexus

# One-shot helper
out_path = create_nexus(nexus_definition_file="def.nxd",
                        input_path="data.spec",
                        output_path="out.nxs",
                        beamline_name="ikft")

# Lower-level: execute using flags
NeXusCreator().execute_conversion({
    'nexus_definition_file': 'def.nxd',
    'input_path': 'data.spec',
    'output_path': 'out.nxs',
    'beamline_name': 'ikft',
})
```

### Advanced Usage

```python
from nexuscreator import NeXusCreator

creator = NeXusCreator()

# Batch processing multiple files
files_to_process = [
    {'input': 'data1.spec', 'output': 'out1.nxs'},
    {'input': 'data2.spec', 'output': 'out2.nxs'},
    {'input': 'data3.spec', 'output': 'out3.nxs'}
]

for file_info in files_to_process:
    creator.execute_conversion({
        'nexus_definition_file': 'template.nxd',
        'input_path': file_info['input'],
        'output_path': file_info['output'],
        'beamline_name': 'ikft'
    })
```

### Performance Tips

1. Reuse a prepared `.nxd` when converting many similar inputs.
2. Batch related files together to reduce setup overhead.
3. Use `--dry-run` first for large directory inputs to confirm file matching.
4. Enable `--validate` only when you need NXDL validation, since it adds work after writing.

## NeXus Definition Syntax

A `nexus_object` is a nested dictionary that defines the structure of a NeXus file. It uses specific keys and conventions to represent NeXus groups, datasets, attributes, and links.

### Basic Structure

A `nexus_object` typically has a single top-level `entry` group (class `NXentry`) directly under the file root. The root-level `@default` attribute names that group:

```python
nexus_object = {
    '@default': 'entry',
    'entry': {
        '@NX_class': 'NXentry',
        # Additional groups and datasets go here
    }
}
```

### Groups

Groups are represented as nested dictionaries. The `@NX_class` key specifies the NeXus class of the group:

```python
nexus_object = {
    'entry': {
        '@NX_class': 'NXentry',
        'instrument': {
            '@NX_class': 'NXinstrument',
            # Instrument-specific groups and datasets
        },
        'sample': {
            '@NX_class': 'NXsample',
            # Sample-specific groups and datasets
        }
    }
}
```
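
Because groups are plain nested dictionaries, they can be inspected with an ordinary recursive walk. The sketch below lists every group path together with its NeXus class; `iter_groups` is a hypothetical helper, not part of the package API.

```python
def iter_groups(node, path=""):
    """Yield (path, NX_class) for every nested dict that declares @NX_class."""
    for key, value in node.items():
        if isinstance(value, dict):
            child_path = f"{path}/{key}"
            if "@NX_class" in value:
                yield child_path, value["@NX_class"]
            # Recurse so that nested groups are reported as well.
            yield from iter_groups(value, child_path)


nexus_object = {
    'entry': {
        '@NX_class': 'NXentry',
        'instrument': {'@NX_class': 'NXinstrument'},
        'sample': {'@NX_class': 'NXsample'},
    }
}
groups = list(iter_groups(nexus_object))
```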

### Datasets

Datasets are represented as dictionaries with specific keys:
- `@dtype`: The data type (e.g., `NX_FLOAT64`, `NX_INT32`).
- `@value`: The value of the dataset. This can be a direct value or a placeholder.
- `@units`: The units of the dataset (optional).
- `@description`: A description of the dataset (optional).

```python
nexus_object = {
    'entry': {
        '@NX_class': 'NXentry',
        'temperature': {
            '@dtype': 'NX_FLOAT64',
            '@value': 298.15,
            '@units': 'K',
            '@description': 'Sample temperature'
        }
    }
}
```

### Placeholders

Placeholders inject values from the variable library (the dictionary built by the data parser) during conversion. A string `@value` names the placeholder to look up:

```python
nexus_object = {
    'entry': {
        '@NX_class': 'NXentry',
        'temperature': {
            '@dtype': 'NX_FLOAT64',
            '@value': 'temp_value',
            '@units': 'K'
        }
    }
}
```
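
Conceptually, injection replaces each placeholder name with the matching library value. The sketch below illustrates the idea only and is not the package's `NexusValueInjector`: it assumes every string `@value` is a placeholder name, keeps unknown names unchanged, and reports them. `inject_values` is a hypothetical helper.

```python
import copy


def inject_values(node, library, missing=None):
    """Return (copy of node with placeholders resolved, list of unresolved names)."""
    if missing is None:
        missing = []
    result = {}
    for key, value in node.items():
        if isinstance(value, dict):
            result[key], _ = inject_values(value, library, missing)
        elif key == "@value" and isinstance(value, str):
            if value in library:
                result[key] = library[value]
            else:
                # Assumption: an unknown name is kept as-is and reported.
                result[key] = value
                missing.append(value)
        else:
            result[key] = copy.deepcopy(value)
    return result, missing
```

The list of unresolved names is a quick way to check that the placeholders in a definition match the keys in the variable dictionary.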

### Links

Links allow you to reference datasets in the same file (internal links) or in another file (external links).

#### Internal Links

Internal links use the `link` key to point to another dataset in the same file:

```python
nexus_object = {
    'entry': {
        '@NX_class': 'NXentry',
        'data': {
            '@NX_class': 'NXdata',
            'signal': {
                '@dtype': 'NX_FLOAT64',
                '@value': [1.0, 2.0, 3.0]
            },
            'signal_link': {
                'link': '/entry/data/signal'
            }
        }
    }
}
```

#### External Links

External links use the `external` key to point to a dataset or group in another file:

```python
nexus_object = {
    'entry': {
        '@NX_class': 'NXentry',
        'calibration': {
            'external': {
                'file': 'calibration.nxs',
                'path': '/entry/calibration/data'
            }
        }
    }
}
```
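
A definition with links can be sanity-checked before writing. The following sketch collects all link nodes and tests whether an internal target resolves inside the same dictionary. Both helpers are hypothetical, and the walk assumes that no ordinary group has a child literally named `external`.

```python
def iter_links(node, path=""):
    """Yield (path, kind, target) for every link node in a nexus_object."""
    for key, value in node.items():
        if key.startswith("@") or not isinstance(value, dict):
            continue  # attributes and plain values are not link nodes
        child_path = f"{path}/{key}"
        if isinstance(value.get("link"), str):
            yield child_path, "internal", value["link"]
        elif isinstance(value.get("external"), dict):
            ext = value["external"]
            yield child_path, "external", (ext["file"], ext["path"])
        else:
            yield from iter_links(value, child_path)


def path_exists(root, target):
    """Check that an absolute path such as '/entry/data/signal' resolves."""
    node = root
    for part in target.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True
```

External targets live in another file, so only internal links can be verified this way.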

### Attributes

Attributes are represented as keys starting with `@` in the dataset or group dictionary:

```python
nexus_object = {
    'entry': {
        '@NX_class': 'NXentry',
        '@default': 'data',
        'data': {
            '@NX_class': 'NXdata',
            '@signal': 'signal',
            '@axes': 'axes',
            'signal': {
                '@dtype': 'NX_FLOAT64',
                '@value': [1.0, 2.0, 3.0],
                '@units': 'counts',
                '@description': 'Measurement signal'
            }
        }
    }
}
```

### Creating a `nexus_object`

To create a `nexus_object`, you can either:
1. Write a `.nxd` file and use the `NeXusDefinition` library to read it.
2. Directly construct the dictionary in Python.

#### Example: Direct Construction

```python
nexus_object = {
    '@default': 'entry',
    'entry': {
        '@NX_class': 'NXentry',
        'instrument': {
            '@NX_class': 'NXinstrument',
            'detector': {
                '@NX_class': 'NXdetector',
                'data': {
                    '@dtype': 'NX_FLOAT64',
                    '@value': [1.0, 2.0, 3.0],
                    '@units': 'counts',
                    '@description': 'Detector counts'
                }
            }
        },
        'sample': {
            '@NX_class': 'NXsample',
            'temperature': {
                '@dtype': 'NX_FLOAT64',
                '@value': 'temp_placeholder',
                '@units': 'K',
                '@description': 'Sample temperature'
            }
        }
    }
}
```

#### Example: Using a `.nxd` File

Create a file named `example.nxd`:

```
@default: entry
entry: {
    @NX_class: NXentry
    instrument: {
        @NX_class: NXinstrument
        detector: {
            @NX_class: NXdetector
            data: {
                @dtype: NX_FLOAT64
                @value: [1.0, 2.0, 3.0]
                @units: counts
                @description: Detector counts
            }
        }
    }
    sample: {
        @NX_class: NXsample
        temperature: {
            @dtype: NX_FLOAT64
            @value: temp_placeholder
            @units: K
            @description: Sample temperature
        }
    }
}
```

Then read it using the `NeXusDefinition` library:

```python
from libraries.NeXusDefinition import NexusDefinitionReader

nexus_object = NexusDefinitionReader('example.nxd').read()
```

### Modifying a `nexus_object`

You can modify a `nexus_object` by directly manipulating the dictionary:

```python
# Add a new dataset
nexus_object['entry']['sample']['pressure'] = {
    '@dtype': 'NX_FLOAT64',
    '@value': 'pressure_placeholder',
    '@units': 'Pa',
    '@description': 'Sample pressure'
}

# Modify an existing dataset
nexus_object['entry']['sample']['temperature']['@value'] = 'new_temp_placeholder'

# Remove a dataset
del nexus_object['entry']['sample']['temperature']
```

### Writing a `nexus_object` to a NeXus File

Use the `NexusHDF5Writer` to write a `nexus_object` to a NeXus file:

```python
from libraries.NeXusHDF5 import NexusHDF5Writer

NexusHDF5Writer(nexus_object).write('output.nxs')
```

## How It Works

- `NeXusCreator.py` provides the CLI and routes to either definition generation or conversion.
- `NeXusCreatorClass.py` orchestrates parsing, template expansion, injection, and writing. It:
  - Selects a parser via the plugin manager (preferred) or built-ins
  - Supports per-scan outputs for SPEC with a generated master file (external links)
  - Expands scan templates in `.nxd` using placeholders like `scan{num}_` and `@scan_template`
- `Libraries/NeXusHDF5.py` contains `NexusValueInjector` and `NexusHDF5Writer` with **memory-efficient chunked writing**.
- `Generators/` and `Parsers/` include format-specific logic reused by plugins, with **streaming parsers** for large files.
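
The scan-template step can be pictured as copying one template group per scan and filling in the scan number. The sketch below is a guess at the general idea, not the behavior of `NeXusCreatorClass.py`: it assumes a literal `{num}` token in keys and string values, and it names the output groups `scan<num>`. `expand_scan_template` is a hypothetical helper.

```python
import copy


def expand_scan_template(template, scan_numbers):
    """Return {group_name: group_dict} with '{num}' filled in for each scan."""
    def fill(value, num):
        if isinstance(value, str):
            return value.replace("{num}", str(num))
        if isinstance(value, dict):
            return {fill(k, num): fill(v, num) for k, v in value.items()}
        if isinstance(value, list):
            return [fill(v, num) for v in value]
        return copy.deepcopy(value)

    # Assumption: one output group per scan, named scan<num>.
    return {f"scan{num}": fill(template, num) for num in scan_numbers}
```

Each expanded group then carries placeholder names such as `scan2_counts` that can be resolved against the parsed variables.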

See `docs/architecture.md` for a deeper dive into the data flow and components.

The chunked writing and streaming parsers noted above are meant to keep memory use manageable when processing large experimental datasets.

## Testing

```
make install  # or: pip install -r requirements.txt
make test     # or: pytest -q
```

- Run a specific file: `pytest -q Tests/test_spec_parser_data_files.py`
- If a third-party pytest plugin causes a Qt/shiboken import error, disable pytest plugin autoloading: `PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest -q`

See `Docs/tests.md` for details on the test suite and troubleshooting.

## Development

- Linting and formatting checks:

```
make install-dev  # installs ruff + flake8
make lint         # runs ruff and flake8 (non-fatal)
make compile      # byte-compiles modules
```

## Troubleshooting

- `ModuleNotFoundError: No module named 'h5py'`: install the dependencies with `pip install h5py numpy matplotlib pyyaml`.
- "no suitable generator/parser plugin found": verify the file extension and the `-b` beamline flag; see `docs/source/plugins.md`.
- Empty output or missing datasets: use `-d` to print the variable dictionary and check that `.nxd` placeholders match library keys.
- SPEC per-scan not splitting: ensure `-f` is provided and scans exist in the parsed dictionary.
- Too many lines in `--dry-run` output: add `--summary-only` to print only the final batch summary.
