Metadata-Version: 2.4
Name: vcti-fileloader
Version: 5.1.1
Summary: Plugin-based file loader framework that attaches locked subtrees into a LockableTree
Author: Visual Collaboration Technologies Inc.
License: Proprietary
Project-URL: Homepage, https://github.com/vcollab/vcti-python-fileloader
Project-URL: Repository, https://github.com/vcollab/vcti-python-fileloader
Project-URL: Issues, https://github.com/vcollab/vcti-python-fileloader/issues
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Typing :: Typed
Requires-Python: <3.15,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: vcti-plugin-catalog>=1.0.0
Requires-Dist: vcti-tree>=1.0.0
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: lint
Requires-Dist: ruff; extra == "lint"
Provides-Extra: typecheck
Requires-Dist: mypy; extra == "typecheck"
Dynamic: license-file

# vcti-fileloader

A protocol-based framework for loading file content into a shared
tree. Loaders attach a locked subtree under a caller-supplied parent
handle in any `LockableTree` backing — they do not own the tree.

**Install** with `pip install vcti-fileloader`; **import** from
`vcti.fileloader.core`. The `vcti.fileloader` package is a namespace
shared with the loader plugin packages (`vcti.fileloader.hdf5`,
`vcti.fileloader.json`, `vcti.fileloader.numpy`, …), each of which is
its own PyPI distribution.

This package is fully typed (`py.typed`) and safe for strict type
checkers (mypy `--strict`, pyright).

## Overview

Applications that work with simulation and CAE data need to load many
file formats: HDF5, VTK, OpenFOAM, JSON, CSV, proprietary binary, and
others. Each file's data is hierarchical (groups, datasets,
attributes), and the resulting trees often have to be combined: a
workflow may load several files into a single browseable structure.

`vcti-fileloader` defines a uniform protocol that every loader plugin
implements: `populate(handle, tree, parent)` builds the file's content
as a new subtree under `parent` and locks it before returning. Loaders
pass through the file's native attributes verbatim into a read-only
side of the node payload (`file_attributes`); a separate mutable side
(`enricher_attributes`) is reserved for post-load enrichment by the
caller. The framework codes against the `LockableTree` protocol from
[vcti-tree](https://github.com/vcollab/vcti-python-tree), so the
caller picks the backing — `DictTree` for simple cases, `ArrayTree`
(from `vcti-nptree`) for file-structure-scale workloads, or any other
conforming implementation.

```
┌─────────────────────────────────────────────────────┐
│                  Application Code                   │
│        (owns LockableTree, uses Loader protocol)    │
└──────────────┬──────────────────────┬───────────────┘
               │                      │
       ┌───────▼───────┐      ┌───────▼───────┐
       │  HDF5 Loader  │      │  JSON Loader  │  ...
       │  (plugin pkg) │      │  (plugin pkg) │
       └───────────────┘      └───────────────┘
                  attach subtrees into the tree
```

## Key Concepts

### Loader (Protocol)

Any class that implements these four methods plus `validator` and
`setup` attributes satisfies the protocol — no base class inheritance
required (PEP 544 structural subtyping):

| Method | Purpose |
|--------|---------|
| `can_load(path)` | Lightweight check — can this loader handle the file? |
| `load(path, **options)` | Open a file and return an opaque handle |
| `populate(handle, tree, parent, *, before_lock=None, **options)` | Build the file's subtree under `parent`; lock and return its handle |
| `unload(handle)` | Release file handles and memory (**idempotent**) |

`populate` is generic in the tree handle type `H`, so the same loader
works against any `LockableTree[DataNode, H]` backing.
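
As a minimal sketch of what structural subtyping means here, the class below satisfies a protocol without inheriting from it. `LoaderLike` is a local stand-in that mirrors the four methods in the table; it is not the `Loader` protocol exported by `vcti.fileloader.core`, and it leaves out the `validator` and `setup` attributes. `TextLoader` and its dict-based handle are hypothetical.

```python
from pathlib import Path
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class LoaderLike(Protocol):
    """Stand-in mirroring the four methods in the table (not the real protocol)."""

    def can_load(self, path: Path) -> bool: ...
    def load(self, path: Path, **options: Any) -> Any: ...
    def populate(self, handle: Any, tree: Any, parent: Any, *,
                 before_lock: Any = None, **options: Any) -> Any: ...
    def unload(self, handle: Any) -> None: ...


class TextLoader:
    """Hypothetical loader: no base class, it only matches the method names."""

    def can_load(self, path: Path) -> bool:
        return path.suffix == ".txt"  # cheap check, no file I/O

    def load(self, path: Path, **options: Any) -> dict:
        return {"path": path, "open": True}

    def populate(self, handle, tree, parent, *, before_lock=None, **options):
        raise NotImplementedError("tree attachment is omitted from this sketch")

    def unload(self, handle: dict) -> None:
        handle["open"] = False  # idempotent: a second call changes nothing


loader = TextLoader()
# A runtime isinstance check only verifies that the methods exist;
# signatures are verified by a static type checker, not at runtime.
is_loader = isinstance(loader, LoaderLike)
```

Because the match is structural, a plugin package can stay free of any import-time dependency on a base class and still type-check against the protocol.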

Each loader also carries optional validator and setup hooks:

- `LoaderValidator.validate()` — returns `True` if all runtime
  dependencies (e.g., h5py) are available.
- `LoaderSetup.setup()` — configures paths, environment variables, or
  component versions before first use.
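
The two hooks might look like the sketch below. `ModuleValidator` and `EnvSetup` are hypothetical names, not classes shipped by this package; they only assume the `validate()` and `setup()` method names described above. The environment variable is a made-up example.

```python
import importlib.util
import os


class ModuleValidator:
    """Hypothetical validator: checks that top-level modules are importable."""

    def __init__(self, *modules: str) -> None:
        self.modules = modules

    def missing(self) -> list:
        # find_spec returns None when a top-level module cannot be found.
        return [m for m in self.modules if importlib.util.find_spec(m) is None]

    def validate(self) -> bool:
        return not self.missing()


class EnvSetup:
    """Hypothetical setup hook: applies environment defaults once."""

    def __init__(self, **defaults: str) -> None:
        self.defaults = defaults

    def setup(self) -> None:
        for key, value in self.defaults.items():
            os.environ.setdefault(key, value)  # never overrides the user's value


validator = ModuleValidator("json")  # a real loader would name e.g. "h5py"
ready = validator.validate()
EnvSetup(EXAMPLE_LOADER_SCRATCH="/tmp/example-loader").setup()
```

Using `find_spec` keeps the check cheap: it locates the module without importing it.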

### `before_lock` hook

`populate` accepts an optional callback `(tree, subtree_root) -> None`
that fires inside the transaction *after* the loader attaches the
file's content and *before* the locks are applied. Use it for
attribute enrichment, computing derived payload state, or validation.
Any exception raised by the hook triggers rollback of the partial
subtree, so failures are atomic.

The hook does not know about, and is not coupled to, any particular
enrichment library — it is just a callable. The package
[vcti-attribute-enricher](https://github.com/vcollab/vcti-python-attribute-enricher)
provides one (rule-driven) enricher you can wire in via this hook;
your own callbacks work equally well.
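
A hand-written callback could be as small as the following sketch. `ToyTree` and `ToyNode` are stand-ins defined here so the example runs on its own; the `walk` method in particular is an assumption and not part of the `LockableTree` protocol. Only the callback shape `(tree, subtree_root) -> None` and the `enricher_attributes` field come from this document.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Optional


@dataclass
class ToyNode:
    name: Optional[str] = None
    enricher_attributes: Dict[str, Any] = field(default_factory=dict)


class ToyTree:
    """Stand-in tree keyed by integer handles (not a vcti-tree backing)."""

    def __init__(self) -> None:
        self._nodes: Dict[int, ToyNode] = {}
        self._children: Dict[int, List[int]] = {}

    def add(self, handle: int, node: ToyNode, parent: Optional[int] = None) -> None:
        self._nodes[handle] = node
        self._children[handle] = []
        if parent is not None:
            self._children[parent].append(handle)

    def payload(self, handle: int) -> ToyNode:
        return self._nodes[handle]

    def walk(self, root: int) -> Iterator[int]:
        yield root
        for child in self._children[root]:
            yield from self.walk(child)


def tag_source(file_path: str) -> Callable[[ToyTree, int], None]:
    """Build a hook that stamps every node of the new subtree with its source."""

    def hook(tree: ToyTree, subtree_root: int) -> None:
        for handle in tree.walk(subtree_root):
            tree.payload(handle).enricher_attributes["file_path"] = file_path

    return hook


tree = ToyTree()
tree.add(0, ToyNode("file"))
tree.add(1, ToyNode("stress"), parent=0)
tree.add(2, ToyNode("unrelated"))  # a separate node, outside the subtree
tag_source("simulation.h5")(tree, 0)
```

Returning the hook from a factory lets it close over per-file context such as the path, which the two-argument callback signature has no room for.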

### SubtreeBuilder

A transactional helper for implementing `populate`. It owns a single
subtree under a caller-supplied parent and guarantees:

- **Scope enforcement.** Writes are rejected if their `parent` is not
  inside this builder's subtree — loaders cannot accidentally mutate
  the rest of the tree.
- **Pre-commit hook.** A `before_commit` callable (the implementation
  side of the loader's `before_lock`) runs after content is built and
  before locks fire.
- **Commit-on-success.** Normal exit from the `with` block locks the
  subtree (structure + payload).
- **Rollback-on-failure.** An exception during the build *or* during
  the pre-commit hook removes the partial subtree before propagating.
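
The four guarantees can be illustrated with a toy context manager. This is a sketch over a plain dict, not the package's `SubtreeBuilder`: the constructor arguments, the `add` method, and the dict layout are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, Optional, Set

Tree = Dict[str, Dict[str, Any]]


class ToySubtreeBuilder:
    """Stand-in showing scope checks, pre-commit hook, commit, and rollback."""

    def __init__(self, tree: Tree, parent: str, root: str,
                 before_commit: Optional[Callable[[Tree, str], None]] = None) -> None:
        self.tree, self.root, self.before_commit = tree, root, before_commit
        self.owned: Set[str] = {root}
        tree[root] = {"parent": parent, "locked": False}

    def add(self, name: str, parent: str) -> None:
        if parent not in self.owned:  # scope enforcement
            raise ValueError(f"{parent!r} is outside this builder's subtree")
        self.tree[name] = {"parent": parent, "locked": False}
        self.owned.add(name)

    def _rollback(self) -> None:
        for name in self.owned:
            self.tree.pop(name, None)

    def __enter__(self) -> "ToySubtreeBuilder":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            try:
                if self.before_commit is not None:
                    self.before_commit(self.tree, self.root)  # pre-commit hook
            except Exception:
                self._rollback()  # a hook failure also removes the subtree
                raise
            for name in self.owned:
                self.tree[name]["locked"] = True  # commit-on-success
        else:
            self._rollback()  # rollback-on-failure
        return False  # never swallow the original exception


tree: Tree = {"root": {"parent": None, "locked": False}}
with ToySubtreeBuilder(tree, parent="root", root="file") as builder:
    builder.add("group", parent="file")
```

After the `with` block exits normally, `file` and `group` are locked and `root` is untouched; had the body or the hook raised, both new nodes would have been removed before the exception propagated.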

### `DataNode` and `LazyDataNode`

Tree payloads. A `DataNode` carries four pieces of state plus a merged `attributes` view:

| Field / property | Description |
|---|---|
| `data` | Primary payload — NumPy array, parsed dict, `None`, anything. |
| `name` | File-internal identifier (HDF5 basename, NPZ archive key). `None` when not applicable. |
| `file_attributes` | Read-only `Mapping` view of the file's native attributes (loader-set, verbatim). |
| `enricher_attributes` | Mutable dict where post-load enrichers (or `before_lock` hooks) write. |
| `attributes` | `ChainMap` merged view, enricher first. Read here for portable rules; writes go to `enricher_attributes`. |

A `LazyDataNode` adds an on-demand loader callback plus pre-load
`shape` and `dtype` fields, so consumers can filter or display a
dataset without materialising it.
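
The layered attribute model in the table can be reproduced with two standard-library types. This sketch uses plain dicts in place of a `DataNode`, and the variable names are illustrative; it only shows how a read-only view and a `ChainMap` with the enricher layer first behave.

```python
from collections import ChainMap
from types import MappingProxyType

# What the loader read from the file, kept verbatim.
file_attrs = {"units": "MPa", "title": "raw"}
# What enrichers or before_lock hooks add afterwards.
enricher_attrs = {}

file_view = MappingProxyType(file_attrs)     # read-only Mapping view
merged = ChainMap(enricher_attrs, file_view)  # enricher layer is searched first

enricher_attrs["title"] = "Von Mises stress"  # shadows the file's value

write_rejected = False
try:
    file_view["units"] = "Pa"  # the proxy does not support item assignment
except TypeError:
    write_rejected = True

shadowed_title = merged["title"]   # found in the enricher layer
inherited_units = merged["units"]  # falls through to the file layer
original_title = file_view["title"]  # the file's own value is still intact
```

Rules that read through the merged view see enriched values when they exist and file values otherwise, while the file's native attributes stay unmodified.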

### LoaderDescriptor and LoaderRegistry

`LoaderDescriptor` wraps a `Loader` instance with metadata — a unique
`id`, a human-readable `name`, and filterable `attributes` (typically
`{"supported_formats": ["hdf5-file"]}` pointing at descriptor IDs from
`vcti-path-format-descriptors`).

`LoaderRegistry` is a typed registry of `LoaderDescriptor` entries.
Register loaders at startup, then look them up by id or query by
attributes at runtime.
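
Querying by attributes amounts to filtering descriptors on their metadata. The sketch below shows the idea with a stand-in dataclass and a helper function; `ToyDescriptor` and `loaders_for_format` are hypothetical and do not reflect the actual `LoaderRegistry` query API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class ToyDescriptor:
    """Stand-in for LoaderDescriptor, without the wrapped loader instance."""

    id: str
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)


def loaders_for_format(descriptors: Iterable[ToyDescriptor],
                       format_id: str) -> List[ToyDescriptor]:
    """Return descriptors whose 'supported_formats' list contains format_id."""
    return [
        d for d in descriptors
        if format_id in d.attributes.get("supported_formats", ())
    ]


catalog = [
    ToyDescriptor("hdf5-h5py-loader", "HDF5 Loader (h5py)",
                  {"supported_formats": ["hdf5-file"]}),
    ToyDescriptor("json-loader", "JSON Loader",
                  {"supported_formats": ["json-file"]}),
]
matches = loaders_for_format(catalog, "hdf5-file")
```

Keeping format support in descriptor metadata means an application can pick a loader without importing or instantiating any of the others.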

## Lifecycle Contracts

1. **Validate / setup** — call `validator.validate()` and
   `setup.setup()` once before the first `load()`.
2. **Check** — `can_load(path)` before `load()` to prevent
   `UnsupportedFormatError`.
3. **Load** — `loader.load(path)` opens the file, returns a handle.
4. **Populate** — `loader.populate(handle, tree, parent,
   before_lock=...)` grafts the file's subtree under `parent`,
   optionally runs the hook, then locks the subtree. Returns the
   subtree root handle.
5. **Unload** — `loader.unload(handle)` releases resources.
   Idempotent. If the loader attached `LazyDataNode`s, their closures
   may hold the handle — call `materialise_subtree(tree, root)` first
   if the tree must remain usable after unload.
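
Steps 2 to 5 can be wrapped in one helper, sketched below. `load_into_tree` is a hypothetical convenience function, not part of this package, and it raises a plain `ValueError` where real code would likely use `UnsupportedFormatError`. `RecordingLoader` is a fake that records call order so the example runs without any file.

```python
from pathlib import Path
from typing import Any, Callable, List, Optional


def load_into_tree(loader: Any, path: Path, tree: Any, parent: Any,
                   before_lock: Optional[Callable[[Any, Any], None]] = None) -> Any:
    """Check, load, populate, unload. Validate/setup are assumed already done."""
    if not loader.can_load(path):
        raise ValueError(f"loader cannot handle {path}")
    handle = loader.load(path)
    try:
        return loader.populate(handle, tree, parent, before_lock=before_lock)
    finally:
        loader.unload(handle)  # runs whether populate succeeded or raised


class RecordingLoader:
    """Hypothetical fake loader that records calls instead of reading files."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def can_load(self, path: Path) -> bool:
        self.calls.append("can_load")
        return path.suffix == ".h5"

    def load(self, path: Path, **options: Any) -> object:
        self.calls.append("load")
        return object()

    def populate(self, handle, tree, parent, *, before_lock=None, **options):
        self.calls.append("populate")
        return "subtree-root"

    def unload(self, handle: object) -> None:
        self.calls.append("unload")


fake = RecordingLoader()
root = load_into_tree(fake, Path("simulation.h5"), tree=None, parent=None)
```

Unloading in `finally` is only appropriate when the subtree holds no lazy nodes, or when they are materialised before the handle is released, as noted in step 5.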

## Installation

```bash
pip install "vcti-fileloader>=5.1.1"
```

```toml
dependencies = [
    "vcti-fileloader>=5.1.1",
]
```

## Quick Start

```python
from pathlib import Path

from vcti.tree import DictTree, descendants
from vcti.fileloader.core import DataNode, LoaderDescriptor, LoaderRegistry

# At startup. `my_h5py_loader` stands for a Loader instance from a plugin package.
registry = LoaderRegistry()
registry.register(LoaderDescriptor(
    id="hdf5-h5py-loader",
    name="HDF5 Loader (h5py)",
    loader=my_h5py_loader,
    attributes={"supported_formats": ["hdf5-file"]},
))

# At runtime
desc = registry.get("hdf5-h5py-loader")
desc.loader.validator.validate()
desc.loader.setup.setup()

# Application owns the tree
tree: DictTree[DataNode] = DictTree(DataNode())

handle = desc.loader.load(Path("simulation.h5"))
try:
    subtree_root = desc.loader.populate(handle, tree, tree.root_handle)
    # subtree is structure-locked and payload-locked
    for h in descendants(tree, subtree_root):
        node = tree.payload(h)
        if node.name == "stress":
            ...
finally:
    desc.loader.unload(handle)
```

## Quick Start — with a `before_lock` hook

```python
from pathlib import Path

from vcti.attribute_enricher import EnrichRule, apply_rules
from vcti.lookup import Rule
from vcti.tree import descendants

# Reuses `desc` and `tree` from the Quick Start above.
path = Path("simulation.h5")

def enrich(tree, root):
    # Runs inside the populate transaction, before the subtree is locked.
    apply_rules(
        descendants(tree, root, include_self=True),
        rules=[
            EnrichRule(set={"file_path": str(path)}),
            EnrichRule(set={"category": "mechanical"},
                       when=(Rule("name", "^=", "stress"),)),
        ],
    )

handle = desc.loader.load(path)
try:
    root = desc.loader.populate(handle, tree, tree.root_handle, before_lock=enrich)
finally:
    desc.loader.unload(handle)
```

`vcti-attribute-enricher` is an *optional* package — the framework
itself has no dependency on it. The `before_lock` argument accepts
any callable `(tree, root) -> None`; your own callback works just
as well.

## Error Handling

All exceptions inherit from `LoaderError`:

| Exception | When to raise / catch |
|-----------|----------------------|
| `LoaderError` | Base — catches any loader failure |
| `LoadError` | File cannot be opened or parsed |
| `UnloadError` | Resource cleanup failed |
| `UnsupportedFormatError` | Loader does not recognise the file format |
| `ValidationError` | `validator.validate()` detected missing dependencies |
| `SetupError` | `setup.setup()` could not configure the environment |
| `TreeAttachmentError` | `populate()` cannot attach: `parent` is missing, deleted, or structure-locked |

`TreeAttachmentError` translates the named tree exceptions from
`vcti-tree` (`HandleError`, `InactiveNodeError`, `StructureLockedError`)
into a single fileloader-domain failure type. The underlying tree
exception is preserved on `__cause__`.
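
The translation relies on Python's exception chaining. In the sketch below the three exception classes are local stand-ins that reuse the names from this section so the example runs on its own; `attach` is a hypothetical function that only simulates a locked parent.

```python
class LoaderError(Exception):
    """Stand-in for the fileloader base exception."""


class TreeAttachmentError(LoaderError):
    """Stand-in for the fileloader-domain attachment failure."""


class StructureLockedError(Exception):
    """Stand-in for the tree-level exception from vcti-tree."""


def attach(parent_locked: bool) -> str:
    """Hypothetical attach step that translates tree errors."""
    try:
        if parent_locked:
            raise StructureLockedError("parent is structure-locked")
        return "attached"
    except StructureLockedError as exc:
        # `raise ... from exc` stores the original exception on __cause__.
        raise TreeAttachmentError("cannot attach under parent") from exc


cause = None
try:
    attach(parent_locked=True)
except LoaderError as err:  # one except clause covers every loader failure
    cause = err.__cause__   # the underlying tree exception is still available
```

Callers can therefore catch a single fileloader exception type and still inspect the tree-level reason when they need it.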

## What this package does NOT do

- **No concrete loaders.** Actual file reading (HDF5, JSON, NPY, etc.)
  lives in separate loader plugin packages.
- **No tree implementation.** Backings come from `vcti-tree`
  (`DictTree`), `vcti-nptree` (`ArrayTree`), or third parties.
- **No attribute enrichment.** Enrichment is run via the optional
  `before_lock` hook by the caller, using whatever callable they like
  (e.g., `vcti-attribute-enricher`).
- **No data transformation.** Data is returned as-is from the loader.
- **No caching.** Caching strategies belong at the application level.

## Further Reading

- [Common Patterns](docs/patterns.md) — Loader implementation, the
  `SubtreeBuilder`, the `before_lock` hook, validator/setup patterns,
  error handling.
- [Design & Concepts](docs/design.md) — Architecture, protocol
  rationale, layered attribute model, locking model.

## Dependencies

- [numpy](https://numpy.org/) (>=1.24) — `DataNode.__eq__`
- [vcti-plugin-catalog](https://pypi.org/project/vcti-plugin-catalog/) (>=1.0.0) — Descriptor and Registry base classes
- [vcti-tree](https://pypi.org/project/vcti-tree/) (>=1.0.0) — `LockableTree` protocol, generic algorithms, named exceptions

## Versioning

This package follows [Semantic Versioning](https://semver.org/).
Breaking changes to the `Loader` protocol or `DataNode` shape will
only occur in major version bumps. Downstream loader plugins should
pin to a compatible major version (e.g., `vcti-fileloader>=5.0,<6`).
