Metadata-Version: 2.4
Name: vcti-data-scope
Version: 1.0.0
Summary: Data-scope abstraction for grouping related data sources under one managed lifecycle, with format-aware loader resolution.
Author: Visual Collaboration Technologies Inc.
Requires-Python: <3.15,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: vcti-fileloader>=1.0.0
Requires-Dist: vcti-lookup>=1.0.0
Requires-Dist: vcti-logging>=1.0.0
Requires-Dist: vcti-error>=1.0.0
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: lint
Requires-Dist: ruff; extra == "lint"
Provides-Extra: typecheck
Requires-Dist: mypy; extra == "typecheck"
Dynamic: license-file

# Data Scope

Data-scope abstraction for grouping related data sources under one
managed lifecycle, with format-aware loader resolution.

## Overview

`vcti-data-scope` provides a small framework for tying related data sources
together — files, folders, and (in future) other source kinds — under one
named, lifecycle-managed object. Each source is added with an explicit
format identifier; the scope resolves a loader for it via a
`LoaderRegistry` (from `vcti-fileloader`), and the user opens and closes the
whole collection together.

The framework is **pluggable**: a `DataScope` is the base type, with one
concrete subclass shipping today (`PathsGroup` for file and folder sources)
and additional types possible later (positional file arrays, parameter
sweeps, streaming sources, etc.). v1 is **read-only after open** and uses a
**strict load** policy for required sources; optional sources can fail
without aborting the scope.

## Installation

```bash
pip install "vcti-data-scope>=1.0.0"
```

### In `requirements.txt`

```
vcti-data-scope>=1.0.0
```

### In `pyproject.toml` dependencies

```toml
dependencies = [
    "vcti-data-scope>=1.0.0",
]
```

---

## Preparing a registry

`PathsGroup` resolves loaders from a `LoaderRegistry` (from
`vcti-fileloader`). The registry must be populated with descriptors before
the scope is used. Each descriptor's `attributes["supported_formats"]`
list is what the scope matches against the `format_id` you pass when
adding a source.

```python
from vcti.fileloader import LoaderRegistry, LoaderDescriptor

registry = LoaderRegistry()
registry.register(
    LoaderDescriptor(
        id="hdf5-h5py",
        name="HDF5 (h5py)",
        loader=H5pyLoader(),  # your loader implementing the Loader protocol
        attributes={"supported_formats": ["hdf5-file"]},
    )
)
# ...register one descriptor per loader you need
```

Most callers wrap this setup in a helper module so application code
receives a ready-to-use registry.

---

## Quick Start (context manager)

When scope usage is confined to a single block, the context manager is
the most concise form. The scope closes automatically on exit, even if
an exception is raised inside the block.

```python
from pathlib import Path

from vcti.datascope import PathsGroup

with PathsGroup("brake-squeal", registry=registry) as scope:
    scope.add_path_source(
        name="solver_input",
        path=Path("model.inp"),
        format_id="abaqus-inp",
    )
    scope.add_path_source(
        name="solver_output",
        path=Path("sol103.h5"),
        format_id="hdf5-file",
    )
    scope.add_path_source(
        name="solver_log",
        path=Path("run.log"),
        format_id="text-log",
        required=False,
    )

    scope.load()
    assert scope.is_valid

    # Reach into per-source loaders for typed access:
    h5_loader = scope.sources["solver_output"].loader
    # ... use h5_loader's typed API ...
```

---

## Usage without the context manager

When the scope's lifetime spans function boundaries — for example, a
scope owned by a long-lived service, an interactive session, or a class
attribute — open and close it explicitly. The contract is the same as
the context-manager form; only the syntax differs.

### Plain open / close

```python
from pathlib import Path

from vcti.datascope import PathsGroup

scope = PathsGroup("brake-squeal", registry=registry)
scope.add_path_source("solver_input", Path("model.inp"), format_id="abaqus-inp")
scope.add_path_source("solver_output", Path("sol103.h5"), format_id="hdf5-file")

if not scope.is_valid:
    raise RuntimeError("scope not loadable — some required source is unavailable")
scope.load()
try:
    # ... use scope.sources["..."].loader ...
    ...
finally:
    scope.close()
```

`scope.close()` is **idempotent** and **best-effort**: it walks every
source, closes the ones that are loaded, and logs (rather than raises) on
per-source close failures. It is always safe to call — including before
`load()` and after a failed `load()`.
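
The close contract can be pictured with a small stand-alone model. This is a sketch under stated assumptions, not the library's implementation: `ToySource` and `close_all` are hypothetical names, and marking a source as closed after a failed close is an assumption made here so that a repeat call becomes a no-op.

```python
import logging

log = logging.getLogger("toy_scope")


class ToySource:
    """Hypothetical stand-in for a scope source (not the library's class)."""

    def __init__(self, name, fail_on_close=False):
        self.name = name
        self.is_loaded = False
        self._fail_on_close = fail_on_close

    def close(self):
        if self._fail_on_close:
            raise OSError(f"cannot close {self.name}")
        self.is_loaded = False


def close_all(sources):
    """Close every loaded source; log failures instead of raising."""
    failed = []
    for src in sources:
        if not src.is_loaded:
            continue  # never loaded, or already closed: nothing to do
        try:
            src.close()
        except Exception:
            log.exception("failed to close source %r", src.name)
            src.is_loaded = False  # assumption: treat as closed regardless
            failed.append(src.name)
    return failed
```

Because unloaded sources are skipped, calling `close_all` a second time, or before anything was loaded, does nothing and raises nothing.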

### As an attribute of a long-lived object

```python
from vcti.datascope import PathsGroup


class AnalysisSession:
    def __init__(self, registry):
        self._scope = PathsGroup("session", registry=registry)

    def open(self, model_path, output_path):
        self._scope.add_path_source("input",  model_path, format_id="abaqus-inp")
        self._scope.add_path_source("output", output_path, format_id="hdf5-file")
        self._scope.load()

    def close(self):
        self._scope.close()

    @property
    def output_loader(self):
        return self._scope.sources["output"].loader
```

### Reopening after close

After `close()`, the scope may be reopened. Optional sources that
failed previously have their `last_error` cleared and are retried on the
next `load()`. Sources cannot be added or removed while the scope is
open (`DataScopeStateError`); add or remove before calling `load()` again.

```python
scope.load()
# ... use ...
scope.close()

# ... later ...
scope.load()   # re-opens; failed optionals get another chance
```

---

## Working with optional sources

Sources added with `required=False` do not abort `load()` on failure;
their failure is recorded and the scope continues:

```python
import logging

log = logging.getLogger(__name__)

scope.load()

if not scope.is_valid:
    raise RuntimeError("scope is not in a usable state")

for src in scope.failed_optional_sources.values():
    log.warning("optional source %r unavailable: %s", src.name, src.last_error)
```

`is_valid` is a **pre-flight** check, accessed as an attribute
(`scope.is_valid`, no parentheses), that answers "could this scope be
loaded right now?" Specifically:

- An empty scope (no sources) is **invalid**.
- An unloaded scope is valid when every required source's own `is_valid`
  is `True`. The check is re-run on every access (no caching), so a moved
  or deleted file is detected immediately.
- A loaded scope short-circuits to `True` without re-checking: `load()`
  would have raised on any required failure, so `is_loaded` being `True`
  already proves validity.

`is_loaded` answers a different question — "has `load()` actually
completed?" Use `is_valid` before opening to confirm readiness; use
`is_loaded` after opening to confirm the lifecycle finished.
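
The `is_valid` rules above can be sketched as a pair of properties. This is an illustrative model only: `ToyPathSource` and `ToyScope` are hypothetical classes, and "the path exists" is an assumed stand-in for whatever per-source check the library actually performs.

```python
from pathlib import Path


class ToyPathSource:
    """Hypothetical path-backed source."""

    def __init__(self, path, required=True):
        self.path = Path(path)
        self.required = required

    @property
    def is_valid(self):
        # Assumption: existence on disk stands in for the real check.
        return self.path.exists()


class ToyScope:
    """Hypothetical scope holding name -> source."""

    def __init__(self):
        self.sources = {}
        self.is_loaded = False

    @property
    def is_valid(self):
        if self.is_loaded:
            return True  # load() succeeded, so required sources were fine
        if not self.sources:
            return False  # an empty scope is invalid
        # Re-evaluated on every access; optional sources are ignored.
        return all(s.is_valid for s in self.sources.values() if s.required)
```

Since nothing is cached, removing a required file between two accesses flips the result from `True` to `False` while the scope is unloaded.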

---

## Disambiguating between loaders that share a format

When several registered loaders declare the same `format_id` (e.g. two
HDF5 readers for different solvers), pass `extra_rules` to narrow the
selection. Rules are `vcti.lookup.Rule` instances applied alongside the
implicit `supported_formats contains <format_id>` rule:

```python
from pathlib import Path

from vcti.lookup import Rule

scope.add_path_source(
    name="solver_output",
    path=Path("sol103.h5"),
    format_id="hdf5-file",
    extra_rules=[Rule("solver", "==", "nastran")],
)
```

If no descriptor matches, `add_path_source` raises `ValueError` at the
point of registration — not later at `load()` time.
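
The selection logic can be illustrated without the real registry. In this sketch, `ToyDescriptor` and `resolve` are hypothetical. Extra rules are reduced to plain equality checks on descriptor attributes, whereas the library uses `vcti.lookup.Rule` objects. Returning the first match is an assumption of the sketch, not documented behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDescriptor:
    """Hypothetical descriptor: an id plus free-form attributes."""

    id: str
    attributes: dict = field(default_factory=dict)


def resolve(descriptors, format_id, extra=None):
    """Pick a descriptor that supports format_id and matches every extra rule."""
    extra = extra or {}
    for desc in descriptors:
        # Implicit rule: supported_formats must contain the format_id.
        if format_id not in desc.attributes.get("supported_formats", []):
            continue
        # Extra rules, modelled here as attribute == value.
        if all(desc.attributes.get(k) == v for k, v in extra.items()):
            return desc
    # Fail at registration time rather than at load time.
    raise ValueError(f"no loader matches format {format_id!r} with {extra}")
```

With two descriptors that both list `"hdf5-file"`, passing `{"solver": "nastran"}` selects the one whose `solver` attribute is `"nastran"`; an unknown format raises `ValueError` immediately.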

---

See [docs/design.md](docs/design.md) for the conceptual model and
[docs/api.md](docs/api.md) for the API reference.

---

## Dependencies

- [vcti-fileloader](https://pypi.org/project/vcti-fileloader/) (>=1.0.0) — Loader, LoaderRegistry, LoaderDescriptor
- [vcti-lookup](https://pypi.org/project/vcti-lookup/) (>=1.0.0) — Rule (format-based loader filtering)
- [vcti-logging](https://pypi.org/project/vcti-logging/) (>=1.0.0) — logger
- [vcti-error](https://pypi.org/project/vcti-error/) (>=1.0.0) — error codes
