Metadata-Version: 2.4
Name: vcti-path-format-descriptors
Version: 1.2.0
Summary: Built-in file format descriptors (HDF5, CAX, JSON, NPY, NPZ, CSV) for the vcti-path-format identification framework
Author: Visual Collaboration Technologies Inc.
Requires-Python: <3.15,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: vcti-path-format>=1.0.0
Requires-Dist: vcti-path-format-attributes>=1.1.0
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: lint
Requires-Dist: ruff; extra == "lint"
Dynamic: license-file

# Path Format Descriptors

Built-in file format descriptors for the vcti-path-format identification framework.

## Overview

`vcti-path-format-descriptors` ships ready-made `FormatDescriptor`
instances for the file formats VCTI tooling needs to recognize: HDF5,
VCollab CAX, JSON, NumPy NPY/NPZ, and CSV. Each one is built by a
self-contained factory function that wires the appropriate magic-byte
and/or extension validators onto a `HeuristicEvaluator`, tags the
result with attributes from the shared vocabulary
(`vcti-path-format-attributes`), and returns it for registration with a
`FormatRegistry`. The package is the plugin layer between the
format-agnostic framework and the shared attribute vocabulary:
applications register the descriptors they need (or all of them at
once) and let `FormatIdentifier` do the identification.

## Installation

```bash
# Quote the requirement so the shell does not treat ">" as a redirect
pip install "vcti-path-format-descriptors>=1.2.0"
```

### In `pyproject.toml` dependencies

```toml
dependencies = [
    "vcti-path-format-descriptors>=1.2.0",
]
```

---

## Quick Start

```python
from pathlib import Path

from vcti.pathformat import FormatRegistry, FormatIdentifier
from vcti.pathformat.descriptors import register_all_formats

# Register all built-in format descriptors
registry = FormatRegistry()
register_all_formats(registry)

# Identify a file
identifier = FormatIdentifier(registry)
results = identifier.identify_file_format(Path("data.h5"))
```

### Individual descriptors

```python
from vcti.pathformat import FormatRegistry
from vcti.pathformat.descriptors import (
    get_cax_file_descriptor,
    get_csv_file_descriptor,
    get_hdf5_file_descriptor,
    get_json_file_descriptor,
    get_npy_file_descriptor,
    get_npz_file_descriptor,
)

registry = FormatRegistry()
registry.register(get_hdf5_file_descriptor())
registry.register(get_cax_file_descriptor())
registry.register(get_json_file_descriptor())
registry.register(get_npy_file_descriptor())
registry.register(get_npz_file_descriptor())
registry.register(get_csv_file_descriptor())
```

---

## Built-in Formats

### HDF5

| Property | Value |
|----------|-------|
| ID | `hdf5-file` |
| Signature | `\x89HDF\r\n\x1a\n` (8 bytes) |
| Extensions | `.h5`, `.hdf5` |
| Validators | Magic bytes (GATE) + Extension (EVIDENCE) |
| Attributes | path_type=file, structure=hdf5 |
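
To illustrate what a magic-byte gate checks, the following standard-library sketch compares the first eight bytes of a file against the HDF5 signature listed above. It is not the package's implementation: `has_signature` and the temporary file names are hypothetical and exist only for this example.

```python
import tempfile
from pathlib import Path

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8 bytes, as listed in the table above


def has_signature(path: Path, signature: bytes) -> bool:
    """Return True if the file starts with the given byte signature."""
    with path.open("rb") as fh:
        return fh.read(len(signature)) == signature


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in file that carries only the signature; it is not a real HDF5 file.
    fake_hdf5 = Path(tmp) / "data.h5"
    fake_hdf5.write_bytes(HDF5_SIGNATURE + b"\x00" * 8)

    # Matching extension, but the content does not start with the signature.
    text_file = Path(tmp) / "notes.h5"
    text_file.write_text("not hdf5")

    signature_matches = has_signature(fake_hdf5, HDF5_SIGNATURE)
    text_matches = has_signature(text_file, HDF5_SIGNATURE)

print(signature_matches, text_matches)
```

The second file shows why the extension alone is treated only as supporting evidence for binary formats: a `.h5` name says nothing about the bytes inside.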

### VCollab CAX

| Property | Value |
|----------|-------|
| ID | `vcti-cax` |
| Signature | `\x89VCF\r\n\x1a\n` (8 bytes) |
| Validators | Magic bytes (GATE) |
| Attributes | path_type=file, structure=binary, generator=VCollab |

### JSON

| Property | Value |
|----------|-------|
| ID | `json-file` |
| Signature | none (text format) |
| Extensions | `.json` |
| Validators | Extension (EVIDENCE) |
| Attributes | path_type=file, structure=json |
| Best confidence | LIKELY (no GATE) |
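
The missing signature is a property of JSON itself rather than a gap in the descriptor. The sketch below, using only the standard library, shows that valid JSON documents can start with many different bytes, which leaves the file extension as the available signal. `has_json_extension` is a hypothetical helper, and its case-insensitive comparison is an assumption, not documented behavior of the package.

```python
import json
from pathlib import PurePath

# Valid JSON documents can begin with many different bytes,
# so there is no fixed prefix a magic-byte validator could gate on.
documents = ['{"a": 1}', "[1, 2]", '"text"', "42", "true", "null"]

for doc in documents:
    json.loads(doc)  # raises ValueError if a document were not valid JSON

first_bytes = {doc.encode("utf-8")[:1] for doc in documents}


def has_json_extension(path: PurePath) -> bool:
    """Hypothetical extension check; lower-casing the suffix is an assumption."""
    return path.suffix.lower() == ".json"


print(sorted(first_bytes))
print(has_json_extension(PurePath("config.json")))
```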

### NumPy NPY

| Property | Value |
|----------|-------|
| ID | `npy-file` |
| Signature | `\x93NUMPY` (6 bytes) |
| Extensions | `.npy` |
| Validators | Magic bytes (GATE) + Extension (EVIDENCE) |
| Attributes | path_type=file, structure=binary |

### NumPy NPZ

| Property | Value |
|----------|-------|
| ID | `npz-file` |
| Signature | `PK\x03\x04` (ZIP local file header, 4 bytes) |
| Extensions | `.npz` |
| Validators | Magic bytes (GATE) + Extension (EVIDENCE) |
| Attributes | path_type=file, structure=binary |

Note: the magic bytes are the standard ZIP local file header. The
`.npz` extension is what distinguishes NumPy archives from other
ZIP-family formats; any future ZIP-family descriptors must coordinate
on the extension.
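
The overlap can be reproduced with the standard library alone. This sketch writes the same one-member ZIP archive under a `.zip` name and a `.npz` name and reads back the leading bytes. The file names and the `leading_bytes` helper are hypothetical, and the `.npz` file here is only a ZIP container with that extension, not an archive produced by NumPy.

```python
import tempfile
import zipfile
from pathlib import Path

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # 4 bytes, as listed in the table above


def leading_bytes(path: Path, count: int = 4) -> bytes:
    """Read the first `count` bytes of a file."""
    with path.open("rb") as fh:
        return fh.read(count)


with tempfile.TemporaryDirectory() as tmp:
    plain_zip = Path(tmp) / "bundle.zip"
    npz_like = Path(tmp) / "arrays.npz"

    # Write an identical archive with one member under both names.
    for target in (plain_zip, npz_like):
        with zipfile.ZipFile(target, "w") as archive:
            archive.writestr("member.txt", "payload")

    zip_magic = leading_bytes(plain_zip)
    npz_magic = leading_bytes(npz_like)
    suffixes = (plain_zip.suffix, npz_like.suffix)

print(zip_magic == npz_magic == ZIP_LOCAL_HEADER, suffixes)
```

Both files pass the same magic-byte check, so only the suffix separates them.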

### CSV

| Property | Value |
|----------|-------|
| ID | `csv-file` |
| Signature | none (text format) |
| Extensions | `.csv` |
| Validators | Extension (EVIDENCE) |
| Attributes | path_type=file, structure=csv |
| Best confidence | LIKELY (no GATE) |

---

## Dependencies

- [vcti-path-format](https://pypi.org/project/vcti-path-format/) (>=1.0.0) — format identification framework
- [vcti-path-format-attributes](https://pypi.org/project/vcti-path-format-attributes/) (>=1.1.0) — domain vocabulary enums
