Metadata-Version: 2.4
Name: vcti-attribute-enricher
Version: 1.0.0
Summary: Rule-based attribute enrichment for any sequence of items
Author-email: "Visual Collaboration Technologies Inc." <info@vcollab.com>
License: Copyright (c) 2018-2026 Visual Collaboration Technologies Inc.
        All Rights Reserved.
        
        This software is proprietary and confidential. Unauthorized copying,
        distribution, or use of this software, via any medium, is strictly
        prohibited. Access is granted only to authorized VCollab developers
        and individuals explicitly authorized by Visual Collaboration
        Technologies Inc.
        
Project-URL: Homepage, https://github.com/vcollab/vcti-python-attribute-enricher
Project-URL: Repository, https://github.com/vcollab/vcti-python-attribute-enricher
Project-URL: Changelog, https://github.com/vcollab/vcti-python-attribute-enricher/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/vcollab/vcti-python-attribute-enricher/issues
Keywords: attributes,enrichment,rules,vcti,metadata
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: <3.15,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: vcti-lookup>=1.0.0
Requires-Dist: vcti-predicate>=1.0.0
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: lint
Requires-Dist: ruff; extra == "lint"
Provides-Extra: typecheck
Requires-Dist: mypy; extra == "typecheck"
Dynamic: license-file

# Attribute Enricher

Rule-based attribute enrichment for any sequence of items.

## Overview

`vcti-attribute-enricher` adds attributes to items in a sequence by
matching `vcti-lookup` rules. Each `EnrichRule` pairs a condition
(`when`) with attributes to write (`set`); the enricher walks an
iterable, evaluates each rule against each item, and stamps matching
items with the rule's attributes.

The package works on any iterable, not only trees: the items can be
tree-node payloads (e.g., from `vcti-fileloader`), dicts, dataclasses,
or any other object whose attributes can be read and written. Reading
and writing are pluggable via `getter` / `setter` callables; the
defaults target `vcti-fileloader`'s `DataNode`-shaped payloads.

The package is the write-side counterpart to `vcti-lookup`: where
Lookup filters a sequence by rules, the enricher mutates the items of
a sequence by rules.

## Installation

```bash
pip install vcti-attribute-enricher
```

### In `requirements.txt`

```
vcti-attribute-enricher>=1.0.0
```

### In `pyproject.toml` dependencies

```toml
dependencies = [
    "vcti-attribute-enricher>=1.0.0",
]
```

---

## Quick Start

```python
from vcti.attribute_enricher import EnrichRule, apply_rules
from vcti.lookup import MISSING, Rule

items = [
    {"name": "stress.h5",     "dtype": "float64", "attributes": {}, "enricher_attributes": {}},
    {"name": "ids.h5",        "dtype": "int64",   "attributes": {}, "enricher_attributes": {}},
    {"name": "config.json",   "dtype": "object",  "attributes": {}, "enricher_attributes": {}},
]

def getter(item, key):
    # Read from the item's "attributes" dict, then the top-level item.
    # Return MISSING (not None) when absent so the rule cleanly no-matches.
    if key in item["attributes"]:
        return item["attributes"][key]
    return item.get(key, MISSING)

apply_rules(
    items,
    rules=[
        EnrichRule(set={"loaded_at": "2026-06-06"}),                            # every item
        EnrichRule(set={"category": "mechanical"},
                   when=(Rule("name", "^=", "stress"),)),                        # name starts with "stress"
        EnrichRule(set={"is_numeric": True},
                   when=(Rule("dtype", "^=", "float"),
                         Rule("dtype", "!=", "object"))),                        # AND across rules
    ],
    getter=getter,
    setter=lambda item, key, value: item["enricher_attributes"].__setitem__(key, value),
)
```

For payloads with a single mutable attribute dict (the common case):

```python
from vcti.attribute_enricher import EnrichRule, apply_rules
from vcti.lookup import MISSING

items = [{"name": "x", "tags": {}}, {"name": "y", "tags": {}}]

apply_rules(
    items,
    rules=[EnrichRule(set={"seen": True})],
    getter=lambda item, key: item.get(key, MISSING),
    setter=lambda item, key, value: item["tags"].__setitem__(key, value),
)
```

> Getters must return `vcti.lookup.MISSING` for absent attributes, not
> `None`. `None` is a legal value that gets passed to the operator;
> `MISSING` short-circuits the rule to "no match".
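
The distinction between `None` and `MISSING` can be illustrated without the library. The sketch below is a plain-Python model, not the `vcti-lookup` implementation: the `_Missing` class, `equals_none`, and `matches` are hypothetical names introduced only for this example.

```python
# Hypothetical stand-in for vcti.lookup.MISSING; the real sentinel may differ.
class _Missing:
    def __repr__(self) -> str:
        return "MISSING"

MISSING = _Missing()

def equals_none(value: object) -> bool:
    """Stand-in operator: matches when the stored value is None."""
    return value is None

def matches(item: dict, key: str, operator) -> bool:
    """Evaluate one condition; an absent key never matches."""
    value = item.get(key, MISSING)
    if value is MISSING:
        return False          # short-circuit: the operator is never called
    return operator(value)    # None is passed through like any other value

explicit_none = {"unit": None}   # attribute present, value is None
absent = {}                      # attribute not present at all

print(matches(explicit_none, "unit", equals_none))  # True
print(matches(absent, "unit", equals_none))         # False
```

If the getter returned `None` for the absent key, both calls would report a match, which is the confusion the sentinel avoids.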

## Enriching tree-node payloads

The package's default getter/setter are tuned for `vcti-fileloader`'s
`DataNode`/`LazyDataNode` payloads, which carry both a read-only
`file_attributes` mapping (file-native) and a mutable
`enricher_attributes` dict. The merged read view is exposed via
`.attributes` (a `ChainMap`).

```python
from vcti.attribute_enricher import EnrichRule, apply_rules
from vcti.tree import descendants
from vcti.lookup import Rule

# `tree`, `subtree_root`, and `path` are assumed to be defined by the caller.
apply_rules(
    descendants(tree, subtree_root, include_self=True),
    rules=[
        EnrichRule(set={"file_path": str(path)}),
        EnrichRule(set={"category": "mechanical"},
                   when=(Rule("name", "^=", "stress"),)),
    ],
)
```

No `getter` / `setter` arguments are needed: the defaults read
`item.attributes` (a `ChainMap`) and write to `item.enricher_attributes`.
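
To make the payload shape concrete, here is a minimal sketch of an object with the three attributes described above. `Payload` is a hypothetical class written for this example and is not the `vcti-fileloader` `DataNode`; in particular, the lookup order inside the `ChainMap` is an assumption.

```python
from collections import ChainMap
from types import MappingProxyType

class Payload:
    """Hypothetical DataNode-like payload; not the vcti-fileloader class."""

    def __init__(self, name: str, file_attributes: dict) -> None:
        self.name = name
        # Read-only view of the file-native attributes.
        self.file_attributes = MappingProxyType(dict(file_attributes))
        # Mutable layer that enrichment writes into.
        self.enricher_attributes: dict = {}

    @property
    def attributes(self) -> ChainMap:
        # Assumed lookup order: enricher layer first, then file layer.
        return ChainMap(self.enricher_attributes, self.file_attributes)

node = Payload("stress.h5", {"dtype": "float64"})
node.enricher_attributes["category"] = "mechanical"   # what a setter would do

print(node.attributes["dtype"])      # float64 (from the file layer)
print(node.attributes["category"])   # mechanical (from the enricher layer)
print(dict(node.file_attributes))    # {'dtype': 'float64'} (unchanged)
```

Writes land only in the mutable layer, so the file-native attributes stay untouched while rules still see both layers through the merged view.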

---

## How it works

1. **Iterate** the supplied items.
2. For each item, evaluate every `EnrichRule`:
   - If `when` is empty, the rule matches.
   - If `when` has one or more `Rule`s, every rule must match (AND).
   - A rule matches by reading the relevant attribute through `getter`
     and evaluating it via `vcti.predicate.evaluate` (forwarded through
     `vcti-lookup`).
3. On a match, write each key/value in `set` to the item via `setter`.
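
The three steps above can be sketched in plain Python. This is a simplified model under stated assumptions, not the package's implementation: `SketchRule` and `apply_sketch` are hypothetical names, conditions are ordinary callables instead of `vcti-lookup` `Rule` objects, and items are plain dicts that are read and written directly.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable

Predicate = Callable[[dict], bool]   # stand-in for a vcti-lookup Rule

@dataclass(frozen=True)
class SketchRule:
    """Stand-in for EnrichRule: attributes to write plus AND-ed conditions."""
    set: dict[str, Any]
    when: tuple[Predicate, ...] = ()

def apply_sketch(items: Iterable[dict], rules: list[SketchRule]) -> None:
    for item in items:                                  # step 1: iterate
        for rule in rules:                              # step 2: evaluate
            # all() over an empty tuple is True, so an empty `when` matches.
            if all(cond(item) for cond in rule.when):
                for key, value in rule.set.items():     # step 3: write
                    item[key] = value

items = [{"name": "stress.h5"}, {"name": "ids.h5"}]
apply_sketch(items, [
    SketchRule(set={"seen": True}),
    SketchRule(set={"category": "mechanical"},
               when=(lambda it: it["name"].startswith("stress"),)),
])
print(items)
```

After the call, both items carry `seen`, and only the item whose name starts with `stress` carries `category`.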

**Layering — last write wins.** Multiple `EnrichRule`s that match the
same item all apply, in the order given. Later rules overwrite earlier
ones on collision. This makes it easy to express "set a default for
everything, then refine for specific subsets."
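
A small illustration of the default-then-refine pattern, written with plain dicts so it runs without the package. The `layers` list and the `enrich` helper are hypothetical and only model the ordering behavior described here.

```python
# Each entry is (attributes to write, condition); a plain-Python stand-in.
layers = [
    ({"category": "generic", "priority": 0}, lambda name: True),
    ({"category": "mechanical"}, lambda name: name.startswith("stress")),
]

def enrich(name: str) -> dict:
    attrs: dict = {}
    for to_set, condition in layers:     # rules apply in the order given
        if condition(name):
            attrs.update(to_set)         # later writes overwrite earlier ones
    return attrs

print(enrich("stress.h5"))  # {'category': 'mechanical', 'priority': 0}
print(enrich("ids.h5"))     # {'category': 'generic', 'priority': 0}
```

The refinement replaces only the colliding key; `priority` from the default layer survives on both items.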

**Return value.** `apply_rules` returns an `EnrichResult` with summary
counts — handy for logging and for spotting dead rules:

```python
result = apply_rules(items, rules)
print(result.items_visited, result.items_matched, result.writes_applied)
print(result.per_rule_matches)   # match count per rule, in order; 0 = dead rule
```

The items are mutated in place; the result carries metrics only.
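
One way to act on these counts is to list the rules that never matched. The sketch below uses a hypothetical `ResultSketch` dataclass that mirrors the documented field names; the tuple type of `per_rule_matches` and the `dead_rule_indexes` helper are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResultSketch:
    """Stand-in mirroring the documented EnrichResult fields."""
    items_visited: int
    items_matched: int
    writes_applied: int
    per_rule_matches: tuple[int, ...]   # assumed to be a tuple of counts

def dead_rule_indexes(result: ResultSketch) -> list[int]:
    """Return positions of rules that matched no item."""
    return [i for i, count in enumerate(result.per_rule_matches) if count == 0]

result = ResultSketch(items_visited=3, items_matched=3,
                      writes_applied=4, per_rule_matches=(3, 1, 0))
print(dead_rule_indexes(result))  # [2]
```

A non-empty list is a hint that a condition is misspelled or that the input no longer contains the items the rule was written for.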

---

## API

| Symbol | Description |
|---|---|
| `EnrichRule(set, when=())` | Frozen dataclass. `set` is a dict of attributes to write; `when` is a `tuple[Rule, ...]` combined with AND logic. An empty `when` matches every item. Instances are not hashable because `set` is a dict. |
| `apply_rules(items, rules, *, getter=None, setter=None)` | Walk `items`, evaluate `rules`, stamp matches. Returns an `EnrichResult`. `getter` defaults to `vcti.lookup.attributes_getter`; `setter` defaults to one that writes to `item.enricher_attributes`. Mutates items in place (not thread-safe). Raises `ValueError` with rule context if evaluation fails; the call is fail-fast and non-transactional, so writes applied before the failure are not rolled back. |
| `EnrichResult(items_visited, items_matched, writes_applied, per_rule_matches)` | Frozen dataclass of summary counts returned by `apply_rules`. `per_rule_matches` is per-rule, in order (a `0` flags a dead rule). |
| `Getter` / `Setter` | Type aliases for the `(item, key) -> value` and `(item, key, value) -> None` callables. |

Values in `set` are written by reference: a mutable value (such as a
list or dict) is the same object on every matched item, so changing it
through one item changes it for all. See [docs/patterns.md](docs/patterns.md)
for pitfalls.
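
The aliasing effect is easy to reproduce with plain dicts. The sketch below does not call the package; `copying_setter` is a hypothetical setter that follows the documented `(item, key, value) -> None` shape, shown as one possible workaround and not necessarily the one recommended in `docs/patterns.md`.

```python
import copy

shared = {"tags": []}                       # one list object in the `set` dict
items = [{"name": "a"}, {"name": "b"}]

for item in items:
    item["tags"] = shared["tags"]           # written by reference
items[0]["tags"].append("reviewed")
print(items[1]["tags"])                     # ['reviewed'] (same list object)

# One possible workaround: copy the value per item inside a custom setter.
def copying_setter(item: dict, key: str, value: object) -> None:
    item[key] = copy.deepcopy(value)

template: list = []
fresh = [{"name": "a"}, {"name": "b"}]
for item in fresh:
    copying_setter(item, "tags", template)
fresh[0]["tags"].append("reviewed")
print(fresh[1]["tags"])                     # [] (independent copies)
```

Copying on every write costs time and memory, so it is worth reserving for values that are actually mutated after enrichment.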

---

## Dependencies

- [vcti-lookup](https://pypi.org/project/vcti-lookup/) (>=1.0.0) — `Rule`, `MISSING`, default `attributes_getter`
- [vcti-predicate](https://pypi.org/project/vcti-predicate/) (>=1.0.0) — `evaluate()` is called directly for rule matching
