Metadata-Version: 2.4
Name: vcti-lookup
Version: 1.0.1
Summary: Attribute-based item lookup and filtering for Python collections
Author: Visual Collaboration Technologies Inc.
Requires-Python: <3.15,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: vcti-predicate>=1.0.7
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: lint
Requires-Dist: ruff; extra == "lint"
Provides-Extra: numpy
Requires-Dist: numpy>=1.24; extra == "numpy"
Dynamic: license-file

# Lookup

## Purpose

`vcti-lookup` provides `Lookup`, an attribute-based filtering engine
for Python sequences.  It works at two levels:

**Level 1 — Filtering.**  You have a sequence of items, each with an
ID and attributes.  Lookup lets you filter by rules and get matching
items back.  The items can be dicts, dataclasses, Pydantic models,
numpy structured-array rows, or any object — pluggable getters
control how attributes are extracted.

**Level 2 — Data mapping.**  You have external data (a list of file
paths, a numpy array, a tree of nodes) and a separate set of
attributes describing each item.  You build an attribute set where
`id = index`, filter with Lookup, and use the matching IDs to index
back into your data.  Lookup doesn't wrap or copy your data — you
keep full control.

```
Level 1 — Filter directly          Level 2 — Map to external data

┌──────────────┐                   ┌───────────────┐
│ items        │                   │ your data     │
│ (id + attrs) │                   │ (list, array) │
└──────┬───────┘                   └───────▲───────┘
       │                                   │ indices
       ▼                                   │
┌──────────────┐  Rules    ┌──────────────┐  Rules
│    Lookup    │◄──────    │    Lookup    │◄──────
└──────┬───────┘           └──────┬───────┘
       │                          │
       ▼                          ▼
  matched items            matching_ids()
```

---

## Installation

### With pip

```bash
# Latest release
pip install vcti-lookup

# With the optional numpy extra (for numpy_getter)
pip install "vcti-lookup[numpy]"
```

### In `requirements.txt`

```
vcti-lookup>=1.0.1
```

### In `pyproject.toml` dependencies

```toml
dependencies = [
    "vcti-lookup>=1.0.1",
]
```

---

## Quick Start

### Level 1 — Filter items directly

```python
from vcti.lookup import Lookup, Rule

items = [
    {"id": 1, "type": "pdf", "size": 1000},
    {"id": 2, "type": "txt", "size": 200},
    {"id": 3, "type": "pdf", "size": 1500},
]

lk = Lookup(items)

# Single rule
pdfs = lk.filter([Rule("type", "==", "pdf")])
# [{"id": 1, ...}, {"id": 3, ...}]

# Multiple rules (AND)
large_pdfs = lk.filter([Rule("type", "==", "pdf"), Rule("size", ">", 1200)])
# [{"id": 3, ...}]

# OR logic
result = lk.filter_any([Rule("type", "==", "pdf"), Rule("type", "==", "jpg")])

# Exclusion (complement of filter)
non_pdfs = lk.exclude([Rule("type", "==", "pdf")])

# Lookup by ID
item = lk.get(2)

# Unary operators take no value (shown for a hypothetical "name" attribute)
lk.filter([Rule("name", "is_empty")])
```

### Level 2 — Map attributes to external data

```python
from vcti.lookup import Lookup, Rule

# Your data — any shape, Lookup doesn't touch it
data = ["report.pdf", "notes.txt", "analysis.pdf"]

# Build attribute set — id = index into your data
attrs = [
    {"id": 0, "type": "pdf", "size": 1000},
    {"id": 1, "type": "txt", "size": 200},
    {"id": 2, "type": "pdf", "size": 1500},
]

lk = Lookup(attrs)
indices = lk.matching_ids([Rule("type", "==", "pdf")])
# [0, 2]

# Map back to your data
results = [data[i] for i in indices]
# ["report.pdf", "analysis.pdf"]
```
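
In practice the attribute set is often derived from the data rather than written by hand. The sketch below builds one with `enumerate`, so each `id` is the item's position in `data`. The `build_attrs` helper and the attributes it computes (`type` from the file extension, `name_length`) are illustrative assumptions, not part of `vcti-lookup`.

```python
from pathlib import PurePosixPath


def build_attrs(paths: list[str]) -> list[dict]:
    """Describe each path with a dict whose `id` is its index in `paths`.

    Hypothetical helper for illustration; not part of vcti-lookup.
    """
    return [
        {
            "id": index,  # position in `paths`
            "type": PurePosixPath(path).suffix.lstrip("."),  # e.g. "pdf"
            "name_length": len(path),
        }
        for index, path in enumerate(paths)
    ]


data = ["report.pdf", "notes.txt", "analysis.pdf"]
attrs = build_attrs(data)

# `attrs` can now be passed to Lookup(attrs) as in the example above;
# the IDs returned by matching_ids() are then valid indices into `data`.
```

Because the IDs are positions, reordering or deleting entries in `data` after building `attrs` would invalidate the mapping, so rebuild the attribute set whenever the data changes.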

### Numpy structured arrays (direct, no copy)

```python
import numpy as np
from vcti.lookup import Lookup, Rule
from vcti.lookup.getter import numpy_getter

dt = np.dtype([("id", "i4"), ("type", "U10"), ("value", "f8")])
arr = np.array([(0, "sensor", 3.14), (1, "actuator", 2.71)], dtype=dt)

# Pass the array directly — no conversion
lk = Lookup(arr, getter=numpy_getter, id_key="id")
indices = lk.matching_ids([Rule("type", "==", "sensor")])
# [0]

# Index back into the array
results = arr[indices]
```

### Dataclasses

```python
from dataclasses import dataclass

from vcti.lookup import Lookup, Rule

@dataclass
class File:
    id: int
    type: str
    size: int

files = [File(1, "pdf", 1000), File(2, "txt", 200), File(3, "pdf", 1500)]
lk = Lookup(files)
result = lk.filter([Rule("type", "==", "pdf")])
# [File(id=1, type='pdf', size=1000), File(id=3, type='pdf', size=1500)]
```

### Modifiers

Modifiers are options forwarded to vcti-predicate as keyword arguments.
They control comparison behaviour, such as case sensitivity or numeric
tolerance.  Pass them as a dict in the fourth argument to `Rule`.

```python
from vcti.lookup import Lookup, Rule

items = [{"id": 1, "name": "Report_Q1"}, {"id": 2, "name": "report_Q2"}]
lk = Lookup(items)

# Default: case-insensitive (both match)
lk.filter([Rule("name", "^=", "report")])

# Case-sensitive via modifier (only item 1)
lk.filter([Rule("name", "^=", "Report", {"case_sensitive": True})])

# Float tolerance (for items that carry a numeric "value" attribute)
lk.filter([Rule("value", "==", 3.14, {"tolerance": 0.01})])

# Regex with multiline (for items that carry a text "content" attribute)
lk.filter([Rule("content", "~=", r"^error:", {"multiline": True})])
```

See [patterns.md](docs/patterns.md) for more modifier examples.
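
For intuition, the modifiers above correspond roughly to the plain-Python checks below. This is a sketch of the comparison semantics only, not of how vcti-predicate implements them. It assumes that `^=` is a starts-with test, that `tolerance` is an absolute tolerance, and that `multiline` behaves like `re.MULTILINE`; check the vcti-predicate documentation for the exact rules.

```python
import math
import re

name = "report_Q2"

# Prefix match, case-insensitive (the default described above)
ci_match = name.lower().startswith("Report".lower())

# Prefix match, case-sensitive
cs_match = name.startswith("Report")

# Equality within a tolerance (assumed to be absolute here)
close = math.isclose(3.1415, 3.14, abs_tol=0.01)

# With re.MULTILINE, ^ matches at the start of every line,
# not only at the start of the whole string
log = "info: started\nerror: disk full"
multi = re.search(r"^error:", log, re.MULTILINE) is not None
single = re.search(r"^error:", log) is not None
```

Here `ci_match`, `close`, and `multi` are true, while `cs_match` and `single` are false.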

---

## Public API

| Name | Type | Description |
|------|------|-------------|
| `Lookup[T]` | Class | Attribute-based filtering for any sequence |
| `Rule` | Frozen dataclass | Filter condition (attribute, operator, value, modifiers) |
| `ValueGetter` | Type alias | `Callable[[Any, str], Any]` |
| `MISSING` | Sentinel | Returned by getters when attribute is absent |
| `auto_getter` | Function | Handles dicts and objects transparently |
| `dict_getter` | Function | Extracts values from dicts |
| `object_getter` | Function | Extracts values via `getattr` |
| `attributes_getter` | Function | Extracts values from `.attributes` dict |
| `numpy_getter` | Function | Extracts fields from numpy structured-array rows |
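
A custom getter is any callable that matches the `ValueGetter` signature in the table. The sketch below reads attributes from a nested `meta` dict. The `meta_getter` name, the `meta` key, and the local `MISSING` stand-in are assumptions made so the snippet runs on its own; real code should return the library's `MISSING` sentinel instead.

```python
from typing import Any

# Local stand-in for the library's MISSING sentinel, so this sketch is
# self-contained. Real code should use the sentinel from vcti-lookup.
MISSING = object()


def meta_getter(item: Any, key: str) -> Any:
    """Hypothetical getter: `id` is top-level, other attributes live in item["meta"]."""
    if key == "id":
        return item.get("id", MISSING)
    return item.get("meta", {}).get(key, MISSING)


item = {"id": 7, "meta": {"type": "pdf"}}
found = meta_getter(item, "type")   # "pdf"
absent = meta_getter(item, "size")  # MISSING, because the attribute is absent
```

Assuming the constructor accepts it the same way it accepts `numpy_getter` in the Quick Start, such a getter would be passed as `Lookup(items, getter=meta_getter)`.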

### Lookup methods

| Method | Returns | Description |
|--------|---------|-------------|
| `filter(rules)` | `list[T]` | Items matching ALL rules (AND) |
| `filter_any(rules)` | `list[T]` | Items matching ANY rule (OR) |
| `exclude(rules)` | `list[T]` | Items that fail at least one rule (complement of `filter`) |
| `first(rules)` | `T \| None` | First matching item (short-circuits) |
| `first_any(rules)` | `T \| None` | First item matching any rule (short-circuits) |
| `first_id(rules)` | `Any \| None` | ID of first matching item |
| `matching_ids(rules)` | `list[Any]` | IDs of all matching items |
| `count(rules)` | `int` | Number of matching items (no list allocation) |
| `get(item_id)` | `T \| None` | Item by ID (O(1)) |
| `items` | `Sequence[T]` | The underlying sequence (same reference) |
| `lk[i]` / `lk[i:j]` | `T` / `Sequence[T]` | Index or slice access |
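
The relationship between `filter`, `filter_any`, and `exclude` can be modelled in plain Python. The sketch below is not the library's implementation: it uses ordinary predicate functions in place of `Rule` objects, and the `model_*` names are hypothetical. It assumes `exclude` is the exact complement of `filter`, as the Quick Start describes.

```python
from typing import Callable

Predicate = Callable[[dict], bool]  # stand-in for an evaluated Rule


def model_filter(items: list[dict], preds: list[Predicate]) -> list[dict]:
    """AND: keep items for which every predicate holds."""
    return [it for it in items if all(p(it) for p in preds)]


def model_filter_any(items: list[dict], preds: list[Predicate]) -> list[dict]:
    """OR: keep items for which at least one predicate holds."""
    return [it for it in items if any(p(it) for p in preds)]


def model_exclude(items: list[dict], preds: list[Predicate]) -> list[dict]:
    """Complement of model_filter: keep items that fail at least one predicate."""
    return [it for it in items if not all(p(it) for p in preds)]


def is_pdf(item: dict) -> bool:
    return item["type"] == "pdf"


def is_large(item: dict) -> bool:
    return item["size"] > 1200


items = [
    {"id": 1, "type": "pdf", "size": 1000},
    {"id": 2, "type": "txt", "size": 200},
    {"id": 3, "type": "pdf", "size": 1500},
]

both = model_filter(items, [is_pdf, is_large])        # item 3
either = model_filter_any(items, [is_pdf, is_large])  # items 1 and 3
rest = model_exclude(items, [is_pdf, is_large])       # items 1 and 2
```

Under this model, `both` and `rest` always partition the input: every item lands in exactly one of them.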

---

## Dependencies

- [vcti-predicate](https://github.com/vcollab/vcti-python-predicate) — condition evaluation engine
- [numpy](https://numpy.org/) — optional, for `numpy_getter`

---

## Documentation

- [Design](docs/design.md) — Concepts, architecture, and design decisions
- [Patterns](docs/patterns.md) — Real-world recipes and usage patterns
- [Source Guide](docs/source-guide.md) — Implementation walkthrough
- [Extending](docs/extending.md) — Creating custom value getters
- [API Reference](docs/api.md) — Autodoc for all modules
