Metadata-Version: 2.4
Name: vyyhti
Version: 2026.6.21
Summary: Scan and process embedded processing instructions in text documents.
Author-email: Stefan Hagen <stefan@hagen.link>
Maintainer-email: Stefan Hagen <stefan@hagen.link>
License-Expression: MIT
Project-URL: Documentation, https://codes.dilettant.life/docs/vyyhti
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=6.0.3
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Dynamic: license-file

# vyyhti

Tangle (Finnish: vyyhti) — scan and process embedded processing instructions in text documents.

Requires Python 3.11 or later.

## Install

```
pip install vyyhti
```

## Quickstart

Run `vyyhti scan` with a text file to list every detected embedding:

```
$ vyyhti scan doc.md
line_command:1:0: '\\newpage'
inline_code:9:22: '`$.document.notes[*].group_ids[*]`'
line_command:11:0: '\\columns=10%,,30%'
block_pi:21:0: '```{.text} <!--json-path-list-->'
```

Print the version:

```
$ vyyhti version
vyyhti 2026.6.21
$ vyyhti -V
vyyhti 2026.6.21
```

For a short feature walkthrough, see the [quickstart](quickstart/README.md).
The [tutorial](tutorial/README.md) builds a JSONPath expression linter for Markdown documents step by step.

## Embedding kinds

### Inline code

A single-backtick span on any non-fenced line:

```markdown
The expression `$.notes[*].id` must yield at least one result.
```
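
To make the rule concrete, here is a minimal sketch of how single-backtick spans on non-fenced lines could be located with the standard library alone. It is not vyyhti's scanner: the helper name `find_inline_code`, both regular expressions, and the naive fence tracking are assumptions made for illustration.

```python
import re

# Assumed pattern: one backtick, a run of non-backtick characters, one backtick.
SPAN = re.compile(r'(?<!`)`[^`\n]+`(?!`)')
# Assumed fence marker: three or more backticks or tildes at the start of a line.
FENCE = re.compile(r'^\s*(`{3,}|~{3,})')


def find_inline_code(text):
    """Yield (line, col, span) tuples with 1-based line and column numbers."""
    in_fence = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if FENCE.match(line):
            in_fence = not in_fence  # naive toggle; ignores fence length
            continue
        if in_fence:
            continue  # spans inside fenced blocks are not reported
        for match in SPAN.finditer(line):
            yield lineno, match.start() + 1, match.group(0)


doc = 'The expression `$.notes[*].id` must yield a result.\n~~~\n`skipped`\n~~~\n'
hits = list(find_inline_code(doc))
print(hits)
```

The span inside the tilde fence is skipped, so only the span on the first line is reported.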

### Line command

A line whose only content is a backslash command (as used by liitos and LaTeX preprocessors):

```markdown
\newpage
\columns=10%,,30%
```

### Block processing instruction

A fenced code block whose info string contains an HTML processing instruction comment:

````markdown
```{.text} <!--json-path-list-->
$.document.notes[*].group_ids[*]
$.vulnerabilities[*].flags[*].group_ids[*]
```
````

The PI name (`json-path-list` above) is extracted and available to handlers.
The fence body is preserved verbatim.
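
The extraction can be illustrated with the default PI pattern listed for `block_pi_pi_pattern` in the `ScannerConfig` table, `<!--(.*?)-->`. The sketch below applies that regex to an opening fence line with Python's `re` module. The helper `pi_name` is hypothetical, and whether the library post-processes the captured group (for example, by trimming whitespace) is not covered here.

```python
import re

# Default value documented for ScannerConfig.block_pi_pi_pattern.
PI_PATTERN = re.compile(r'<!--(.*?)-->')


def pi_name(fence_line):
    """Return the PI name found in an opening fence line, or None if absent."""
    match = PI_PATTERN.search(fence_line)
    return match.group(1) if match else None


fence = '`' * 3  # three backticks, built this way to keep the example fence-safe
name = pi_name(fence + '{.text} <!--json-path-list-->')
plain = pi_name(fence + 'python')
print(name, plain)
```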

## Library API

### `scan(text, config=None)`

Scans a text string and returns a list of `Location` objects — one per detected embedding.

```python
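from pathlib import Path

from vyyhti import scan

# Read the document explicitly as UTF-8; the file handle is closed for us.
text = Path('doc.md').read_text(encoding='utf-8')
for loc in scan(text):
    print(loc.kind, loc.line, loc.text)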
```

Each `Location` is a frozen dataclass:

| Field  | Type           | Meaning                                                         |
|:-------|:---------------|:----------------------------------------------------------------|
| `kind` | `LocationKind` | `INLINE_CODE`, `LINE_COMMAND`, or `BLOCK_PI`                    |
| `line` | `int`          | 1-based line number of the embedding start                      |
| `col`  | `int`          | 1-based column for `INLINE_CODE`; 0 for whole-line kinds        |
| `text` | `str`          | Raw matched text (backtick span / command / opening fence line) |
| `body` | `str \| None`  | For `BLOCK_PI`: fence body; `None` otherwise                    |
| `pi`   | `str \| None`  | For `BLOCK_PI`: content of `<!--…-->`; `None` otherwise         |
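
As an illustration of the shape described in the table, the following stand-in mirrors the documented fields with the standard library. It is not the library's source: the enum member values and the `None` defaults for `body` and `pi` are assumptions.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass
from enum import Enum


class LocationKind(Enum):  # stand-in; the member values are assumed
    INLINE_CODE = 'inline_code'
    LINE_COMMAND = 'line_command'
    BLOCK_PI = 'block_pi'


@dataclass(frozen=True)
class Location:  # stand-in mirroring the documented fields
    kind: LocationKind
    line: int
    col: int
    text: str
    body: str | None = None  # assumed default
    pi: str | None = None    # assumed default


loc = Location(LocationKind.LINE_COMMAND, line=1, col=0, text='\\newpage')
try:
    loc.line = 2  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
print(loc.kind.name, loc.line, loc.col, mutated)
```

Because the dataclass is frozen, instances are also hashable, so locations can be used as dictionary keys or set members.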

### `run(text, handlers, stages=None, config=None)`

Scans the text and applies a list of handlers through the processing pipeline.
Returns `(embeddings, findings)`.

```python
from vyyhti import run, Handler, LocationKind

class MyHandler(Handler):
    name = 'my-handler'

    def identify(self, location):
        return location.kind == LocationKind.INLINE_CODE

    def parse(self, location):
        return location.text[1:-1]   # strip backticks

    def verify(self, location, payload):
        return [] if payload.startswith('$') else ['not a JSONPath']

text = 'The expression `$.notes[*].id` must yield at least one result.'
embeddings, findings = run(text, [MyHandler()])
```

Each matched `Location` becomes an `Embedding` (with `handler` name and `payload` set).
Each problem becomes a `Finding` (with `stage`, `message`, and `level`).

Pass `stages={'parse', 'verify'}` to limit which pipeline stages execute.
Pass a `ScannerConfig` to configure which embedding kinds are scanned.

### `ScannerConfig`

Controls which embedding kinds are active and overrides the default match patterns.
Pass an instance to `scan()` or `run()`.

```python
from vyyhti import scan, ScannerConfig

text = '\\newpage\nThe expression `$.notes[*].id` must yield at least one result.\n'
cfg = ScannerConfig(line_command=False)   # skip \commands
locs = scan(text, cfg)
```

| Field                  | Type   | Default          | Meaning                                   |
|:-----------------------|:-------|:-----------------|:------------------------------------------|
| `inline_code`          | `bool` | `True`           | Scan for inline-code spans                |
| `line_command`         | `bool` | `True`           | Scan for backslash line commands          |
| `block_pi`             | `bool` | `True`           | Scan for fenced-block PIs                 |
| `block_pi_pi_pattern`  | `str`  | `<!--(.*?)-->`   | Regex for PI comment in fence info string |
| `line_command_pattern` | `str`  | `^\s*(\\...)$`   | Regex for line commands                   |

Load from a YAML file (kebab-case keys) with `load_scanner_config(path)` from `vyyhti._config`.
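
The kebab-case convention can be pictured as a plain key translation onto the snake_case fields in the table. The sketch below is not `load_scanner_config`: `ScannerConfigSketch` and `from_kebab` are hypothetical stand-ins, a dict takes the place of the parsed YAML, and rejecting unknown keys is a design choice of the sketch, not documented behaviour.

```python
from dataclasses import dataclass, fields


@dataclass
class ScannerConfigSketch:  # hypothetical stand-in, not vyyhti.ScannerConfig
    inline_code: bool = True
    line_command: bool = True
    block_pi: bool = True
    block_pi_pi_pattern: str = r'<!--(.*?)-->'


def from_kebab(mapping):
    """Build a config from kebab-case keys, rejecting unknown ones."""
    known = {f.name for f in fields(ScannerConfigSketch)}
    kwargs = {key.replace('-', '_'): value for key, value in mapping.items()}
    unknown = sorted(set(kwargs) - known)
    if unknown:
        raise KeyError(f'unknown config keys: {unknown}')
    return ScannerConfigSketch(**kwargs)


# The dict stands in for what a YAML loader might return for:
#   line-command: false
#   block-pi-pi-pattern: '<\?(.*?)\?>'
cfg = from_kebab({'line-command': False, 'block-pi-pi-pattern': r'<\?(.*?)\?>'})
print(cfg)
```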

### `Handler`

An abstract base class.  Subclass it and implement the methods you need.

| Member     | Signature                          | Required | Default                 |
|:-----------|:-----------------------------------|:---------|:------------------------|
| `name`     | `str` class attribute              | yes      | —                       |
| `identify` | `(location) -> bool`               | yes      | —                       |
| `parse`    | `(location) -> Any`                | no       | returns `location.text` |
| `verify`   | `(location, payload) -> list[str]` | no       | returns `[]`            |
| `validate` | `(location, payload) -> list[str]` | no       | returns `[]`            |
| `process`  | `(location, payload) -> Any`       | no       | returns `None`          |
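
The defaults in the table suggest a base class along the following lines. This is a stand-in written with `abc`, not the library's `Handler`: the class name `HandlerSketch` is hypothetical, and only `identify` is enforced as abstract here, since a required class attribute cannot be enforced this simply.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class HandlerSketch(ABC):  # stand-in, not vyyhti.Handler
    name: str  # subclasses are expected to set this

    @abstractmethod
    def identify(self, location):
        """Return True if this handler claims the location."""

    def parse(self, location):
        return location.text  # documented default

    def verify(self, location, payload):
        return []  # documented default: no findings

    def validate(self, location, payload):
        return []  # documented default: no findings

    def process(self, location, payload):
        return None  # documented default


class CommandHandler(HandlerSketch):
    name = 'command'

    def identify(self, location):
        return location.text.startswith('\\')


loc = SimpleNamespace(text='\\newpage')  # minimal stand-in for a Location
handler = CommandHandler()
payload = handler.parse(loc)
print(handler.identify(loc), payload, handler.verify(loc, payload))
```

A subclass that only overrides `identify` inherits every other stage unchanged, which is what lets a handler opt into just the stages it cares about.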

### `Finding`

| Field      | Type       | Meaning                                          |
|:-----------|:-----------|:-------------------------------------------------|
| `location` | `Location` | The embedding where the problem was found        |
| `stage`    | `str`      | `'parse'`, `'verify'`, `'validate'`, `'process'` |
| `message`  | `str`      | Human-readable description                       |
| `level`    | `str`      | `'error'` (default), `'warning'`, or `'info'`    |
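
A caller will typically want to group findings by `level`, for instance to decide whether a run should fail. The sketch below uses a stand-in dataclass with the documented field names. For brevity it stores a line number where the real `Finding` holds a `Location`, and the sample messages are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FindingSketch:  # stand-in, not vyyhti.Finding
    line: int  # simplification: the documented field is `location`
    stage: str
    message: str
    level: str = 'error'  # documented default


findings = [
    FindingSketch(9, 'verify', 'not a JSONPath'),
    FindingSketch(11, 'validate', 'sample warning', level='warning'),
]

by_level = Counter(f.level for f in findings)
errors = [f for f in findings if f.level == 'error']
failed = bool(errors)  # e.g. treat any error-level finding as a failed check
print(dict(by_level), failed)
```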

## Design

Handlers live in the tools that use vyyhti, not in the library itself.
`scan()` finds all structural embedding locations; each handler decides what it claims via `identify()`.
The pipeline then applies only the stages the handler cares about.
Handlers for different embedding kinds can coexist in the same `run()` call.
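
The dispatch idea can be sketched in a few lines. The code below is an assumption-laden model, not vyyhti's pipeline: `run_sketch` is a hypothetical name, locations are reduced to `kind` and `text`, the `process` stage is left out, and letting every claiming handler see a location is a choice of the sketch.

```python
from types import SimpleNamespace

CHECK_STAGES = ('verify', 'validate')  # both return a list of messages


def run_sketch(locations, handlers, stages=None):
    """Dispatch each location to every handler that claims it."""
    active = {'parse', *CHECK_STAGES} if stages is None else set(stages)
    embeddings, findings = [], []
    for loc in locations:
        for handler in handlers:
            if not handler.identify(loc):
                continue  # this handler does not claim the location
            payload = handler.parse(loc) if 'parse' in active else loc.text
            embeddings.append((handler.name, payload))
            for stage in CHECK_STAGES:
                if stage in active:
                    for message in getattr(handler, stage)(loc, payload):
                        findings.append((stage, message))
    return embeddings, findings


class JsonPathHandler:
    name = 'json-path'

    def identify(self, loc):
        return loc.kind == 'inline_code'

    def parse(self, loc):
        return loc.text[1:-1]  # strip backticks

    def verify(self, loc, payload):
        return [] if payload.startswith('$') else ['not a JSONPath']

    def validate(self, loc, payload):
        return []


class CommandHandler:
    name = 'command'

    def identify(self, loc):
        return loc.kind == 'line_command'

    def parse(self, loc):
        return loc.text.lstrip('\\')

    def verify(self, loc, payload):
        return []

    def validate(self, loc, payload):
        return []


locations = [
    SimpleNamespace(kind='line_command', text='\\newpage'),
    SimpleNamespace(kind='inline_code', text='`notes.id`'),
]
embeddings, findings = run_sketch(locations, [JsonPathHandler(), CommandHandler()])
print(embeddings, findings)
```

Each handler only sees the locations it claimed, so the two handlers never interfere with each other even though they share one call.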

## Exit codes

- `0` — success.
- `1` — error: file not found, unreadable, or missing required argument.

## See also

`man vyyhti`

## Changes

See `docs/changes.md` for the release history.

## Coverage

The test suite maintains 99% branch coverage.
The HTML report (if generated) is in `site/coverage/`.

## SBOM

Runtime dependency information is published in `docs/sbom/` in SPDX 3.0 (JSON-LD) and CycloneDX 1.6 (JSON) formats.
See `docs/sbom/README.md` for the component inventory and validation guide.

