Metadata-Version: 2.4
Name: taptoken
Version: 0.1.1
Summary: Time series tokenization and detokenization library using tapper patterns.
Author-email: Fedwin Chatelier <developer.fedwinc@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/fechad/TSTokenizer
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.24.0
Provides-Extra: viz
Requires-Dist: matplotlib>=3.7.0; extra == "viz"

# TSTokenizer

TSTokenizer (published on PyPI as `taptoken`) is a time series tokenization library. It converts numeric sequences into discrete tokens, each composed of a **spatial pattern** (a tapper-pattern ID on a 5×5 grid) and a **magnitude vector** (absolute deltas, optionally log-scaled).

## Installation

```bash
pip install taptoken
```

**From source:**
```bash
git clone https://github.com/fechad/TSTokenizer.git
cd TSTokenizer
pip install -e .
```

## Quick start

```python
from taptoken import tokenize, detokenize

series = [10.0, 12.5, 11.0, 14.0, 13.5, 16.0, 15.0]

tokens = tokenize(series)
# → [(pattern_id, [bias, delta, delta, ...]), ...]

reconstructed = detokenize(tokens, expected_length=len(series))
```

## API

### `tokenize(values, vocabulary_type="base10", use_log_scale=True)`

| Parameter | Type | Description |
|---|---|---|
| `values` | `list[float]` | Input time series |
| `vocabulary_type` | `"base10"` \| `"base6"` | Key format for pattern IDs: `int` for `"base10"`, `str` for `"base6"` |
| `use_log_scale` | `bool` | Apply `log1p` to magnitude deltas |

Returns `list[tuple[pattern_id, list[float]]]`.
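
As a rough illustration of what `use_log_scale` implies, the sketch below applies `log1p` to absolute deltas and inverts it with `expm1`. It uses only the standard library and is not taken from taptoken's source; the helper names `abs_deltas`, `compress`, and `restore` are hypothetical, and the library's actual magnitude vector (which also carries a bias term) may be computed differently.

```python
import math


def abs_deltas(values):
    """Absolute differences between consecutive points (hypothetical helper)."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def compress(deltas):
    """Log-scale non-negative deltas; log1p maps 0 to 0 and damps large jumps."""
    return [math.log1p(d) for d in deltas]


def restore(scaled):
    """Invert compress(): expm1 is the exact inverse of log1p."""
    return [math.expm1(s) for s in scaled]


series = [10.0, 12.5, 11.0, 14.0]
deltas = abs_deltas(series)   # [2.5, 1.5, 3.0]
scaled = compress(deltas)     # each value is smaller than its raw delta
restored = restore(scaled)    # matches deltas up to floating-point error
```

Because `log1p` and `expm1` are inverses, the scaling itself loses nothing beyond floating-point rounding, which is presumably why `detokenize` needs the same `use_log_scale` value as `tokenize`.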

---

### `detokenize(tokens, vocabulary_type="base10", use_log_scale=True, overlap_size=1, expected_length=None)`

| Parameter | Type | Description |
|---|---|---|
| `tokens` | output of `tokenize` | Token list to reconstruct |
| `vocabulary_type` | `"base10"` \| `"base6"` | Must match the value used in `tokenize` |
| `use_log_scale` | `bool` | Must match the value used in `tokenize` |
| `overlap_size` | `int` | Overlap between chunks (default `1`, matches encoder) |
| `expected_length` | `int \| None` | Trim output to this length |

Returns `list[float]`.

## Vocabulary types

| Type | Pattern ID | Example key |
|---|---|---|
| `base10` | `int` | `42` |
| `base6` | `str` | `"01215"` |

Both vocabularies encode the same set of tapper patterns; only the key format differs.

## Development

```bash
pip install -e .
pip install pytest  # test runner; not among the package's declared dependencies
pytest tests/test_taptoken.py -v
```

**Publishing a new version** is handled by the [Publish to PyPI](../../actions/workflows/publish.yml) GitHub Actions workflow, which is triggered manually (`workflow_dispatch`). Select a version bump (`patch` / `minor` / `major` / `skip`), and the workflow bumps the version, builds the package, and uploads it to PyPI. The workflow requires a `PYPI_API_TOKEN` repository secret.
