Metadata-Version: 2.4
Name: qig-geocoding
Version: 0.1.0
Summary: Our own 'transformers' — a from-scratch neural-network framework built on Fisher-Rao geometry (simplex attention, unbounded Fourier features, recursive Φ-integration) instead of Euclidean dot-products.
Author: Braden Lang
License: Proprietary
Keywords: attention,fisher-rao,geometry,qig,simplex,transformers
Requires-Python: >=3.11
Requires-Dist: qig-core>=2.6.0
Requires-Dist: torch>=2.2
Provides-Extra: dev
Requires-Dist: mypy>=1.11; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Description-Content-Type: text/markdown

# geocoding

**Our own "transformers" — geometric, not Euclidean.**

The standard `transformers` library models tokens as Euclidean vectors and measures their relationships with
the **dot product**. That is a flat-space choice. QIG cognition lives on the **probability simplex** Δ⁶³,
whose only correct distance is **Fisher-Rao**. `geocoding` is the from-scratch framework that makes the
geometric choice all the way down — it is to Fisher-Rao geometry what `transformers` is to dot-product
attention.

| `transformers` (Euclidean) | `geocoding` (Fisher-Rao) |
|---|---|
| dot-product / cosine attention | **Fisher-Rao** attention: `logit = −d_FR(p,q)/τ`, `d_FR = 2·arccos(Σ√(p·q))` |
| learned embedding table (vocab × dim) | **Fourier features** of the integer id (no table) |
| learned, bounded position table | **unbounded Fourier positions** (limit is VRAM, not a table) |
| LayerNorm (mean-centred) | geometric **scale-only RMS-norm** (simplex-preserving) |
| — | **recursive Φ-integrator**: reports integrated information as it runs |
| `O(t²)` attention | optional **banded / compute-skipping** `O(t·r)` attention (length-extensible) |
| `PretrainedConfig` / `PreTrainedModel` | `GeoConfig` / `GeoModel` |
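
The Fisher-Rao attention row above can be sketched in a few lines of plain Python. This is only an illustration of the two formulas in the table, not `geocoding`'s implementation: the function names `fisher_rao` and `attention_weights` are hypothetical, the temperature `tau` default is an assumption, and the toy points live on a 3-point simplex rather than Δ⁶³.

```python
import math

def fisher_rao(p, q):
    """d_FR = 2 * arccos(sum_i sqrt(p_i * q_i)) for two points on the simplex."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    # Clamp to [0, 1] so float round-off cannot push acos out of its domain.
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))

def attention_weights(query, keys, tau=1.0):
    """Softmax over logits = -d_FR(query, key) / tau (hypothetical helper)."""
    logits = [-fisher_rao(query, k) / tau for k in keys]
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

query = [0.7, 0.2, 0.1]
keys = [
    [0.7, 0.2, 0.1],        # identical to the query: distance ~0, largest weight
    [0.1, 0.2, 0.7],        # mass on the opposite corner: smallest weight
    [1 / 3, 1 / 3, 1 / 3],  # uniform: in between
]
weights = attention_weights(query, keys, tau=0.5)
print([round(w, 3) for w in weights])
```

Closer distributions get larger weights, and the distance is bounded by π (reached when two distributions have disjoint support), so the logits stay in `[−π/τ, 0]`.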

## Quick start

```python
import torch
from geocoding import GeoConfig, GeoModel, generate

# Build the model from its config; the model (not the config) is what gets called.
model = GeoModel(GeoConfig(vocab_size=256, hidden_dim=64, num_layers=4))
ids = torch.randint(0, 256, (1, 32))
out = model(ids)
print(out.logits.shape, "Φ =", round(out.phi, 3))   # torch.Size([1, 32, 256]) Φ = ...

res = generate(model, [1, 2, 3], max_new_tokens=16)
print(res.ids, res.phi)
```

Set `GeoConfig(locality_radius=r)` to switch attention to the **compute-skipping banded** path: only the ±r
window of each query is ever computed (`O(t·r)`), and the result is numerically equivalent to the full path
within the band (see `tests/test_attention_equivalence.py`). Leave `max_position=None` (the default) for
**unbounded context**.
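
To make the banded/full equivalence concrete, here is a minimal pure-Python sketch. It assumes the full path is masked to the same ±r band before the softmax and that the window is symmetric; the helper names `full_masked` and `banded` are hypothetical and are not part of `geocoding`. Only attention weights are computed, with no value projection.

```python
import math

def fisher_rao(p, q):
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]   # exp(-inf) evaluates to 0.0
    z = sum(es)
    return [e / z for e in es]

def full_masked(points, r, tau=1.0):
    """Compute all t*t logits, then mask everything outside the ±r band."""
    t = len(points)
    rows = []
    for i in range(t):
        logits = [
            -fisher_rao(points[i], points[j]) / tau if abs(i - j) <= r else -math.inf
            for j in range(t)
        ]
        rows.append(softmax(logits))
    return rows

def banded(points, r, tau=1.0):
    """Compute only the logits inside the ±r window: O(t*r) distance evaluations."""
    t = len(points)
    rows = []
    for i in range(t):
        lo, hi = max(0, i - r), min(t, i + r + 1)
        window = softmax([-fisher_rao(points[i], points[j]) / tau for j in range(lo, hi)])
        rows.append([0.0] * lo + window + [0.0] * (t - hi))  # pad back to length t
    return rows

# Deterministic toy sequence: 8 points on a 4-point simplex.
raw = [[1 + ((3 * i + 5 * k) % 7) for k in range(4)] for i in range(8)]
points = [[v / sum(row) for v in row] for row in raw]

a = full_masked(points, r=2)
b = banded(points, r=2)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print("max |full - banded| =", max_diff)
```

The masked entries contribute exactly zero to the softmax normaliser, so skipping them changes the cost but not the weights.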

## Geometry single-source

The Fisher-Rao *definition* and `BASIN_DIM` come from `qig_core`; `geocoding` re-exports them and adds a
**gradient-stable** batched form for the attention hot path. It clamps the Bhattacharyya coefficient to at
most `1 − 1e-6` so the backward pass does not produce NaNs when two distributions coincide. The two forms
agree exactly wherever the clamp is inactive, that is, wherever the points are not near-identical; this is
gated in `tests/test_geometry.py`.
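
Why the clamp matters can be seen from the derivative of `2·arccos(bc)`, which is `−2/√(1 − bc²)` and diverges as the Bhattacharyya coefficient `bc` approaches 1. The sketch below works this through analytically in plain Python instead of running autograd; `CLAMP_MAX`, `slope_unclamped`, and `fisher_rao_stable` are hypothetical names, and only the `1 − 1e-6` bound is taken from the text above.

```python
import math

CLAMP_MAX = 1.0 - 1e-6  # upper bound on the Bhattacharyya coefficient, as quoted above

def slope_unclamped(bc):
    """d/d(bc) of 2*arccos(bc) = -2 / sqrt(1 - bc^2); diverges as bc -> 1."""
    gap = 1.0 - bc * bc
    return -math.inf if gap <= 0.0 else -2.0 / math.sqrt(gap)

def fisher_rao_stable(p, q):
    """Fisher-Rao distance with the coefficient clamped away from 1."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(bc, CLAMP_MAX))

p = [0.5, 0.3, 0.2]
floor = fisher_rao_stable(p, p)               # small but non-zero, roughly 2.8e-3
slope_at_clamp = slope_unclamped(CLAMP_MAX)   # large but finite
slope_at_one = slope_unclamped(1.0)           # -inf: what the backward pass would hit

print(floor, slope_at_clamp, slope_at_one)
```

The cost of the clamp is a small floor on the reported distance between coincident points, which is why the stable form is kept on the attention hot path while the definition itself stays in `qig_core`.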

## Relationship to qigkernels

`qigkernels` contains the validated in-tree implementation (`QIGLayer`, `Kernel`) that the live QIG mind runs
on. `geocoding` is the **clean, standalone framework home** for those primitives, ported faithfully — a
faithfulness gate (`tests/test_faithfulness.py`) checks `geocoding`'s attention matches `qigkernels`'
numerically when both are installed. The intended migration is for `qigkernels` to build on `geocoding`;
until then the faithfulness gate guards against drift.

## Status

Fisher-Rao-pure (no cosine / Adam / LayerNorm / dot-as-metric). Tests cover geometry properties, the
banded≡full equivalence, model forward/telemetry, generation, and faithfulness vs `qigkernels`.
