Metadata-Version: 2.4
Name: hypersort
Version: 0.2.0
Summary: Riemannian Comparison Manifold Sort with GTC — exact + approximate sorting with outlier detection
Author-email: William Ken Ohara Stewart <nagusamecs@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/NagusameCS/HyperTensor
Project-URL: Repository, https://github.com/NagusameCS/HyperTensor/tree/main/hypersort
Project-URL: Bug Tracker, https://github.com/NagusameCS/HyperTensor/issues
Keywords: sorting,riemannian-geometry,manifold,geodesic,outlier-detection,anomaly-detection,pca
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0; extra == "dev"
Provides-Extra: all
Requires-Dist: numpy>=1.24.0; extra == "all"
Requires-Dist: pytest>=7.0; extra == "all"
Dynamic: license-file

# HyperSort: Riemannian Comparison Manifold Sort with GTC

> *Exact sorting + approximate manifold sort + outlier detection.*

**HyperSort** is a Python package that uses the
[HyperTensor Geometric Jury framework](https://github.com/NagusameCS/HyperTensor)
to provide two sort modes and per-element confidence scoring via a
Riemannian Comparison Manifold with Geodesic Trajectory Cache (GTC).

## Quick Start

```python
from hypersort import hypersort, ComparisonManifold

# One-shot exact sort
result = hypersort([3.14, 1.41, 2.71, 1.73, 0.57])
print(result.sorted_data)  # [0.57, 1.41, 1.73, 2.71, 3.14]

# Build once, sort many (exact)
manifold = ComparisonManifold()
manifold.build(training_data)  # O(n²), one-time

result = manifold.sort(new_data)         # exact, always correct
result = manifold.sort_manifold(new_data)  # GTC approximate + verdicts
verdicts = manifold.classify(data)        # outlier detection only
```

## Two Sort Modes

| Method | Ordering | Speed | Use case |
|--------|----------|-------|----------|
| `sort()` | Exact (encoder dim 0) | O(n log n) | Correctness matters |
| `sort_manifold()` | GTC approximate | O(n·k + n log n) | Speed + confidence + outlier detection |

## Outlier Detection

`classify()` and `sort_manifold()` assign each element a verdict tier:

| Tier | Meaning |
|------|---------|
| `RETRIEVE` | Close to cached trajectory — high confidence |
| `AUGMENT` | Nearby — needs small correction |
| `EXPAND` | Far but within manifold — low confidence |
| `EXPLORE` | Out-of-distribution — possible anomaly |

Statistical z-score check (|z| > 3 on sort key) automatically upgrades to EXPLORE.

```python
manifold.build(training_data)
verdicts = manifold.classify(mixed_data)
for v in verdicts:
    if v.verdict_type == 'EXPLORE':
        print(f"Outlier detected! d={v.geodesic_distance:.4f}")
```

## Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| `intrinsic_dim` | 0 (auto) | Manifold dimension k. 0 = auto-tune via 95% explained variance |
| `variance_ratio` | 0.95 | Explained variance to retain when auto-tuning k |
| `num_jurors` | 7 | Trajectories consulted per query |
| `cache_threshold` | 0.05 | RETRIEVE tier geodesic distance threshold |
| `use_gpu` | False | Enable CuPy GPU acceleration (`pip install cupy`) |

## API Reference

### One-shot

```python
from hypersort import hypersort

result = hypersort(data)                      # exact sort
result = hypersort(data, mode='manifold')     # exact + verdicts + confidence
result = hypersort(data, compute_pairwise=False)  # skip O(n²) pairwise matrix
```

### Reusable Workflow

```python
from hypersort import ComparisonManifold, ManifoldConfig

manifold = ComparisonManifold()               # auto-tune k
# or: ManifoldConfig(intrinsic_dim=16, use_gpu=True)
manifold.build(training_data)                 # O(n²), one-time

# Exact sorting
result = manifold.sort(new_data)              # always correct
result = manifold.sort(new_data, compute_pairwise=True)  # + jury confidence

# GTC approximate sorting
result = manifold.sort_manifold(new_data)     # approximate positions + verdicts

# Outlier detection
verdicts = manifold.classify(data)            # batch
verdict  = manifold.classify_one(element)     # single element (streaming)
```

### Persistence

```python
manifold.save('manifold.npz')
loaded = ComparisonManifold.load('manifold.npz')
```

### Verdict Tiers

```python
from hypersort import VerdictTier

for v in result.verdicts:
    if v.verdict_type == VerdictTier.EXPLORE:
        print(f"Outlier: d={v.geodesic_distance:.4f}, conf={v.confidence:.4f}")
```

## Honest Complexity

| Phase | Complexity |
|-------|-----------|
| Build | O(n·d·k + n²) — SVD + GTC construction (one-time) |
| `sort()` | O(n log n + n·d) — exact argsort |
| `sort_manifold()` | O(n·k + n log n) — GTC lookup + sort |
| `classify()` | O(n·k) — geodesic distances to cache |
| `classify_one()` | O(k) — single-element lookup |

## How It Works

### Build Phase (one-time, O(n²))

```
Data → [Encoder] → dim 0 = sort key, dims 1+ = manifold features
     → [SVD on dims 1+] → PCA manifold basis [d-1, k]
     → [Normalize] → Unit Sphere S^{k-1} (trajectory cache)
     → [Sort by dim 0] → GTC indexed by ground-truth positions
```

### Sort Phase

- `sort()`: exact ordering by encoder dim 0 via argsort
- `sort_manifold()`: project new data onto manifold → query GTC for
  nearest cached trajectories → approximate positions + verdict tiers
- `classify()`: project → measure geodesic distance to GTC cache →
  assign RETRIEVE/AUGMENT/EXPAND/EXPLORE based on proximity and
  statistical z-score (|z| > 3 → EXPLORE)

## Limitations

- String encoder: first 7 chars for positional encoding (exact via Python ints)
- GTC sort is approximate for out-of-distribution data
- Manifold build is O(n²); best with `build()` once, `sort_manifold()` many
- GPU acceleration requires CuPy (`pip install cupy`)

## Citation

```bibtex
@software{stewart2026hypersort,
  title     = {{HyperSort}: {Riemannian} Comparison Manifold Sort},
  author    = {Stewart, William Ken Ohara},
  year      = {2026},
  publisher = {GitHub},
  journal   = {GitHub repository},
  url       = {https://github.com/NagusameCS/HyperTensor/tree/main/hypersort}
}
```

## License

MIT — see [LICENSE](LICENSE) file.

## Related Work

- [HyperTensor](https://github.com/NagusameCS/HyperTensor) — The parent project
- Papers I–XVIII in `ARXIV_SUBMISSIONS/` — Full theoretical foundation
- `volume_extended.tex` — Master compilation of all papers
