Metadata-Version: 2.4
Name: fisher-homology
Version: 1.0.1
Summary: Persistent homology on Fisher information distances for probability manifolds
Author: William R. Williamson
License-Expression: MIT
Project-URL: Homepage, https://github.com/WillItWithWill/fisher-homology
Project-URL: Documentation, https://github.com/WillItWithWill/fisher-homology#readme
Project-URL: Repository, https://github.com/WillItWithWill/fisher-homology
Project-URL: Issues, https://github.com/WillItWithWill/fisher-homology/issues
Keywords: persistent homology,topological data analysis,TDA,Fisher information,probability manifold,Betti numbers,Vietoris-Rips,persistence diagram,phase transition detection
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Physics
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: numpy
Requires-Dist: numpy>=1.20; extra == "numpy"
Provides-Extra: plot
Requires-Dist: matplotlib>=3.4; extra == "plot"
Requires-Dist: numpy>=1.20; extra == "plot"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: numpy>=1.20; extra == "dev"
Dynamic: license-file
Dynamic: requires-python

# fisher-homology

**Persistent homology on Fisher information distances for probability manifolds.**

A pure-Python, zero-dependency implementation of topological data analysis (TDA)
designed for probability trajectory analysis. The filtration metric is the Fisher
information arc length, which stretches distances near `p = 0` and `p = 1`. These
are the tails of a probability distribution, where the critical events in fraud
detection, medical diagnostics, physics experiments, and financial risk live.

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://python.org)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![Zero dependencies](https://img.shields.io/badge/dependencies-none-brightgreen.svg)]()

---

## Why Fisher distances?

Standard topological data analysis uses Euclidean distance, which is a poor fit
for probability trajectories.

Euclidean distance treats the pair `p=0.001`, `p=0.002` the same as the pair
`p=0.499`, `p=0.500`: both have `|Δp| = 0.001`. Informationally they are very
different. The first pair are rare events whose probabilities differ by a factor
of two, while the second pair are near-even odds with a negligible relative
difference.

The **Fisher arc length** is the geodesic distance on the statistical manifold of
Bernoulli distributions, equipped with the Fisher information metric
`g(p) = 1/(p(1−p))`:

```
d_F(p, q) = 2 · |arcsin(√p) − arcsin(√q)|
```

Properties:
- Range `[0, π]` for a pair of scalar probabilities (the full manifold diameter)
- `d_F(0, 1) = π`, the maximum separation
- Symmetric and satisfies the triangle inequality (a true metric)
- Near `p = 0.01`, a small step `dp` has Fisher length `dp/√(p(1−p)) ≈ 10·dp`, about 10× its Euclidean length
- Produces topologically meaningful features for probability trajectories
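
The properties above can be checked with a few lines of standard-library Python. This is a minimal sketch, not the package's implementation: `arc_distance` and `local_expansion` are hypothetical helper names that restate the formula for scalar probabilities.

```python
import math

def arc_distance(p: float, q: float) -> float:
    """Fisher arc length between two Bernoulli parameters in [0, 1]."""
    return 2.0 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q)))

def local_expansion(p: float) -> float:
    """Stretch of the Fisher arc relative to |dp| for a small step at p."""
    return 1.0 / math.sqrt(p * (1.0 - p))

# Same Euclidean gap (0.001) at the tail and in the middle of the range.
tail_gap = arc_distance(0.001, 0.002)
mid_gap = arc_distance(0.499, 0.500)

print(f"d_F(0, 1)          = {arc_distance(0.0, 1.0):.4f}")  # 3.1416, i.e. pi
print(f"tail gap / mid gap = {tail_gap / mid_gap:.1f}")
print(f"expansion at 0.01  = {local_expansion(0.01):.2f}")   # 10.05
```

The tail pair comes out more than ten times farther apart than the mid-range pair, even though both pairs have the same `|Δp|`.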

---

## Installation

```bash
pip install fisher-homology
```

No dependencies required. Works on any Python 3.8+ installation.

### Optional dependencies

```bash
pip install "fisher-homology[numpy]"    # faster distance computation
pip install "fisher-homology[plot]"     # matplotlib visualization helpers
pip install "fisher-homology[dev]"      # pytest for running the test suite
```

### From source

```bash
git clone https://github.com/WillItWithWill/fisher-homology
cd fisher-homology
pip install -e .
```

---

## Quick Start

```python
from fisher_homology import FisherHomology
import numpy as np  # used here only to generate demo data

# Three-phase trajectory: normal → stress → crisis
np.random.seed(42)
states = []
for t in range(15):
    if   t < 5:  p = [0.10 + 0.01*np.random.randn(), 0.12 + 0.01*np.random.randn()]
    elif t < 10: p = [0.45 + 0.03*np.random.randn(), 0.50 + 0.03*np.random.randn()]
    else:        p = [0.85 + 0.02*np.random.randn(), 0.88 + 0.02*np.random.randn()]
    states.append([float(np.clip(x, 0.01, 0.99)) for x in p])

ph     = FisherHomology()
result = ph.fit(states)

print(result.summary())
# Persistence Diagram (fisher metric)
#   States:            15
#   Max epsilon:       4.284
#   β₀ features:       12
#   Bottleneck width:  1.471
#   Phase gaps at ε:   ['1.249', '1.353']
#   Estimated phases:  3
#   Has cycles (β₁):   False
```

---

## Core API

### `FisherHomology`

```python
ph = FisherHomology(
    n_steps=50,        # filtration resolution
    max_epsilon=None,  # auto-determined from max pairwise distance
)
```

#### `fit(states, metric='fisher') → PersistenceDiagram`

Compute persistent homology of a probability trajectory.

```python
# states: list of T probability vectors, each of length n
# All probabilities must be in (0, 1)
result = ph.fit(states)
```

#### `compare_trajectories(states_a, states_b) → dict`

Compare two trajectories using bottleneck distance.

```python
comparison = ph.compare_trajectories(trajectory_1, trajectory_2)
print(comparison['bottleneck_b0'])      # topological distance
print(comparison['interpretation'])     # human-readable verdict
```

#### `rips_at_epsilon(states, epsilon) → dict`

Snapshot of the Vietoris-Rips complex at a specific ε.

```python
snapshot = ph.rips_at_epsilon(states, epsilon=0.5)
print(snapshot['beta_0'], snapshot['beta_1'])
print(snapshot['euler_characteristic'])
```

#### `fit_transform(states, return_both_metrics=False) → dict`

Compute Fisher and optionally Euclidean diagrams for comparison.

```python
both = ph.fit_transform(states, return_both_metrics=True)
fisher_diag    = both['fisher']
euclidean_diag = both['euclidean']
```

---

### `PersistenceDiagram`

Result container returned by `FisherHomology.fit()`.

| Attribute | Type | Description |
|---|---|---|
| `persistence_b0` | `list[(birth,death)]` | β₀ (component) lifetime pairs |
| `persistence_b1` | `list[dict]` | β₁ (loop) birth/death events |
| `betti_curve` | `dict[ε→(β₀,β₁)]` | Betti numbers at each scale |
| `bottleneck_width` | `float` | Max β₀ lifetime (signal strength) |
| `phase_gaps` | `list[float]` | ε values of phase transitions |
| `max_epsilon` | `float` | Filtration range used |
| `n_states` | `int` | Number of input states |
| `metric` | `str` | `'fisher'` or `'euclidean'` |

```python
result.n_phases()     # estimated number of distinct regimes
result.has_cycles()   # True if trajectory contains loops (trapped states)
result.summary()      # human-readable summary string
```

---

### Distance functions

```python
from fisher_homology import fisher_arc, fisher_distance_matrix
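from fisher_homology.distances import fisher_arc_position, fisher_gradient
# Assumption: tail_expansion_ratio lives in the same module; adjust if it is exported elsewhere
from fisher_homology.distances import tail_expansion_ratio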

# Scalar arc distance
d = fisher_arc(0.1, 0.9)                 # float in [0, π]

# Arc position (maps probability to manifold position)
pos = fisher_arc_position(0.5)           # = π/2

# Gradient of the arc position w.r.t. p (the square root of the Fisher information)
grad = fisher_gradient(0.3)              # = 1/√(0.3 × 0.7)

# Full pairwise distance matrix
states = [[0.1, 0.2], [0.5, 0.6], [0.9, 0.8]]
D = fisher_distance_matrix(states)       # 3×3 symmetric matrix

# Tail expansion: how much Fisher expands vs Euclidean at p=0.01
ratio = tail_expansion_ratio(0.01)       # ≈ 10.0
```

---

### Topology functions

```python
from fisher_homology.topology import (
    b0_persistence,
    betti_curve,
    persistence_diagram,
    bottleneck_distance,
    vietoris_rips_betti,
    UnionFind,
)

# β₀ persistence from a distance matrix
pairs = b0_persistence(D, max_epsilon=5.0)

# Betti curve: (β₀, β₁) at each filtration step
curve = betti_curve(D, n_steps=50)

# Full persistence diagram
diag = persistence_diagram(D, n_steps=50)

# Bottleneck distance between two diagrams
# (diag_a and diag_b are persistence_diagram() results for two trajectories)
dist = bottleneck_distance(diag_a['persistence_b0'],
                           diag_b['persistence_b0'])

# Vietoris-Rips complex at one ε
rips = vietoris_rips_betti(D, epsilon=1.0)
```

---

### Utils

```python
from fisher_homology.utils import (
    validate_state_sequence,
    normalize_states,
    trajectory_summary,
)

# Validate and clip probability vectors
clean = validate_state_sequence(raw_states)

# Normalize to (0, 1)
normed = normalize_states(states, method='clip')   # or 'scale'

# Descriptive statistics
stats = trajectory_summary(states)
# {'n_states': 15, 'n_dims': 2, 'mean_probs': [...],
#  'std_probs': [...], 'trajectory_length': 4.28}
```

---

## Interpretation Guide

### Phase transitions

A **phase gap** in the persistence diagram marks an ε value at which a large
connected-component merge occurs: two topologically distinct regimes that
were separate at smaller scales become reachable from each other.

```
Large gap → significant phase transition
Small gap → gradual drift, no sharp regime change
```

### Cyclic trapping

A **β₁ feature** (loop) indicates the trajectory returned to a previously
visited region of probability space without escaping. In protein folding this
is a misfolding intermediate. In fraud detection it is a network that almost
cascades but recovers. In clinical monitoring it is a patient oscillating
between two states.

### Bottleneck width interpretation

```
ratio = bottleneck_width / max_ε

ratio < 0.05          →  all states are in one continuous cloud
0.05 ≤ ratio < 0.15   →  weak phase structure
0.15 ≤ ratio ≤ 0.40   →  moderate phase separation
ratio > 0.40          →  strong, well-separated phases
```
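
The bands can be applied mechanically. The sketch below assumes every band is a fraction of `max_epsilon` (the first row says so explicitly; the others are read the same way) and that shared boundaries fall into the band shown in the code. `describe_bottleneck` is a hypothetical helper, not part of the package.

```python
def describe_bottleneck(bottleneck_width: float, max_epsilon: float) -> str:
    """Map bottleneck_width / max_epsilon onto the four interpretation bands."""
    if max_epsilon <= 0:
        raise ValueError("max_epsilon must be positive")
    ratio = bottleneck_width / max_epsilon
    if ratio < 0.05:
        return "single continuous cloud"
    if ratio < 0.15:
        return "weak phase structure"
    if ratio <= 0.40:
        return "moderate phase separation"
    return "strong, well-separated phases"

# Values from the Quick Start summary: 1.471 / 4.284 is about 0.34.
print(describe_bottleneck(1.471, 4.284))  # moderate phase separation
```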

### Fisher vs Euclidean comparison

```python
both = ph.fit_transform(states, return_both_metrics=True)
fisher_phases    = both['fisher'].n_phases()
euclidean_phases = both['euclidean'].n_phases()

if fisher_phases > euclidean_phases:
    print("Fisher detects additional phase structure at the tails.")
    print("Tail events are driving regime separation.")
```

---

## Applications

The Fisher metric is particularly valuable for probability trajectories where:

- **Rare events matter**: fraud detection (`p ≈ 0.001`), medical diagnosis,
  gravitational wave detection (`p ≈ 0.9999`)
- **Phase transitions are critical**: protein folding intermediates, market
  regime shifts, clinical state changes
- **Cycle detection is needed**: misfolding loops, oscillating fraud networks,
  treatment resistance patterns

---

## Running the Tests

```bash
# Install with dev dependencies (from a source checkout, so tests/ is available)
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Or directly
python tests/test_fisher_homology.py
```

All 44 tests are non-tautological — each verifies a mathematically provable
property against analytical ground truth.

---

## Mathematical Background

### Fisher Information Metric

For a Bernoulli distribution parameterised by `p`, the Fisher information is:

```
I(p) = 1 / (p(1-p))
```

The geodesic distance under this Riemannian metric (the Fisher-Rao distance) has a closed form:

```
d_F(p, q) = 2 · |arcsin(√p) − arcsin(√q)|
```

This is twice the angle between the square-root probability vectors
`(√p, √(1−p))` and `(√q, √(1−q))` on the unit circle, which gives the distance a
natural geometric interpretation.
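
Both descriptions of the distance can be checked numerically. The arc length is the integral of `√I(t) = 1/√(t(1−t))` between `p` and `q`; substituting `t = sin²θ` turns the integrand into `2 dθ`, which gives the closed form. The sketch below compares that closed form with a midpoint-rule integral and with twice the angle between the unit vectors `(√p, √(1−p))` and `(√q, √(1−q))`. All function names are illustrative and not part of the package.

```python
import math

def arc_closed_form(p: float, q: float) -> float:
    """d_F(p, q) = 2 * |arcsin(sqrt(p)) - arcsin(sqrt(q))|."""
    return 2.0 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q)))

def arc_by_integration(p: float, q: float, n: int = 20_000) -> float:
    """Midpoint-rule integral of 1 / sqrt(t * (1 - t)) from p to q."""
    lo, hi = min(p, q), max(p, q)
    h = (hi - lo) / n
    return sum(h / math.sqrt(t * (1.0 - t))
               for t in (lo + (k + 0.5) * h for k in range(n)))

def sphere_angle(p: float, q: float) -> float:
    """Angle between (sqrt(p), sqrt(1-p)) and (sqrt(q), sqrt(1-q))."""
    dot = math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q))
    return math.acos(min(1.0, dot))  # clamp guards against rounding above 1

p, q = 0.1, 0.9
print(arc_closed_form(p, q))
print(arc_by_integration(p, q))
print(2.0 * sphere_angle(p, q))
```

The three printed values agree to several decimal places.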

### Vietoris-Rips Filtration

Given T states with pairwise Fisher distances `D[i,j]`, the Vietoris-Rips
complex `VR_ε` includes all simplices whose diameter is at most `ε`:

- `ε = 0`: T isolated points, β₀ = T, β₁ = 0
- As `ε` grows: components merge (β₀ decreases), loops form and fill (β₁ varies)
- `ε = ∞`: fully connected, β₀ = 1, β₁ = 0
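
For β₀, the filtration above reduces to counting connected components of the graph that joins every pair with `D[i][j] ≤ ε`. The following is a minimal sketch of that count using a flood fill. It is not the package's `vietoris_rips_betti`, and it does not compute β₁, which also needs the triangles of the complex. The toy matrix is made up for illustration.

```python
def count_components(D, epsilon):
    """beta_0 at scale epsilon: components of the graph with edges D[i][j] <= epsilon."""
    n = len(D)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1          # found a point not reachable from earlier ones
        seen[start] = True
        stack = [start]
        while stack:             # flood fill everything reachable from `start`
            i = stack.pop()
            for j in range(n):
                if not seen[j] and D[i][j] <= epsilon:
                    seen[j] = True
                    stack.append(j)
    return components

# Toy distance matrix: two tight pairs that are far from each other.
D = [
    [0.0, 0.1, 2.0, 2.1],
    [0.1, 0.0, 1.9, 2.0],
    [2.0, 1.9, 0.0, 0.2],
    [2.1, 2.0, 0.2, 0.0],
]
for eps in (0.0, 0.15, 0.5, 1.9):
    print(eps, count_components(D, eps))  # beta_0 goes 4, 3, 2, 1
```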

### Persistent Homology

Features are tracked as `(birth_ε, death_ε)` pairs. Long-lived features
(large `death − birth`) are robust signal. Short-lived features are noise.
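
For connected components, the `(birth, death)` pairs can be read off a Kruskal-style pass over the sorted edges: every point is born at 0, and each merge kills one component at that edge length. The sketch below is an illustration under that standard construction, not the package's `b0_persistence`. The probabilities and the `threshold` separating signal from noise are made-up values.

```python
import math

def arc_distance(p, q):
    return 2.0 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q)))

def b0_pairs(D):
    """(birth, death) pairs for beta_0; the last component never dies (death = inf)."""
    n = len(D)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((D[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                       # this edge merges two components
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))
    return pairs

# Three clusters of scalar probabilities (hypothetical data).
probs = [0.10, 0.11, 0.12, 0.48, 0.50, 0.52, 0.88, 0.89, 0.90]
D = [[arc_distance(p, q) for q in probs] for p in probs]
pairs = b0_pairs(D)

threshold = 0.5  # hypothetical cut-off between noise and signal
long_lived = [(b, d) for b, d in pairs if math.isfinite(d) and d - b > threshold]
print(len(pairs), len(long_lived))  # 9 2
```

The two long-lived finite features correspond to the two merges that join the three clusters; the six short-lived ones are merges inside a cluster.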

The **stability theorem** (Cohen-Steiner et al. 2007) bounds how much noise can
move the diagram: if every pairwise Fisher distance changes by at most `δ`, the
persistence diagram changes by at most `δ` in bottleneck distance.
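
A small numerical check of this bound for β₀ is sketched below, under the assumption that the perturbation is measured as the largest change in any pairwise distance. The finite β₀ death values are the edge lengths of a minimum spanning tree, and because all births are 0, matching the sorted deaths gives an upper bound on the bottleneck distance. The data, `delta`, and the helper name are illustrative.

```python
import math
import random

def mst_edge_lengths(D):
    """Sorted minimum-spanning-tree edge lengths (Prim's algorithm)."""
    n = len(D)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    best[0] = 0.0
    lengths = []
    for step in range(n):
        i = min((k for k in range(n) if not in_tree[k]), key=best.__getitem__)
        in_tree[i] = True
        if step > 0:        # the root joins the tree without an edge
            lengths.append(best[i])
        for j in range(n):
            if not in_tree[j] and D[i][j] < best[j]:
                best[j] = D[i][j]
    return sorted(lengths)

rng = random.Random(0)
n, delta = 12, 0.05
probs = [rng.uniform(0.01, 0.99) for _ in range(n)]
D = [[2.0 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q))) for q in probs]
     for p in probs]

# Shift every pairwise distance by at most delta, keeping the matrix symmetric.
D2 = [row[:] for row in D]
for i in range(n):
    for j in range(i + 1, n):
        D2[i][j] = D2[j][i] = max(0.0, D[i][j] + rng.uniform(-delta, delta))

deaths, deaths2 = mst_edge_lengths(D), mst_edge_lengths(D2)
worst = max(abs(a - b) for a, b in zip(deaths, deaths2))
print(worst, delta)
```

The largest shift in any death value stays within `delta`, as the theorem predicts.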

---

## References

- Edelsbrunner, H. & Harer, J. (2010). *Computational Topology: An Introduction*. AMS.
- Rao, C. R. (1945). Information and the accuracy attainable in the estimation of statistical parameters. *Bull. Calcutta Math. Soc.* 37, 81–91.
- Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. *Biometrika* 10(4), 507–521.
- Cohen-Steiner, D., Edelsbrunner, H. & Harer, J. (2007). Stability of persistence diagrams. *Discrete & Computational Geometry* 37(1), 103–120.
- Chazal, F. & Michel, B. (2021). An introduction to topological data analysis. *Frontiers in Artificial Intelligence* 4.

---

## License

MIT License — see [LICENSE](LICENSE).
