Metadata-Version: 2.4
Name: tda-repr
Version: 0.1.4
Summary: Topological and spectral analysis of neural representations.
Author: Stepan Pankratov
License: MIT
Project-URL: Repository, https://github.com/Octopupu5/tda_repr
Keywords: topological data analysis,representation learning,deep learning,computer vision,natural language processing,persistent homology,hodge laplacian,neural network monitoring
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Image Processing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: scikit-learn
Requires-Dist: matplotlib
Requires-Dist: Pillow
Requires-Dist: torch
Requires-Dist: torchvision
Requires-Dist: transformers
Requires-Dist: datasets
Requires-Dist: tokenizers
Requires-Dist: medmnist
Requires-Dist: ripser
Requires-Dist: gudhi
Requires-Dist: tqdm
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: ruff>=0.6.0; extra == "dev"
Requires-Dist: pre-commit>=3.8.0; extra == "dev"
Dynamic: license-file

## tda-repr

Topological and spectral analysis toolkit for neural network representations.

`tda-repr` helps you monitor hidden-layer geometry during training and compare it to benchmark quality metrics (loss/accuracy/F1), with reproducible logs and plots.

It supports iterative research workflows with explicit run artifacts: configuration metadata, per-epoch structured logs, progress figures, checkpoint snapshots, and correlation reports. This makes comparisons between datasets, architectures, and fine-tune regimes reproducible and auditable.

`tda_repr/` is the library. `tools/` contains the scripts used for the thesis and its reproduction.

Run all commands from the repository root. The `tools/` scripts are meant for a repository checkout, not the PyPI package.

### Install (from source)

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -r requirements.txt
python -m pip install -e .
```

### Run an experiment

Interactive:

```bash
python -m tools.run_experiment --interactive --interactive_ui tui
```

Non-interactive:

```bash
python -m tools.run_experiment \
  --no-interactive \
  --task cv \
  --dataset cifar10 \
  --model resnet18 \
  --device cpu \
  --finetune full \
  --epochs 20 \
  --batch_size 128 \
  --download
```
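
To script several non-interactive runs, one option is to build the same command in Python. The sketch below is not part of the repository: `build_command` is a hypothetical helper, and it uses only the flags shown in the command above.

```python
import subprocess
import sys


def build_command(epochs, batch_size):
    """Build the argv list for one non-interactive run.

    Hypothetical helper: it only reuses flags and values that appear
    in the README command above.
    """
    return [
        sys.executable, "-m", "tools.run_experiment",
        "--no-interactive",
        "--task", "cv",
        "--dataset", "cifar10",
        "--model", "resnet18",
        "--device", "cpu",
        "--finetune", "full",
        "--epochs", str(epochs),
        "--batch_size", str(batch_size),
        "--download",
    ]


if __name__ == "__main__":
    # Run from the repo root so that `tools` is importable.
    for batch_size in (64, 128):
        subprocess.run(build_command(epochs=20, batch_size=batch_size), check=True)
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` stops the loop on the first failed run.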

Outputs go to `runs/exp_*/` (`meta.json`, `metrics.jsonl`, `figures/`, `checkpoints/`, `correlations_report/`, `analysis/`).
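
The per-epoch log can be inspected with a few lines of Python. This is a minimal sketch, assuming `metrics.jsonl` holds one JSON object per line (as the `.jsonl` extension suggests). `load_metrics` and `column` are hypothetical helpers, not part of `tda_repr`, and the record keys are not documented here, so check them in your own run.

```python
import json
from pathlib import Path


def load_metrics(run_dir):
    """Read one JSON object per non-empty line from <run_dir>/metrics.jsonl."""
    path = Path(run_dir) / "metrics.jsonl"
    records = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


def column(records, key):
    """Collect the values stored under `key`, skipping records that lack it."""
    return [record[key] for record in records if key in record]
```

For example, `sorted(load_metrics("runs/<run_dir>")[0])` lists the keys of the first record, which shows what was logged before you pick a column.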

### Analysis after a successful run

Correlation report:

```bash
python -m tools.correlation_report --run_dir runs/<run_dir>
```
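
To give a sense of what correlating a representation metric with a quality metric means, here is a small pure-Python rank-correlation (Spearman) sketch. It does not show how `tools.correlation_report` works, and the report may use other statistics. The function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant series has no defined correlation
    return cov / (sx * sy)
```

Given two per-epoch series of equal length, `spearman(metric_values, accuracy_values)` returns a value in [-1, 1]. A value near 1 means the two series rise and fall together across epochs.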

Embedding quality / layer selection:

```bash
python -m tools.evaluate_embeddings --run_dir runs/<run_dir> --checkpoint best_main --split val --device cpu --download --skip_existing
```

Early-stop sweep (offline):

```bash
python -m tools.repr_early_stop_sweep --roots runs --skip_existing
```

### Reproducibility

From the [Zenodo archive](https://doi.org/10.5281/zenodo.20114914) (`saved_runs/`), regenerate the figures and 3 case tables:

```bash
./reproduction_cases.sh
```

Full regeneration (`runs/`), which produces all remaining tables:

```bash
./reproduction_runs.sh
```
