Metadata-Version: 2.4
Name: hippotorch
Version: 0.5.0
Summary: Differentiable episodic memory for reinforcement learning.
Author: Döme Zsolt
Keywords: reinforcement-learning,episodic-memory,pytorch,replay-buffer,rl
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0
Requires-Dist: numpy>=1.21
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
Requires-Dist: pytest-cov>=4.0; extra == "test"
Requires-Dist: hypothesis>=6.80; extra == "test"
Provides-Extra: math
Requires-Dist: sympy>=1.14; extra == "math"
Requires-Dist: scipy>=1.14; extra == "math"
Requires-Dist: hypothesis>=6.80; extra == "math"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.7; extra == "dev"
Requires-Dist: ruff>=0.1.7; extra == "dev"
Requires-Dist: isort>=5.12; extra == "dev"
Requires-Dist: mypy>=1.7; extra == "dev"
Requires-Dist: pre-commit>=3.5; extra == "dev"
Requires-Dist: hypothesis>=6.80; extra == "dev"
Requires-Dist: sympy>=1.14; extra == "dev"
Requires-Dist: scipy>=1.14; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.5; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.24; extra == "docs"
Requires-Dist: mkdocs-material>=9.5; extra == "docs"
Provides-Extra: gym
Requires-Dist: gymnasium==0.28.1; extra == "gym"
Provides-Extra: envs
Requires-Dist: gymnasium==0.28.1; extra == "envs"
Requires-Dist: minigrid>=3.0.0; extra == "envs"
Requires-Dist: pygame>=2.4.0; extra == "envs"
Provides-Extra: atari
Requires-Dist: gymnasium[atari]==0.28.1; extra == "atari"
Requires-Dist: autorom>=0.6; extra == "atari"
Provides-Extra: hub
Requires-Dist: huggingface_hub>=0.20; extra == "hub"
Requires-Dist: safetensors>=0.4; extra == "hub"
Requires-Dist: jsonschema>=4.0; extra == "hub"
Provides-Extra: viz
Requires-Dist: matplotlib>=3.7; extra == "viz"
Provides-Extra: umap
Requires-Dist: umap-learn>=0.5; extra == "umap"
Provides-Extra: robotics
Requires-Dist: gymnasium-robotics>=1.4.2; extra == "robotics"
Requires-Dist: mujoco>=2.3.0; extra == "robotics"
Provides-Extra: release
Requires-Dist: build>=1.0; extra == "release"
Requires-Dist: twine>=4.0; extra == "release"
Provides-Extra: min-torch
Requires-Dist: numpy<2; extra == "min-torch"
Requires-Dist: torch==2.0.*; extra == "min-torch"
Provides-Extra: faiss
Requires-Dist: faiss-cpu>=1.7.4; extra == "faiss"
Provides-Extra: faiss-gpu
Requires-Dist: faiss-gpu>=1.7.4; extra == "faiss-gpu"
Dynamic: license-file

# hippotorch

[![PyPI](https://img.shields.io/pypi/v/hippotorch?logo=pypi&logoColor=white)](https://pypi.org/project/hippotorch/)
[![pipeline status](https://gitlab.com/domezsolt/hippotorch/badges/main/pipeline.svg)](https://gitlab.com/domezsolt/hippotorch/-/pipelines)
[![coverage](https://gitlab.com/domezsolt/hippotorch/badges/main/coverage.svg)](https://gitlab.com/domezsolt/hippotorch/-/pipelines)
[![docs](https://img.shields.io/badge/docs-GitLab%20Pages-blue)](https://domezsolt.gitlab.io/hippotorch)

Tested on: Ubuntu 22.04, 24.04

> Differentiable memory geometry with discrete retrieval for RL, focused on sparse rewards, long horizons, and partial observability.

[Changelog](CHANGELOG.md)

Hippotorch is an episodic memory substrate designed for tasks where agents forget early lessons, face sparse rewards, or operate under partial observability. It keeps experiences in a learnable memory geometry so agents can remember rare successes, connect distant cause and effect, and transfer knowledge between similar worlds. The memory geometry is differentiable through the encoders and sleep objective; retrieval and replay sampling remain discrete over replay indices. Under the hood it uses contrastive learning with optional reward weighting.

---

## Highlights

- **Memory that adapts with you.** Dual encoders organize episodes by usefulness instead of mere recency.
- **Semantic + uniform sampling.** A single buffer can surface hard-to-find wins while the uniform component provides a support floor and reduces harmful temporal correlation.
- **Production-friendly extras.** Hugging Face Hub export, FAISS retrieval, Gymnasium wrappers, and health reports ship in the box.
- **Batteries included.** Dozens of scripts and docs show exactly how to benchmark, visualize, and share results.

If you already converge with a plain replay buffer, keep it. Hippotorch shines when agents forget early lessons, face sparse rewards, or operate in partially observed environments.

### When Not To Use Hippotorch
- Dense-reward tasks that already converge quickly with uniform replay
- Short-horizon problems where distant credit assignment is not a concern
- Settings where an additional encoder/consolidation step would dominate runtime

---

## Installation

```bash
# Quote extras so shells such as zsh do not try to expand the brackets.
pip install hippotorch              # minimal setup
pip install "hippotorch[faiss]"     # fast nearest-neighbor retrieval
pip install "hippotorch[envs]"      # Gymnasium helpers + examples
pip install "hippotorch[atari]"     # Atari wrappers + AutoROM tooling
pip install "hippotorch[hub]"       # Hugging Face Hub + safetensors
pip install "hippotorch[viz]"       # matplotlib plots and heatmaps
pip install "hippotorch[umap]"      # projector UMAP export
```

Requirements: Python ≥3.9 and PyTorch ≥2.0. CI enforces at least 80% test coverage.

Dev install: `pip install -e ".[dev]"`

---

## Quick Tour

Create an encoder + memory, store rollouts in the recommended explicit-transition layout (`states[t]`, `actions[t]`, `rewards[t]`, `next_states[t]`, and `dones[t]` per row), then mix semantic samples with a uniform component that provides a support floor and reduces harmful temporal correlation:

> Warning: Legacy episodes without `next_states` are accepted for compatibility. Hippotorch warns at replay ingestion only when the layout looks risky, such as sparse terminal-reward data whose final reward would be skipped; explicit `next_states` are recommended for sparse terminal-reward tasks so the final terminal transition keeps its observed next state.

The formal replay-index notation \(z=(i,t)\in\mathcal I_M\), materialization map
\(X_M(z)\), and coverage-floor lemma are stated in
[Mathematical Contract](docs/mathematical_contract.md) and
[Coverage Floor](docs/theory/03_coverage_floor.md). The lemma guarantees support,
not useful sampling frequency for rare rewards.
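
To make the support statement concrete, here is a minimal sketch (not the library's sampler) that mixes a uniform distribution over replay indices with a semantic distribution that is zero outside a top-k support. The names `mixture_probs`, `semantic_probs`, and `uniform_weight` are illustrative assumptions, not hippotorch API.

```python
def mixture_probs(semantic_probs, uniform_weight):
    """Mix a semantic distribution with a uniform one over the same indices.

    semantic_probs: probabilities summing to 1 (zeros allowed outside top-k).
    uniform_weight: mass given to the uniform component, in [0, 1].
    """
    n = len(semantic_probs)
    semantic_weight = 1.0 - uniform_weight
    return [uniform_weight / n + semantic_weight * p for p in semantic_probs]


# Semantic retrieval supports only 2 of the 5 stored transitions.
semantic = [0.7, 0.3, 0.0, 0.0, 0.0]
probs = mixture_probs(semantic, uniform_weight=0.05)

# Every index keeps at least uniform_weight / N probability.
floor = 0.05 / len(semantic)
print(probs)
print(min(probs) >= floor)
```

The floor only says that no transition has zero probability. With a small `uniform_weight` and a large buffer, a rare rewarding transition outside the semantic support is still drawn very infrequently.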

- **Reward-topology alignment.** Optional reward-topology alignment remains disabled by default. Enable a KL direction with `Consolidator(..., reward_topology_weight=..., reward_topology_mode="forward_kl")` or choose `reward_topology_mode="reverse_kl"` or `"symmetric_kl"`. Reward-topology alignment requires encoder-normalized embeddings, so `Consolidator` rejects `DualEncoder(normalize=False)` when `reward_topology_weight > 0`.
- **Bucket positives.** Set `contrastive_positive_mode="temporal_bucket"` to treat both temporal neighbors and sampled episodes in the same configured metadata bucket as consolidation positives. By default, buckets use explicit success flags and return quantiles; pass `contrastive_bucket_sources=("success", "return", "task", "goal", "advantage")` to include task IDs, goal IDs, or advantage quantiles.
- **False-negative masking.** Set `mask_bucket_false_negatives=True` to remove configured bucket matches from the temporal InfoNCE negative set without switching to the multi-positive loss.
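
The effect of false-negative masking can be sketched for a single anchor. This is an illustration of the general technique, assuming masking means dropping the matched candidates from the InfoNCE denominator; the function `info_nce_masked` and its arguments are hypothetical and do not mirror hippotorch internals.

```python
import math


def info_nce_masked(sims, positive_idx, masked_idxs=(), temperature=0.1):
    """InfoNCE loss for one anchor with some negatives removed.

    sims: similarity of the anchor to each candidate key.
    positive_idx: index of the positive (for example a temporal neighbor).
    masked_idxs: candidates dropped from the denominator (for example
        episodes that share the anchor's bucket).
    """
    masked = set(masked_idxs) - {positive_idx}  # never mask the positive
    logits = [s / temperature for i, s in enumerate(sims) if i not in masked]
    pos_logit = sims[positive_idx] / temperature
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos_logit


sims = [0.9, 0.8, 0.1, -0.2]  # candidate 0 is the temporal positive
plain_loss = info_nce_masked(sims, positive_idx=0)
masked_loss = info_nce_masked(sims, positive_idx=0, masked_idxs=[1])
print(plain_loss, masked_loss)
```

Candidate 1 is similar to the anchor, so treating it as a negative pushes the two apart. Removing it from the denominator lowers the loss while keeping the single-positive form of the objective.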

```python
import torch
from hippotorch import Episode, DualEncoder, MemoryStore, HippocampalReplayBuffer, Consolidator

state_dim, action_dim = 4, 1
encoder = DualEncoder(input_dim=state_dim + action_dim + 1, embed_dim=128)
# Query/key embeddings are unit-normalized by default; pass normalize=False
# to keep raw projector outputs. Retrieval, InfoNCE, and diagnostics use the
# returned embeddings directly, so raw outputs produce raw dot-product scores.
memory = MemoryStore(embed_dim=128, capacity=50_000)
consolidator = Consolidator(encoder, temperature=0.1, reward_weight=0.5)
buffer = HippocampalReplayBuffer(
    memory=memory,
    encoder=encoder,
    mixture_ratio=0.3,
    min_uniform_weight=0.05,
    consolidator=consolidator,
)
# Advanced schedules can return SamplingConfig or
# (uniform_weight, semantic_weight, priority_weight).
# SamplingConfig.semantic_temperature controls the softmax over retrieved
# episodes for semantic replay.
# The default within-episode policy is "uniform": semantic relevance stays at
# the episode level while transition draws stay decorrelated within that episode.
# SamplingConfig.within_episode_policy can choose "uniform", "reward_weighted",
# "priority_weighted", or "window_uniform" transitions after semantic retrieval.
# SamplingConfig.within_episode_window_size controls "window_uniform" width.
# Cold-start schedules can keep uniform_weight high until encoder keys stabilize.

states = torch.randn(32, state_dim)
actions = torch.randn(32, action_dim)
rewards = torch.randn(32)

# Simple legacy style: next states are inferred from states[t + 1].
legacy_episode = Episode(states=states, actions=actions, rewards=rewards)

# Recommended explicit-transition storage: one row per observed transition.
# This keeps the final terminal transition because next_states[-1] is stored directly.
next_states = torch.roll(states, shifts=-1, dims=0)
next_states[-1] = torch.randn(state_dim)  # final terminal observation
dones = torch.zeros(32, dtype=torch.bool)
dones[-1] = True
episode = Episode(
    states=states,
    actions=actions,
    rewards=rewards,
    next_states=next_states,
    dones=dones,
)
buffer.add_episode(episode)

# Query-aware sampling
query_state = torch.cat([states[0], torch.zeros(action_dim), rewards[:1]])
batch = buffer.sample(batch_size=64, query_state=query_state, top_k=5)
# batch["indices"] stores (episode_idx, step_idx); batch["episode_ids"] stores stable IDs.
# batch["sample_modes"] stores SampleMode ints: uniform=0, semantic=1, priority=2.
# batch["semantic_probs"] stores p_s(z | q, M), zero outside semantic support.
# Top-k semantic retrieval does not provide a global support floor; keep a
# nonzero uniform component when every stored transition must retain support.
# batch["sampling_probs"] stores the aligned mixture probability for each transition.
# batch["weights"] stores aligned correction weights; "none" gives ones and
# "mixture" keeps PER weights only for priority samples.

# Sleep/consolidate occasionally
metrics = buffer.consolidate(steps=50, batch_size=64, report_quality=True)
print(
    metrics["loss"],
    metrics["temporal_loss"],
    metrics["reward_weighted_temporal_loss"],
    metrics["reward_topology_loss"],
    metrics["masked_negatives"],
)
```
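
The comments above mention a softmax temperature over retrieved episodes and a default "uniform" within-episode policy. The following is a rough sketch of that two-stage idea under stated assumptions: `softmax` and `sample_transition` are hypothetical helpers written for this example, and the real sampler may differ in details.

```python
import math
import random


def softmax(scores, temperature):
    """Temperature-scaled softmax; lower temperature gives a sharper result."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_transition(episode_scores, episode_lengths, temperature, rng):
    """Pick an episode by softmax over scores, then a step uniformly."""
    probs = softmax(episode_scores, temperature)
    episode_idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    step_idx = rng.randrange(episode_lengths[episode_idx])
    return episode_idx, step_idx


scores = [0.9, 0.5, 0.1]  # similarities of the top-k retrieved episodes
lengths = [32, 20, 8]     # number of transitions in each episode

sharp = softmax(scores, temperature=0.1)   # concentrates on the best match
flat = softmax(scores, temperature=10.0)   # close to uniform over the top-k

rng = random.Random(0)
episode_idx, step_idx = sample_transition(scores, lengths, 0.1, rng)
print(sharp, flat, (episode_idx, step_idx))
```

Semantic relevance only decides which episode is chosen here. The step within that episode is drawn uniformly, which keeps transition draws decorrelated inside the episode.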

Compatibility note: the deprecated `mixture_ratio` and `priority_mode`
constructor arguments still work, but `HippocampalReplayBuffer` converts them
into `buffer.sampling_config`. Prefer explicit `SamplingConfig` weights or
`mixture_schedule` values in new code.

If you enable prioritized replay, update priorities directly from the replay
batch metadata:

```python
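batch = buffer.sample(...)
# compute_td_errors is a placeholder for your agent's own TD-error computation.
td_errors = compute_td_errors(batch)
# batch["indices"] holds the (episode_idx, step_idx) pairs returned by sample().
buffer.update_priorities(batch["indices"], td_errors)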
```

After an update from the most recent sampled batch, `buffer.stats()` and
`buffer.sampling_stats()` expose `priority_update_td_error_{mode}_count`,
`priority_update_td_error_{mode}_mean`, and
`priority_update_td_error_{mode}_max` for uniform, semantic, and priority
samples.
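
As a sketch of what these per-mode statistics summarize, the snippet below groups TD errors by the `SampleMode` ints listed earlier (uniform=0, semantic=1, priority=2). It is an illustration only: the helper `td_error_stats_by_mode` is hypothetical, and taking absolute values and reporting `0.0` for empty groups are assumptions rather than documented behavior.

```python
SAMPLE_MODES = {0: "uniform", 1: "semantic", 2: "priority"}


def td_error_stats_by_mode(sample_modes, td_errors):
    """Group absolute TD errors by sample mode; report count, mean, and max."""
    grouped = {name: [] for name in SAMPLE_MODES.values()}
    for mode, err in zip(sample_modes, td_errors):
        grouped[SAMPLE_MODES[mode]].append(abs(err))

    stats = {}
    for name, errs in grouped.items():
        prefix = f"priority_update_td_error_{name}"
        stats[f"{prefix}_count"] = len(errs)
        # Assumption: empty groups report 0.0 instead of raising.
        stats[f"{prefix}_mean"] = sum(errs) / len(errs) if errs else 0.0
        stats[f"{prefix}_max"] = max(errs) if errs else 0.0
    return stats


stats = td_error_stats_by_mode([0, 1, 1, 2], [0.5, -2.0, 1.0, 0.25])
for key, value in stats.items():
    print(key, value)
```

Comparing the semantic and uniform means is one way to check whether semantic retrieval is surfacing transitions the critic still gets wrong.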

For replay-batch correction, pass `correction_reference` to
`HippocampalReplayBuffer` or `SamplingConfig`. `correction_reference="none"`
leaves all `batch["weights"]` equal to one. `correction_reference="mixture"` keeps
priority-sampler PER weights on priority samples and uses one for semantic or
uniform samples; use `"uniform"` for uniform-reference mixture correction. With
priority-only replay, `"uniform"` reduces to the classic PER correction
`((1 / (N * nu(z))) ** beta)`, normalized within the returned batch.
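
For reference, the classic PER correction can be written in a few lines. This is a minimal sketch, assuming that "normalized within the returned batch" means dividing by the largest weight in the batch (the usual PER convention); `per_weights` is a hypothetical helper, not a hippotorch function.

```python
def per_weights(sample_probs, buffer_size, beta):
    """Classic PER importance weights (1 / (N * nu(z))) ** beta.

    sample_probs: probability nu(z) with which each batch item was drawn.
    buffer_size: number of stored transitions N.
    beta: correction strength; 0 disables correction, 1 fully corrects.
    """
    raw = [(1.0 / (buffer_size * p)) ** beta for p in sample_probs]
    max_w = max(raw)  # assumption: normalize by the batch maximum
    return [w / max_w for w in raw]


nu = [0.4, 0.1, 0.01]  # frequently drawn items get smaller weights
weights = per_weights(nu, buffer_size=100, beta=0.5)
print(weights)

# With beta = 0 every weight is one, matching correction_reference="none".
print(per_weights(nu, buffer_size=100, beta=0.0))
```

Dividing by the batch maximum keeps all weights in `(0, 1]`, so the correction can only scale updates down.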

Rolling with Stable-Baselines3 (SB3) or Gymnasium? Wrap your existing replay buffer with `SB3ReplayBufferWrapper` or the `HippotorchMemoryWrapper`. The SB3 wrapper can drive semantic replay by passing a query observation (by default, the most recent observation) and supports a custom query-building hook.

Note: the SB3 wrapper currently targets single-environment rollouts; vectorized environments (`VecEnv`) are not supported yet.

Need hyperparameter guidance? Start with `docs/hyperparameter_guide.md` for recommended ranges, then see `docs/diagnostics.md` for health checks and `docs/curriculum.md` for training tips.

---

## Everyday Tools

### Recall While Acting
- Use the lightweight read API: `from hippotorch import query`.
- Pipe `query(..., top_k=5)` results into policies or logging code.
- The Gymnasium adapter emits dict observations so SB3 policies can consume retrieval features alongside pixels.
- Examples: `examples/query_inference_demo.py`, `examples/minigrid_memory_wrapper.py`.

### Portable Brains
- Share trained memories with `push_memory_to_hub` / `load_memory_from_hub`.
- Choose local folders for offline passes or Hugging Face Hub for team-wide reuse.
- `scripts/hub_roundtrip_smoke.py` is a 30-second sanity check.
- Docs: `docs/hub.md`.

### Glass-Box Diagnostics
- `buffer.health_report()` returns retrievability, timestamp and drift staleness, collapse indicators, and alignment scores.
- Log with `report.to_tensorboard(writer, step)` or `report.to_wandb(run)`.
- See `docs/diagnostics.md` for visuals.

### Batch Retrieval for Low Latency
- `buffer.query_batch(query_vecs, top_k=K)` handles `[B,T,D]` tensors in one go.
- Matches single-query results without a Python-level loop.
- Works with both torch and FAISS backends.

### Multi-GPU Encoding
- Set `multi_gpu=True` on `DualEncoder`/`VisualEpisodeEncoder` or `Consolidator` to enable `torch.nn.DataParallel` when multiple GPUs are present.
- Snapshots handle `module.` prefixes transparently; save/load works across single- and multi-GPU runs.
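
For context, `torch.nn.DataParallel` stores the wrapped model under a `module` attribute, so the keys of its `state_dict()` gain a `module.` prefix. A common way to make checkpoints portable is to strip that prefix before loading into an unwrapped model. The helper below is a generic sketch of that technique on a plain dict; `strip_module_prefix` is a hypothetical name and hippotorch's snapshot code may handle this differently.

```python
def strip_module_prefix(state_dict):
    """Return a copy of state_dict with a leading 'module.' removed from keys."""
    # str.removeprefix requires Python 3.9+, matching this package's minimum.
    return {
        key.removeprefix("module."): value
        for key, value in state_dict.items()
    }


# Keys as they would appear when saved from a DataParallel-wrapped model.
wrapped = {"module.encoder.weight": 1, "module.encoder.bias": 2}
plain = strip_module_prefix(wrapped)
print(plain)

# Already-clean keys pass through unchanged, so the helper is safe to reapply.
print(strip_module_prefix(plain) == plain)
```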

---

## Ready-to-Run Samples

Pick a script, set a seed, and you get a reproducible snapshot:

- **Benchmarks & diagnostics**
  - Retrieval perf: `python scripts/bench_retrieval.py --sizes 10000 100000`
  - Priority sampling modes: `python scripts/priority_sampling_benchmark.py --sizes 10000 100000`
  - Visualization: `python scripts/export_projector_embeddings.py --snapshot run.pt`
  - Retrieval heatmap: `python scripts/retrieval_heatmap.py --memory-checkpoint ...`
- **Environments**
  - CartPole smoke: `bash scripts/quick_cartpole.sh`
  - Corridor curriculum/oracle: `bash scripts/corridor_curriculum.sh`, `bash scripts/corridor_oracle_zn.sh`
  - MiniGrid sweeps: `python scripts/minigrid_memory_benchmark.py --steps 8000 --seeds 3`
  - FetchReach benchmark: `bash scripts/fetchreach_benchmark.sh`
  - HER comparison (FetchReach): `bash scripts/her_comparison.sh`
  - Intrinsic curiosity example: `python -m examples.intrinsic_demo --episodes 20`
- **Ablations & studies**
  - Rank-weighted consolidation: `bash scripts/run_rank_ablation.sh`
  - Consolidation micro bench: `bash scripts/run_consolidation_micro.sh`
  - Visual MiniGrid clustering: `python -m examples.minigrid_visual --steps 2000`

All scripts keep runtime under a couple of minutes unless stated otherwise. Longer jobs (corridor oracle full run, curriculum sweeps) note their expected duration in the script header.

---

## Learn More

- [docs/benchmarks.md](docs/benchmarks.md) – retrieval setups, FAISS parity, and profiling tips.
- [docs/curriculum.md](docs/curriculum.md) – how to stage corridor tasks and measure regret.
- [docs/usage.md](docs/usage.md) – wrappers, segmenters, and rollout recipes.
- [docs/hub.md](docs/hub.md) – how to move memories between machines or teammates.
- Getting started notebook: `docs/tutorials/getting_started.ipynb`
- API Reference (MkDocs): build locally with `make docs` and open `site/index.html` (source: [docs/api.md](docs/api.md)). Hosted docs: https://domezsolt.gitlab.io/hippotorch
- Sparse Atari pilot (Montezuma’s Revenge): `bash scripts/atari_pilot.sh` or `python -u scripts/atari_sparse_pilot.py --env ALE/MontezumaRevenge-v5 --steps 10000` (requires optional extras: `pip install hippotorch[atari]` then run `AutoROM --accept-license`). See `docs/atari_pilot.md`.

Problems or ideas? Open an issue or send a Merge Request on GitLab.
