Metadata-Version: 2.4
Name: fastmdxplora
Version: 2.0.0
Summary: FastMDXplora: Fully Automated SysTem for Molecular Dynamics eXploration
Author: Adekunle Aina, Derrick Kwan
Maintainer: Adekunle Aina
License: MIT
Project-URL: Homepage, https://github.com/aai-research-lab/FastMDXplora
Project-URL: Documentation, https://fastmdxplora.readthedocs.io
Project-URL: Repository, https://github.com/aai-research-lab/FastMDXplora
Project-URL: Issues, https://github.com/aai-research-lab/FastMDXplora/issues
Project-URL: Changelog, https://github.com/aai-research-lab/FastMDXplora/blob/main/CHANGELOG.md
Keywords: molecular-dynamics,md-simulation,trajectory-analysis,automation,orchestrator,computational-chemistry,biophysics,structural-biology,reproducibility
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Topic :: Scientific/Engineering :: Physics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.22
Requires-Dist: pyyaml>=6.0
Requires-Dist: mdtraj>=1.9.7
Requires-Dist: matplotlib>=3.5
Requires-Dist: scikit-learn>=1.0
Requires-Dist: pandas>=1.4
Requires-Dist: python-pptx>=0.6.21
Provides-Extra: md
Requires-Dist: pdbfixer; extra == "md"
Requires-Dist: openmm>=8.0; extra == "md"
Provides-Extra: ligand
Requires-Dist: openff-toolkit>=0.16; extra == "ligand"
Requires-Dist: openmmforcefields>=0.12; extra == "ligand"
Provides-Extra: plumed
Requires-Dist: openmm-plumed>=1.0; extra == "plumed"
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
Requires-Dist: pytest-cov>=4.0; extra == "test"
Provides-Extra: docs
Requires-Dist: sphinx>=7.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=2.0; extra == "docs"
Requires-Dist: myst-parser>=2.0; extra == "docs"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.5.0; extra == "dev"
Requires-Dist: python-pptx>=0.6.21; extra == "dev"
Dynamic: license-file

# FastMDXplora

> **F**ully **A**utomated **S**ys**T**em for **M**olecular **D**ynamics e**X**ploration

[![DOI](https://img.shields.io/badge/DOI-10.1002%2Fjcc.70350-blue)](https://doi.org/10.1002/jcc.70350)
[![PyPI](https://img.shields.io/pypi/v/fastmdxplora.svg)](https://pypi.org/project/fastmdxplora/)
[![Python](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://github.com/aai-research-lab/FastMDXplora/actions/workflows/tests.yml/badge.svg)](https://github.com/aai-research-lab/FastMDXplora/actions)

---

**FastMDXplora** is a project-level orchestrator for end-to-end molecular dynamics studies. A single command takes a protein structure (or PDB ID) from input to publication-quality deliverable, coordinating four phases:

```
  setup  →  simulation  →  analysis  →  report
```

**FastMDXplora is the next generation of FastMDAnalysis** (Aina & Kwan, *J. Comput. Chem.* 2026, [DOI: 10.1002/jcc.70350](https://doi.org/10.1002/jcc.70350)). It keeps the same automated, reproducibility-by-design philosophy and extends it from trajectory analysis to the full molecular dynamics study: setup, simulation (including enhanced sampling), protein and protein-ligand analysis, and reporting. It is *not* a generic workflow engine. The workflow and the domain knowledge are built in, so the user expresses intent rather than describing a directed acyclic graph (DAG) of tasks and dependencies, as tools like Snakemake and Nextflow require.

## Highlights

- **Single-command end-to-end MD** — from PDB to slides in one invocation
- **Protein-ligand ready** — parameterize a small-molecule ligand (OpenFF) from a feasible bound pose; ligand-aware analyses (pose RMSD, contacts, protein-ligand H-bonds) run automatically
- **Project-level orchestrator pattern** — shared state, registered phases, intelligent defaults, consolidated outputs
- **Granular control when you want it** — run any single phase independently
- **Self-contained** — the analysis and report phases have no heavy runtime dependencies
- **Reproducibility built in** — every run writes a structured manifest of parameters, software versions, and artifact paths
- **Publication-quality reporting** — automated slide deck, structured Markdown report, self-contained project bundle

## Installation

FastMDXplora's four phases have different dependency footprints. The **analysis and report** phases work from pip alone; the **setup and simulation** phases need PDBFixer + OpenMM, which are distributed primarily through conda-forge. So there are two routes — pick by what you need.

### Full install (all four phases) — from the git repo

The setup/simulation chemistry stack (OpenMM, PDBFixer) installs most reliably from conda-forge, so the full install uses the bundled `environment.yml`. We recommend `mamba` (a faster conda solver); plain `conda` works too.

```bash
git clone https://github.com/aai-research-lab/FastMDXplora.git
cd FastMDXplora
```
```bash
mamba env create -f environment.yml || conda env create -f environment.yml
```
```bash
conda activate fastmdxplora
pip install .
```

> Don't have `mamba`? Either install Miniforge (see [below](#mamba--miniforge-optional)), or just use `conda` — the `||` above falls back to it automatically.

### Analysis + report only — from PyPI

If you only need to analyze existing trajectories and build reports (no simulation), plain pip is enough — no conda required:

```bash
pip install fastmdxplora              # primary package
pip install fastmdx                    # alias (resolves to fastmdxplora)
```

This gives a fully working analysis + report pipeline, slide deck included (`python-pptx` is a core dependency). The setup and simulation phases emit a clear warning and skip gracefully until the chemistry stack is present. Add it via conda-forge (recommended, reliable across platforms):

```bash
conda install -c conda-forge pdbfixer openmm
```

or best-effort via the `[md]` pip extras (PDBFixer wheels are unavailable on some platforms, so conda is preferred):

```bash
pip install "fastmdxplora[md]"
```

### Development install

```bash
git clone https://github.com/aai-research-lab/FastMDXplora.git
cd FastMDXplora
mamba env create -f environment.yml || conda env create -f environment.yml
conda activate fastmdxplora
pip install -e ".[test]"               # editable, with the test dependencies
```

### Verify

```bash
fastmdx --version
fastmdx info                           # versions + detected backends (OpenMM/PDBFixer)
```

Check which OpenMM platforms are available (CPU/CUDA/OpenCL):

```bash
python - <<'PY'
import openmm as mm
plats = [mm.Platform.getPlatform(i).getName() for i in range(mm.Platform.getNumPlatforms())]
print("Available platforms:", plats)
print("CUDA available" if "CUDA" in plats else "CUDA not available; simulations will use one of the platforms listed above")
PY
```

> **conda-forge package (coming soon).** A single-command `conda install -c conda-forge fastmdxplora` (pulling every dependency, all four phases working out of the box) is planned once the recipe clears review. Until then, use the git + `environment.yml` route above.

### Mamba / Miniforge (optional)

`mamba` is a drop-in replacement for the `conda` command with a faster dependency solver, which helps because solving the OpenMM/CUDA stack is exactly where the classic solver is slow. If you don't have it, the easiest source is **Miniforge** (conda + mamba, preconfigured for conda-forge):

```bash
# Linux (x86_64) — see https://conda-forge.org/miniforge/ for macOS/Windows/ARM
curl -L -o "$HOME/Miniforge3.sh" \
  "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
bash "$HOME/Miniforge3.sh" -b -p "$HOME/miniforge3"
source "$HOME/miniforge3/etc/profile.d/conda.sh"
conda init "$(basename "$SHELL")"
```

If `mamba` still isn't on PATH afterward, add it to the base environment:

```bash
conda install -n base -c conda-forge mamba
```

For other operating systems (macOS Intel/Apple Silicon, Linux ARM64, Windows), grab the matching installer from the [Miniforge download page](https://conda-forge.org/miniforge/).

## Examples

### Command line

**Run the full pipeline** (setup → simulate → analyze → report):
```bash
fastmdx explore --system protein.pdb
```
**Fetch a structure from the PDB by ID** (auto-detected, fetched from RCSB):
```bash
fastmdx explore --system 1L2Y
```
**Tune per-phase options** (flags are namespaced by phase):
```bash
fastmdx explore -s protein.pdb --setup-ph 7.4 --simulate-duration-ns 100 --simulate-platform CUDA
```
**Run only specific phases**:
```bash
fastmdx explore -s protein.pdb --include setup simulation
```
**Run a single phase** (bare flags, no phase prefix):
```bash
fastmdx setup -s protein.pdb --ph 6.5
fastmdx simulate --output run_001 --duration-ns 50 --platform CUDA
fastmdx analyze --output run_001 --analyses rmsd rmsf rg
```
**Drive a whole study from a config file** (`-c` and `-config` also work):
```bash
fastmdx explore --config study.yml
```
**Generate a commented config template to edit**:
```bash
fastmdx init-config -o study.yml
```

The `-s`, `-system`, and `--system` forms are equivalent; `xplore` is an alias of `explore`.

### Python API

**Run the full pipeline**:
```python
from fastmdxplora import FastMDXplora

fmdx = FastMDXplora(system="protein.pdb")
fmdx.explore()
```
**Specify options and select phases**:
```python
fmdx = FastMDXplora(system="1L2Y")          # PDB ID, fetched from RCSB
results = fmdx.explore(
    include=["setup", "simulation", "analysis"],
    options={
        "simulation": {"duration_ns": 100, "temperature_K": 310, "platform": "CUDA"},
        "analysis":   {"include": ["rmsd", "rg", "cluster"]},
    },
)
# explore() always returns a list of runs (a single study is a list of one)
for run in results:
    print(run.run_id, run.status)
    for phase in run.phases:
        print("  ", phase.name, phase.status)
```
**Run a config file** — one system, many systems, or a parameter sweep, all the same way:
```python
fmdx = FastMDXplora(config="study.yml")
fmdx.explore()
```
**Preview a run without executing** (CLI `--dry-run`, or `dry_run=True`):
```python
FastMDXplora(config="campaign.yml").explore(dry_run=True)
```

> Recommended alias: `import fastmdxplora as fastmdx`.

See [Configuration files](#configuration-files) and [Many systems and parameter sweeps](#many-systems-and-parameter-sweeps) for the YAML format, batches, sweeps, and parallel execution.

## Configuration files

For anything beyond a quick run, capture the whole study in a single YAML file instead of a long flag list. The same file drives both the CLI and the Python API. Input is always given as a `systems:` list — even for a single system — so the file looks the same whether you study one protein or a dozen.

Generate a commented template to start from:

```bash
fastmdx init-config                    # writes fastmdxplora.yml (comprehensive)
fastmdx init-config --minimal -o study.yml   # short starter
```

A `study.yml` looks like:

```yaml
systems:
  - id: protein1
    system: protein.pdb        # PDB/CIF path, 4-char PDB ID, or sequence

output: ./my_study
include: [setup, simulation, analysis, report]

setup:
  ph: 7.4
  ion_concentration_M: 0.15

simulation:
  duration_ns: 100.0         # production length (equilibration is separate)
  temperature_K: 310.0
  platform: CUDA

analysis:
  include: [rmsd, rmsf, rg, cluster]
  selection: "name CA"
  options:
    cluster:
      methods: [kmeans, hierarchical]
      n_clusters: 5

report:
  title: "My MD Study"
```

Run it from the CLI or the API:

```bash
fastmdx explore --config study.yml     # also: -c, -config
```

```python
from fastmdxplora import FastMDXplora
FastMDXplora(config="study.yml").explore()
```

With a single system and no sweep, the output uses the familiar flat layout (`my_study/setup/`, `my_study/simulation/`, …) with the usual `manifest.json` and `resolved_config.yml`. Three things make this robust:

- **Flags override the file.** `fastmdx explore --config study.yml --simulate-duration-ns 50` keeps everything in the file but runs 50 ns. Precedence is: command-line flags / API kwargs > config file > built-in defaults.
- **Strict validation.** A typo like `pH:` (wrong case) or `simulaton:` is rejected with a did-you-mean suggestion, so a misspelled key never silently runs with the default.
- **Reproducibility.** Every run writes `resolved_config.yml` — the fully-merged configuration that actually ran (defaults + file + overrides). Feed it straight back to `--config` to reproduce the study exactly.
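
The precedence rule in the first bullet amounts to a layered deep merge. The following is a minimal sketch, assuming each layer is a nested dictionary keyed by phase. `deep_merge`, the default values, and the layer names are hypothetical and are not FastMDXplora's internals:

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which `override` wins over `base`, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical layers, lowest precedence first. The default values are made up.
defaults = {"simulation": {"duration_ns": 10.0, "temperature_K": 300.0, "platform": "CPU"}}
config_file = {"simulation": {"duration_ns": 100.0, "temperature_K": 310.0, "platform": "CUDA"}}
cli_flags = {"simulation": {"duration_ns": 50.0}}  # e.g. --simulate-duration-ns 50

# defaults < config file < command-line flags
resolved = deep_merge(deep_merge(defaults, config_file), cli_flags)
print(resolved["simulation"])
```

Only `duration_ns` changes relative to the file; the temperature and platform from the config file survive, which is the behavior the `--simulate-duration-ns 50` example describes.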

For a quick command-line one-off, `-s/--system` is shorthand that builds a one-element `systems` list for you:

```bash
fastmdx explore -s protein.pdb --simulate-duration-ns 50
```

## Many systems and parameter sweeps

Because input is always a `systems:` list, studying several systems is just adding entries. Add a `sweep:` block to vary parameters, and FastMDXplora runs the full cross-product — each as a complete, self-contained study.

```yaml
output: ./trpcage_campaign
include: [setup, simulation, analysis, report]

systems:
  - id: trpcage1
    system: trpcage.pdb
  - id: trpcage2
    system: trpcage.pdb
    setup: { ph: 6.5 }                 # optional per-system overrides

sweep:
  simulation.temperature_K: [300, 310, 320]   # dotted phase.option → values
  simulation.pressure_bar: [1.0, 1.2]          # multiple axes → cross-product
```

That config produces 2 systems × 3 temperatures × 2 pressures = **12 runs**. When there is more than one run, each goes in its own `runs/<id>/` subdirectory, indexed by a top-level `batch_manifest.json`, with a cross-run `comparison/` report:

```
trpcage_campaign/
  batch_manifest.json
  comparison/                                        (cross-run report)
  runs/
    trpcage1__temperature_K-300__pressure_bar-1.0/   (a full study)
    trpcage1__temperature_K-300__pressure_bar-1.2/
    ...
```
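
The run count and directory names above follow from a plain cross-product. The sketch below assumes the run id is the system id joined to `option-value` pairs with double underscores, as the listing suggests. `expand_runs` is a hypothetical helper, not part of the FastMDXplora API:

```python
from itertools import product

systems = ["trpcage1", "trpcage2"]
sweep = {
    "simulation.temperature_K": [300, 310, 320],
    "simulation.pressure_bar": [1.0, 1.2],
}


def expand_runs(system_ids, sweep_axes):
    """Yield (run_id, swept_values) for every system x sweep-value combination."""
    axes = list(sweep_axes)
    for system_id in system_ids:
        for values in product(*(sweep_axes[axis] for axis in axes)):
            # Label each axis by its option name (the part after the phase prefix).
            parts = [f"{axis.split('.', 1)[1]}-{value}" for axis, value in zip(axes, values)]
            yield "__".join([system_id, *parts]), dict(zip(axes, values))


runs = list(expand_runs(systems, sweep))
print(len(runs))   # 2 systems x 3 temperatures x 2 pressures
print(runs[0][0])
```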

Run it exactly as any other config:

```bash
fastmdx explore --config campaign.yml
```

```python
from fastmdxplora import FastMDXplora
FastMDXplora(config="campaign.yml").explore()
```

Each run is identical in structure to a single study (its own `manifest.json`, `resolved_config.yml`, and phase directories), so existing analysis tooling works per-run unchanged. Option precedence within a run, from lowest to highest, is base config < per-system overrides < swept value. Misspelled sweep axes are rejected with a list of the valid options, and a failed run is recorded while the others continue.
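
The per-run precedence and the axis check can be sketched as follows. This is illustrative only: `VALID_OPTIONS` is a hypothetical, abbreviated option table, and `apply_axis` and `resolve_run` are assumed helper names rather than FastMDXplora functions:

```python
from copy import deepcopy

# Hypothetical, abbreviated table of valid options per phase.
VALID_OPTIONS = {
    "setup": {"ph", "ion_concentration_M"},
    "simulation": {"duration_ns", "temperature_K", "pressure_bar", "platform"},
}


def apply_axis(options: dict, axis: str, value) -> None:
    """Set a dotted `phase.option` axis in place, rejecting unknown names."""
    phase, _, option = axis.partition(".")
    if option not in VALID_OPTIONS.get(phase, set()):
        choices = sorted(f"{p}.{o}" for p, opts in VALID_OPTIONS.items() for o in opts)
        raise ValueError(f"unknown sweep axis {axis!r}; valid options: {', '.join(choices)}")
    options.setdefault(phase, {})[option] = value


def resolve_run(base: dict, system_overrides: dict, swept: dict) -> dict:
    """Layer options as base config < per-system overrides < swept values."""
    resolved = deepcopy(base)
    for phase, opts in system_overrides.items():
        resolved.setdefault(phase, {}).update(opts)
    for axis, value in swept.items():
        apply_axis(resolved, axis, value)
    return resolved


base = {"setup": {"ph": 7.4}, "simulation": {"temperature_K": 300}}
run = resolve_run(base, {"setup": {"ph": 6.5}}, {"simulation.temperature_K": 320})
print(run)
```

Passing a misspelled axis such as `simulation.temprature_K` raises a `ValueError` that lists the valid options instead of silently running with a default.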

### Cross-run comparison report

After a multi-run study, FastMDXplora automatically builds a `comparison/` report at the batch root that turns a directory of runs into a single analysis:

- **Overlays** — every run's per-frame trace (RMSD, Rg, Q-value, total SASA) drawn on one set of axes, labelled by its swept value, so divergence across the sweep is visible at a glance.
- **Trends** — each run reduced to a summary scalar (e.g. mean RMSD over the trajectory) and plotted against the swept parameter, giving a structure-property relationship.
- **`comparison_summary.csv`** — one row per run with the summary scalars, ready for further analysis.
- **`comparison_report.md`** — a written report tying the figures together, with a one-line quantitative takeaway per property (e.g. *"across temperature_K 300 → 320, mean RMSD increases 0.21 → 0.23 nm"*).

It degrades gracefully (errored runs and missing analyses are skipped) and can be turned off with `report: { comparison: false }`.
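
The trend reduction described above can be sketched with the standard library alone. The traces below are toy numbers, and the column names are assumptions chosen for illustration; the actual layout of `comparison_summary.csv` may differ:

```python
import csv
import io
from statistics import mean

# Toy per-frame RMSD traces in nm, keyed by swept temperature (illustrative values only).
traces = {
    300: [0.10, 0.20, 0.30],
    310: [0.15, 0.25, 0.35],
    320: [0.20, 0.30, 0.40],
}

# Reduce each run to one summary scalar: the mean over the trajectory.
rows = [
    {"temperature_K": temp, "mean_rmsd_nm": round(mean(values), 3)}
    for temp, values in sorted(traces.items())
]

# Write one row per run, in the spirit of a comparison summary CSV.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["temperature_K", "mean_rmsd_nm"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())

# One-line takeaway comparing the two ends of the sweep.
first, last = rows[0], rows[-1]
direction = "increases" if last["mean_rmsd_nm"] > first["mean_rmsd_nm"] else "does not increase"
takeaway = (
    f"across temperature_K {first['temperature_K']} -> {last['temperature_K']}, "
    f"mean RMSD {direction} {first['mean_rmsd_nm']:.2f} -> {last['mean_rmsd_nm']:.2f} nm"
)
print(takeaway)
```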

### Parallel execution

By default runs execute sequentially. An optional `execution:` block runs several at once:

```yaml
execution:
  mode: parallel          # sequential (default) | parallel
  workers: 2              # how many runs at once
  devices: [0, 1]         # GPU indices — one run pinned per device
  continue_on_error: true
```

Parallelism is process-based: each run is a separate subprocess, because OpenMM contexts cannot be shared safely across threads and Python's GIL would limit thread-level parallelism. On GPU, the safe pattern is **one run per GPU**: list your `devices`, and each worker is pinned round-robin to a distinct index. Oversubscribing a single GPU is slower than running sequentially, so on GPU `workers` should not exceed the number of devices. When `workers` is unset, it defaults to one per device (GPU) or to the CPU count capped at the run count (CPU).
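
The worker default and the round-robin pinning can be sketched as below. Both function names are hypothetical, and how a device index is actually handed to each subprocess is left out because the text does not specify it:

```python
import os
from typing import List, Optional, Sequence


def default_workers(n_runs: int, devices: Optional[Sequence[int]]) -> int:
    """One worker per listed GPU, otherwise the CPU count capped at the number of runs."""
    if devices:
        return len(devices)
    return max(1, min(os.cpu_count() or 1, n_runs))


def pin_workers(workers: int, devices: Sequence[int]) -> List[int]:
    """Return the device index for each worker slot, assigned round-robin."""
    return [devices[slot % len(devices)] for slot in range(workers)]


print(default_workers(12, [0, 1]))   # one worker per device
print(pin_workers(2, [0, 1]))        # each worker gets its own GPU
print(pin_workers(3, [0, 1]))        # a third worker wraps around to device 0
```

The last call shows why `workers` should stay at or below the device count on GPU: the extra worker lands on a device that is already in use.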

## The four phases

| Phase | Purpose | Key outputs |
|---|---|---|
| `setup` | System preparation (fix, protonate, solvate, ionize) | `prepared.pdb`, `solvated.pdb`, `setup_parameters.json` |
| `simulation` | Minimize, NVT, NPT, production MD | `production.dcd`, `topology.pdb`, `simulation_parameters.json` |
| `analysis` | RMSD, RMSF, Rg, H-bonds, secondary structure (SS), cluster, SASA, dimensionality reduction, Q-value, dihedrals | `<analysis>/*.dat`, `<analysis>/*.png`, `analysis_manifest.json` |
| `report` | Slides, structured report, project bundle | `report.md`, `slides.pptx`, `project_bundle.zip` |

Each phase writes to a dedicated subdirectory under the project output root and produces a structured parameters manifest, so every artifact is traceable to the exact options that produced it.
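
Because the key outputs have fixed names, a finished study can be checked with a few lines of standard-library Python. The sketch below assumes each phase's subdirectory is named after the phase (the text confirms this for `setup/` and `simulation/`; `analysis/` and `report/` are assumptions), and `missing_outputs` is a hypothetical helper:

```python
import tempfile
from pathlib import Path

# Fixed-name key outputs per phase, taken from the table above.
# Per-analysis files (<analysis>/*.dat, *.png) are not checked here.
KEY_OUTPUTS = {
    "setup": ["prepared.pdb", "solvated.pdb", "setup_parameters.json"],
    "simulation": ["production.dcd", "topology.pdb", "simulation_parameters.json"],
    "analysis": ["analysis_manifest.json"],
    "report": ["report.md", "slides.pptx", "project_bundle.zip"],
}


def missing_outputs(root: Path) -> dict:
    """Map each phase to the key outputs not found under root/<phase>/."""
    return {
        phase: [name for name in names if not (root / phase / name).is_file()]
        for phase, names in KEY_OUTPUTS.items()
    }


# Demonstrate on a throwaway directory that contains only the setup outputs.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "setup").mkdir()
    for name in KEY_OUTPUTS["setup"]:
        (root / "setup" / name).touch()
    report = missing_outputs(root)

print(report["setup"])        # nothing missing for setup
print(report["simulation"])   # all simulation outputs still missing
```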

## Documentation

Documentation is hosted at [fastmdxplora.readthedocs.io](https://fastmdxplora.readthedocs.io) (under development).

## Citation

If you use FastMDXplora in your work, please cite the foundational FastMDAnalysis paper:

> Aina, A.; Kwan, D. *FastMDAnalysis: Software for Automated Analysis of Molecular Dynamics Trajectories.* J. Comput. Chem. **2026**, 47, e70350. DOI: [10.1002/jcc.70350](https://doi.org/10.1002/jcc.70350)

```bibtex
@article{aina2026fastmd,
  author  = {Aina, Adekunle and Kwan, Derrick},
  title   = {FastMDAnalysis: Software for Automated Analysis of Molecular Dynamics Trajectories},
  journal = {Journal of Computational Chemistry},
  volume  = {47},
  number  = {8},
  pages   = {e70350},
  year    = {2026},
  doi     = {10.1002/jcc.70350},
}
```

## Contributing

Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md). FastMDXplora follows the [Contributor Covenant](CODE_OF_CONDUCT.md).

## License

MIT — see [LICENSE](LICENSE).

## Acknowledgements

FastMDXplora is developed in the [AAI Research Lab](https://aai-research-lab.github.io) at California State University, Dominguez Hills. It builds on a deep ecosystem of open-source scientific Python: MDTraj, OpenMM, PDBFixer, NumPy, SciPy, scikit-learn, Matplotlib, python-pptx, and many others.
