Quickstart

This guide walks through the core workflow: building a molecular system, running a simulation, and analyzing the results.

Building a Solvated System

The ExplicitSolvent class creates AMBER-compatible topology and coordinate files from a PDB structure:

from molecular_simulations.build import ExplicitSolvent
from pathlib import Path

pdb_file = Path("/path/to/protein.pdb")
output_dir = Path("/path/to/simulation/inputs")

builder = ExplicitSolvent(output_dir, pdb_file)
builder.build()

# Outputs:
# - system.prmtop (topology)
# - system.inpcrd (coordinates)

The builder uses ff19SB for proteins and OPC water by default. For systems with non-standard residues or small molecules, see Protein-Ligand Systems.

Running a Simulation

The Simulator class handles equilibration and production MD with OpenMM:

from molecular_simulations.simulate import Simulator

sim = Simulator(
    path=output_dir,
    equil_steps=500_000,    # 1 ns equilibration (2 fs timestep)
    prod_steps=25_000_000,  # 100 ns production (4 fs timestep)
)
sim.run()

# Outputs:
# - prod.dcd (trajectory)
# - prod.log (energies, temperature, etc.)

Running on HPC Clusters

For running multiple replicas on a cluster with PBS either submit a normal job and call python script.py using the LocalSettings or a few prepared Settings configs are provided for ALCF Polaris and ALCF Aurora:

import parsl
from molecular_simulations.simulate import AuroraSettings

settings = AuroraSettings.from_yaml("parsl_config.yaml")
config = settings.config_factory("/path/to/run_dir")
parsl.load(config)

@parsl.python_app
def run_md(path, steps=25_000_000):
    from molecular_simulations.simulate import Simulator
    Simulator(path, prod_steps=steps).run()

# Submit jobs for each replica
futures = [run_md(p) for p in Path("./").glob("replica_*")]
results = [f.result() for f in futures]

The above code, if executed on a login node of ALCF Aurora would run each simulation until completion. This includes resubmitting if walltime is hit until each job is finished. For more information about Parsl, please read the Parsl Documentation.

Analyzing Trajectories

Interaction Energy Fingerprinting

Calculate per-residue electrostatic and van der Waals interaction energies:

from molecular_simulations.analysis import Fingerprinter

fp = Fingerprinter(
    topology="/path/to/system.prmtop",
    trajectory="/path/to/prod.dcd",
    target_selection="segid A",      # Receptor
    binder_selection="segid B",      # Ligand/binder
)
fp.run()
fp.save()  # Saves fingerprint.npz

Automatic Clustering

Cluster simulation frames using KMeans++ with automatic k selection:

from molecular_simulations.analysis import AutoKMeans

clusterer = AutoKMeans(
    data_directory="/path/to/features/",
    max_clusters=10,
    reduction_algorithm="PCA",
    reduction_kws={"n_components": 2},
)
clusterer.run()
clusterer.save_labels()
clusterer.save_centers()

The AutoKMeans is a great first-pass clustering and is sometimes sufficient without relying on testing various clustering methods. If you find that it is doing a poor job for your data, it may require custom clustering analyses.

Protein-Protein Interactions

Analyze hydrogen bonds, salt bridges, and hydrophobic contacts:

from molecular_simulations.analysis import PPInteractions

ppi = PPInteractions(
    top="/path/to/topology.prmtop",
    traj="/path/to/trajectory.dcd",
    out="/path/to/outputs/",
    sel1="chainID A",
    sel2="chainID B",
)
ppi.run()

MM-PBSA Binding Free Energy

Calculate binding free energies with MM-PBSA:

from molecular_simulations.simulate.mmpbsa import MMPBSA

mmpbsa = MMPBSA(
    top="/path/to/system.prmtop",
    dcd="/path/to/prod.dcd",
    selections=[":1-100", ":101-200"],  # receptor, ligand
    n_cpus=4,
    amberhome="/path/to/amber",
    parallel_mode="frame",
)
results = mmpbsa.run()

The amberhome kwarg is not necessary but is provided as way for virtual environments, such as those created with uv, can still take advantage of this code by providing a path to another location which has ambertools installed (which cannot be done via uv/pip).

Interface Scoring

Score protein-protein interfaces using AlphaFold metrics:

from molecular_simulations.analysis import ipSAE

scorer = ipSAE(
    structure_file="/path/to/complex.pdb",
    pae_file="/path/to/pae.json",
    plddt_file="/path/to/plddt.json",
)
scores = scorer.run()
# Returns: ipTM, ipSAE, pDockQ, pDockQ2

Next Steps