Basic Protein Simulation¶
This tutorial walks through running a complete molecular dynamics simulation of a protein, from structure preparation to trajectory analysis.
Prerequisites¶
A protein structure in PDB format
AmberTools installed and accessible
OpenMM with GPU support (recommended)
We’ll use lysozyme (PDB: 1AKI) as our example system.
Step 1: Build the System¶
Create an explicitly solvated system with the ff19SB force field:
from molecular_simulations.build import ExplicitSolvent
pdb_file = Path('/path/to/1AKI.pdb')
output_dir = Path("./lysozyme_sim")
builder = ExplicitSolvent(
out=output_dir,
pdb=pdb_file,
# Optional parameters:
# padding=12.0, # Angstroms of water padding
# amberhome='/path/to/env', # Use another env's AmberTools installation
)
builder.build()
This creates:
system.prmtop- AMBER topology with force field parameterssystem.inpcrd- Initial coordinates with solvent and ions
Note
The builder uses tleap internally. Check the generated tleap.log file
if you encounter issues with unusual residues or missing parameters.
Step 2: Run the Simulation¶
Initialize and run the simulation with default settings:
from molecular_simulations.simulate import Simulator
sim = Simulator(
path=output_dir,
equil_steps=500_000, # 1 ns equilibration
prod_steps=5_000_000, # 20 ns production
)
sim.run()
The simulator automatically:
Minimizes the system (steepest descent)
Heats from 0 K to 300 K with position restraints
Equilibrates in NPT ensemble
Runs production dynamics with 4 fs timestep (hydrogen mass repartitioning)
Output files:
prod.dcd- Production trajectoryprod.log- Energy, temperature, and density data
Customizing Simulation Parameters¶
For more control over the simulation:
sim = Simulator(
path=output_dir,
equil_steps=2_500_000, # 5 ns equilibration total (unrestrained + restrained)
prod_steps=25_000_000, # 100 ns production @ 4 fs timestep
temperature=310, # K (body temperature)
n_equil_cycles=3, # N cycles of unrestrained equilibration
prod_reporter_frequency=5000, # Save every 20 ps @ 4 fs timestep
platform="CUDA", # Force GPU platform
)
Step 3: Basic Analysis¶
Check the trajectory looks reasonable with basic metrics:
import MDAnalysis as mda
from MDAnalysis.analysis import rms
# Load the trajectory
u = mda.Universe(
str(output_dir / "system.prmtop"),
str(output_dir / "prod.dcd")
)
# Calculate RMSD relative to first frame
protein = u.select_atoms("protein and name CA")
R = rms.RMSD(protein, ref_frame=0).run()
print(f"Trajectory: {len(u.trajectory)} frames")
print(f"Final RMSD: {R.results.rmsd[-1, 2]:.2f} Å")
Step 4: Interaction Fingerprinting¶
Calculate per-residue interaction energies to identify key contacts:
from molecular_simulations.analysis import Fingerprinter
fp = Fingerprinter(
topology=str(output_dir / "system.prmtop"),
trajectory=str(output_dir / "prod.dcd"),
target_selection="protein", # Entire protein
)
fp.run()
fp.save() # Creates fingerprint.npz
# Load and examine results
import numpy as np
data = np.load("fingerprint.npz")
print(f"Residues analyzed: {data['residues'].shape[0]}")
Step 5: Clustering Conformations¶
Identify distinct conformational states:
from molecular_simulations.analysis import AutoKMeans
clusterer = AutoKMeans(
data_directory=str(output_dir), # Directory with .npy feature files
max_clusters=5,
reduction_algorithm="PCA",
reduction_kws={"n_components": 2},
)
clusterer.run()
clusterer.save_labels()
clusterer.save_centers()
print(f"Optimal clusters: {clusterer.n_clusters_}")
Troubleshooting¶
- Simulation crashes immediately
Check that OpenMM can access the GPU:
python -c "import openmm; print(openmm.Platform.getNumPlatforms())"- System has bad contacts after building
The initial minimization should resolve this. If it persists, check the input PDB for unusual conformations or clashing atoms.
- Trajectory file is empty or truncated
Check
prod.logfor errors. Common issues include disk space and unstable simulations (check temperature/energy for spikes).
Next Steps¶
Protein-Ligand Systems - Add small molecules to your simulations
HPC Deployment with Parsl - Run longer simulations on clusters
Analysis Workflows - More sophisticated analysis pipelines