H-NS Filament Minimal Dataset
This folder contains a compact, package-friendly subset of the full filament MD dataset.
Purpose
The goal is to avoid shipping the full ~1.7 GB trajectory bundle while preserving deterministic filament construction behavior for the tutorial's non-random assembly path.
This minimal dataset is designed for:
- Assembler.load_minimal_site_map(...)
- assembler.add_dimer(segment='minimal')
Contents
- s1s1_start.pdb
- s2s2_start.pdb
- s1s1_extend.pdb
- s2s2_extend.pdb
- complex_frame_1.pdb
- manifest.json
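A quick way to confirm that a checkout or regenerated folder is complete is to compare it against the list above. The following is a minimal sketch using only the standard library; the helper name `missing_files` is hypothetical and not part of the package.

```python
from pathlib import Path

# File names taken from the Contents list in this README.
EXPECTED_FILES = (
    "s1s1_start.pdb",
    "s2s2_start.pdb",
    "s1s1_extend.pdb",
    "s2s2_extend.pdb",
    "complex_frame_1.pdb",
    "manifest.json",
)


def missing_files(dataset_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Output directory as used in the generation command of this README.
    missing = missing_files("examples/data/filament_minimal")
    print("missing:", missing or "none")
```

An empty result means every listed file is present; it says nothing about whether the file contents are valid.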
Size
- Total: ~1.5 MB
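To check the size of a regenerated folder against the figure above, the file sizes can be summed directly. This is a small standard-library sketch; `dir_size_bytes` is a hypothetical helper name.

```python
from pathlib import Path


def dir_size_bytes(dataset_dir):
    """Sum the sizes, in bytes, of all regular files under dataset_dir."""
    return sum(
        p.stat().st_size
        for p in Path(dataset_dir).rglob("*")
        if p.is_file()
    )
```

For example, `dir_size_bytes("examples/data/filament_minimal") / 1e6` gives the total in megabytes.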
How it works
Unlike the full dataset (many .xtc trajectories), this folder stores only a few selected source-state PDB frames:
- start-state source frames for s1s1 and s2s2
- extend-state source frames for s1s1 and s2s2
- one DBD-DNA complex frame
At runtime, Assembler.load_minimal_site_map(...):
- Loads these source PDBs
- Rebuilds segment/site maps using SiteMapper
- Selects the required site structures for start/extend assembly
- Uses complex_frame_1.pdb for add_dna(...)
Segmentation is therefore still performed in code, as in full mode, but starting from the minimal source inputs.
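The source frames follow a `<pair>_<state>.pdb` naming pattern, which makes it easy to see which file feeds which assembly step. The sketch below only illustrates that layout; it is not the Assembler or SiteMapper implementation, and `source_frame_paths` is a hypothetical helper.

```python
from pathlib import Path

# Pair and state labels as they appear in the file names of this dataset.
PAIRS = ("s1s1", "s2s2")
STATES = ("start", "extend")


def source_frame_paths(dataset_dir):
    """Map each (pair, state) combination to its source PDB path."""
    root = Path(dataset_dir)
    return {
        (pair, state): root / f"{pair}_{state}.pdb"
        for pair in PAIRS
        for state in STATES
    }
```

For example, `source_frame_paths("examples/data/filament_minimal")[("s2s2", "extend")]` points at `s2s2_extend.pdb`.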
Generation
This dataset is generated from examples/data/filament_dataset with:
python examples/scripts/extract_minimal_filament_dataset.py --output-dir examples/data/filament_minimal
The selection metadata (trajectory ids, stride, source mapping) is stored in manifest.json.
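The manifest can be inspected without knowing its exact layout. The sketch below loads manifest.json and lists its top-level entries with their types; it deliberately assumes nothing about key names, since the schema is not documented here. Both helper names are hypothetical.

```python
import json
from pathlib import Path


def load_manifest(dataset_dir):
    """Parse manifest.json from dataset_dir and return the decoded JSON."""
    with open(Path(dataset_dir) / "manifest.json", encoding="utf-8") as fh:
        return json.load(fh)


def summarize(manifest):
    """Return (key, type name) pairs for the top-level manifest entries."""
    if not isinstance(manifest, dict):
        # The root is not a JSON object; report its type only.
        return [("<root>", type(manifest).__name__)]
    return [(key, type(value).__name__) for key, value in manifest.items()]
```

Calling `summarize(load_manifest("examples/data/filament_minimal"))` shows where the trajectory ids, stride, and source mapping are stored.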
Notes
- segment='minimal' is intended for reproducible compact usage.
- segment='fixed' and segment='random' remain tied to full trajectory-style site maps.
- If you regenerate with a different --dna-frame-idx, the output complex file name and manifest should stay consistent for your workflow.
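One way to catch a mismatch after regenerating is to check that every complex frame file on disk is mentioned in the manifest. This sketch rests on two assumptions that the README does not confirm: that complex files are named `complex_frame_<idx>.pdb`, and that manifest.json records the file name as a string somewhere. The helper names are hypothetical.

```python
from pathlib import Path


def complex_frame_files(dataset_dir):
    """List complex frame file names (assumed pattern: complex_frame_*.pdb)."""
    return sorted(p.name for p in Path(dataset_dir).glob("complex_frame_*.pdb"))


def unreferenced_complex_frames(dataset_dir):
    """Return complex frame files whose names do not appear in manifest.json.

    Assumption: the manifest stores the complex file name verbatim.
    """
    root = Path(dataset_dir)
    manifest_text = (root / "manifest.json").read_text(encoding="utf-8")
    return [
        name for name in complex_frame_files(root)
        if name not in manifest_text
    ]
```

An empty list suggests the folder and manifest agree; a non-empty list points at stale or unrecorded complex files.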