Metadata-Version: 2.4
Name: procustes
Version: 0.1.0
Summary: Truncate a protein around a ligand using cutoff-based residue selection
Requires-Python: <3.13,>3.11
Requires-Dist: mdanalysis>=2.10.0
Requires-Dist: pdbfixer>=1.12.0
Provides-Extra: boltz
Requires-Dist: boltz[cuda]; extra == 'boltz'
Requires-Dist: lightning[extra]; extra == 'boltz'
Provides-Extra: dev
Requires-Dist: fastmcp>=1.0.0; extra == 'dev'
Requires-Dist: mypy>=1.16.0; extra == 'dev'
Requires-Dist: pre-commit>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=8.4.0; extra == 'dev'
Requires-Dist: ruff>=0.9.0; extra == 'dev'
Provides-Extra: openmm
Requires-Dist: openmm; extra == 'openmm'
Description-Content-Type: text/markdown

## Procustes

`procustes` truncates a protein around a ligand using cutoff-based residue selection.

### CLI

```bash
procustes INPUT_STRUCTURE OUTPUT_DIR [options]
```

Required positional arguments:

- `INPUT_STRUCTURE`: input `.pdb` or `.cif` containing protein + ligand
- `OUTPUT_DIR`: base output directory where all outputs are written

Options:

- `--ligand` ligand residue name (default: `LIG`)
- `--cutoff` cutoff distance in angstroms (default: `4.0`)
- `--ca` use only alpha-carbon distances (default: use any residue atom)
- `--fill-gaplength` internal gaps of removed residues shorter than this value are restored with their original residues; gaps at or above it are candidates for alanine-based filling (default: `4`)
- `--extra-residues` comma-separated extra protein residues to force-keep before the gap logic runs (`RESID` for single-chain inputs, `CHAIN:RESID` for multi-chain inputs; spaces and trailing commas are accepted)
- `--nofill` disable long-gap filling
- `--caps` add ACE/NME caps to all no-fill biopolymer chains (requires `--nofill`)
- `--fill-method` filling backend: `pdbfixer` or `boltz` (default: `pdbfixer`)
- `--fill-models-count` number of Boltz fill candidates per cutoff (default: `3`, max: `20`)
- `--aa-length` residue spacing used to estimate minimum bridge alanines from terminal CA distance (default: `4.0`)
- `--boltz-cache` optional Boltz cache path
- `--boltz-diffusion-samples` diffusion samples passed to `boltz predict` (default: `1`)
- `--boltz-devices` device count passed to `boltz predict` (default: `1`)
- `--boltz-accelerator` accelerator passed to `boltz predict`: `cpu`, `gpu`, `tpu` (default: `gpu`)
- `--boltz-use-msa-server` pass `--use_msa_server` to `boltz predict`
- `--no-boltz-potentials` disable Boltz `--use_potentials` (enabled by default)
- `--boltz-template-threshold` template force threshold written in Boltz YAML (default: `0.1`)
- `--color` colorized progress output mode: `auto` (default), `always`, `never`
- `--quiet` disable progress output
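
The `--extra-residues` format described above can be illustrated with a small parser. This is a minimal sketch, not the parser `procustes` ships: the function name `parse_extra_residues` and its error messages are hypothetical, and it assumes residue IDs are plain integers.

```python
from typing import Optional


def parse_extra_residues(spec: str) -> list[tuple[Optional[str], int]]:
    """Parse '12,13' or 'A:12, B:40,' into (chain, resid) pairs.

    Hypothetical helper: chain is None for the bare RESID form.
    """
    selections: list[tuple[Optional[str], int]] = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas and stray spaces
        chain, sep, resid = token.rpartition(":")
        chain = chain.strip()
        if sep and not chain:
            raise ValueError(f"missing chain ID in {token!r}")
        try:
            number = int(resid)  # assumption: integer residue IDs only
        except ValueError:
            raise ValueError(f"invalid residue ID in {token!r}") from None
        selections.append((chain or None, number))
    return selections
```

For example, `parse_extra_residues("A:12, B:40,")` yields `[("A", 12), ("B", 40)]`.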

Outputs are written directly under `OUTPUT_DIR`:

```text
OUTPUT_DIR/
  _boltz/
    cutoff_<cutoff>_template.pdb
    cutoff_<cutoff>_<candidate>/
      <job>.yaml
      predictions/...
  a<cutoff>truncated.pdb
  b<cutoff>truncated.pdb
  ...
  <cutoff>truncated.pdb
  summary.json
```

`OUTPUT_DIR` is created if missing, but only when its parent directory already exists.
If the parent path does not exist, `procustes` fails with an error.
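
The directory rule above matches the behavior of `pathlib.Path.mkdir` with `parents=False`. The sketch below shows that behavior; the helper name `ensure_output_dir` is hypothetical, and whether `procustes` uses this exact call internally is an assumption.

```python
from pathlib import Path


def ensure_output_dir(output_dir: str) -> Path:
    """Create output_dir if it is missing, without creating its parents."""
    path = Path(output_dir)
    # parents=False: a missing parent directory raises FileNotFoundError.
    # exist_ok=True: an already existing directory is reused silently.
    path.mkdir(parents=False, exist_ok=True)
    return path
```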

`summary.json` is written once per run and includes run parameters (including
`extra_residues_requested`) plus a `cutoffs` array (single entry) with residue
counts, candidate scores, winning candidate metadata, and
`extra_residues_applied`.
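
As a rough illustration, the two extra-residue fields could be read back like this. Only the key names `cutoffs`, `extra_residues_requested`, and `extra_residues_applied` come from the description above; their exact nesting (run-level versus per-cutoff) and value types are assumptions, and `read_extra_residues` is a hypothetical helper.

```python
import json
from pathlib import Path


def read_extra_residues(output_dir):
    """Return (requested, applied) extra residues from summary.json."""
    summary = json.loads((Path(output_dir) / "summary.json").read_text())
    cutoffs = summary["cutoffs"]
    if len(cutoffs) != 1:
        raise ValueError(f"expected one cutoff entry, found {len(cutoffs)}")
    # Assumed layout: requested residues are a run parameter at the top
    # level, applied residues live inside the single cutoff entry.
    return summary["extra_residues_requested"], cutoffs[0]["extra_residues_applied"]
```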

When `--nofill` is set, Boltz is skipped and only `<cutoff>truncated.pdb` is written.

When both `--nofill` and `--caps` are set, every resulting protein chain is capped
with `ACE` and `NME`, chain IDs are reassigned deterministically starting at `A`, and
small-molecule binder chain IDs are reassigned from `X` to avoid collisions.

If `--nofill` is set, custom fill arguments (`--fill-models-count`, `--aa-length`,
or any `--boltz-*` option) raise an error.

If `--caps` is set without `--nofill`, `procustes` raises an error.

If `--fill-method pdbfixer` is selected, any `--boltz-*` options raise an error.
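
The three rules above can be summarized in one check. This is a sketch under stated assumptions rather than the actual CLI validation: `find_option_conflicts` is a hypothetical name, `explicit_options` stands for the option names the user typed on the command line, and the messages are illustrative.

```python
FILL_ONLY_OPTIONS = ("--fill-models-count", "--aa-length")


def find_option_conflicts(nofill, caps, fill_method, explicit_options):
    """Return a list of human-readable conflicts (empty when valid)."""
    boltz_options = [o for o in explicit_options if o.startswith("--boltz-")]
    fill_options = [o for o in explicit_options if o in FILL_ONLY_OPTIONS]
    problems = []
    if caps and not nofill:
        problems.append("--caps requires --nofill")
    if nofill and (fill_options or boltz_options):
        rejected = ", ".join(fill_options + boltz_options)
        problems.append(f"--nofill cannot be combined with: {rejected}")
    if fill_method == "pdbfixer" and boltz_options:
        rejected = ", ".join(boltz_options)
        problems.append(f"--fill-method pdbfixer rejects: {rejected}")
    return problems
```

Returning a list instead of raising on the first problem keeps the sketch easy to test; the real tool reports these cases as errors.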

Final output normalization is always applied to `<cutoff>truncated.pdb`:
ligand/small-molecule residues are written first, small-molecule chain IDs are
assigned from `X`, and biopolymer chains are assigned from `A` to avoid chain-ID
collisions.
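
The chain-ID scheme can be pictured as two disjoint letter pools. The sketch below is illustrative only: `assign_chain_ids` is a hypothetical helper, and what `procustes` does when a pool runs out (more than three small molecules, or biopolymer chains reaching `X`) is not documented here, so the sketch simply raises.

```python
from string import ascii_uppercase


def assign_chain_ids(n_biopolymer, n_small_molecule):
    """Return (biopolymer_ids, small_molecule_ids) from disjoint pools."""
    split = ascii_uppercase.index("X")
    polymer_pool = ascii_uppercase[:split]  # "A" through "W"
    small_pool = ascii_uppercase[split:]    # "X", "Y", "Z"
    if n_biopolymer > len(polymer_pool) or n_small_molecule > len(small_pool):
        # Assumption: overflow handling is out of scope for this sketch.
        raise ValueError("too many chains for this simplified scheme")
    return list(polymer_pool[:n_biopolymer]), list(small_pool[:n_small_molecule])
```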

During CLI execution, `procustes` prints per-cutoff stage logs (residue selection, detected gap ranges/lengths, Boltz command invocation, candidate scores) plus final summaries with kept residues, alanine-filled residues, elapsed time, and output file path.
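
To make the logged gap ranges and lengths concrete, here is a minimal single-chain sketch of gap detection, the `--fill-gaplength` decision, and an `--aa-length` based bridge estimate. All three function names are hypothetical, and the bridge formula (`ceil(distance / aa_length) - 1`) is an assumption about how a minimum could be estimated, not the documented formula.

```python
import math


def internal_gaps(kept_resids):
    """Return (first, last) residue IDs of each gap between kept residues."""
    kept = sorted(set(kept_resids))
    return [(a + 1, b - 1) for a, b in zip(kept, kept[1:]) if b - a > 1]


def classify_gap(first, last, fill_gaplength=4):
    """'restore' for short gaps, 'fill' for gaps at or above the threshold."""
    length = last - first + 1
    return "restore" if length < fill_gaplength else "fill"


def min_bridge_alanines(ca_start, ca_end, aa_length=4.0):
    """Assumed estimate: residues needed to span the terminal CA distance."""
    distance = math.dist(ca_start, ca_end)
    return max(0, math.ceil(distance / aa_length) - 1)
```

With kept residues `[1, 2, 3, 7, 8, 20]`, the gaps are `(4, 6)` (length 3, restored) and `(9, 19)` (length 11, considered for filling).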

For Boltz fill runs (`--fill-method boltz`), each candidate YAML includes a `templates` entry pointing to
`OUTPUT_DIR/_boltz/cutoff_<cutoff>_template.pdb` (protein after short-gap restoration),
with `chain_id`, `template_id`, `force: true`, and `threshold` so Boltz can enforce
template guidance while modeling alanine bridge regions.

After each Boltz candidate model is generated, `procustes` aligns it to the cutoff
template with MDAnalysis using only non-inserted residues (the original kept residues,
excluding alanine bridge insertions), then grafts template coordinates for those
non-gap residues before merging ligand atoms.

### Integration reference workflow

The TYK2 end-to-end integration suite lives in
`tests/integration/test_tyk2_end_to_end.py` and validates four compressed fixtures
(`ejm31`, `ejm42`, `jmc27`, `jmc28`) by running the full CLI entrypoint in-process.

Reference artifacts are stored under `tests/reference/<complex>/` as:

- `9truncated.pdb` (byte-for-byte comparison after stripping hydrogen records, to
  avoid OpenMM/PDBFixer hydrogen-placement nondeterminism)
- `summary.json` (field-aware JSON comparison)
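
The hydrogen-stripping step before the PDB comparison might look roughly like the sketch below. It is not the test suite's actual code: `strip_hydrogens` is a hypothetical name, and it assumes the element symbol is present in columns 77-78 of every `ATOM`/`HETATM` record, as in standard PDB files.

```python
def strip_hydrogens(pdb_text):
    """Drop ATOM/HETATM records whose element column is hydrogen."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            element = line[76:78].strip().upper()  # PDB columns 77-78
            if element == "H":
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"


def same_heavy_atoms(pdb_a, pdb_b):
    """Byte-for-byte comparison of two PDB texts after stripping hydrogens."""
    return strip_hydrogens(pdb_a).encode() == strip_hydrogens(pdb_b).encode()
```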

To regenerate these references intentionally (one-time baseline refresh), run:

```bash
uv run --extra dev python scripts/generate_tyk2_references.py
```

By default, integration test temporary directories are deleted. Set
`PROCUSTES_KEEP_ITEST_TMP=1` to retain them for debugging.
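
A retention switch like this is commonly implemented as a context manager around a temporary directory. The following is a sketch of that pattern, not the suite's real fixture: `integration_tmpdir` and the directory prefix are hypothetical, and treating only the exact value `1` as "keep" is an assumption.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def integration_tmpdir():
    """Yield a temp directory; delete it unless retention is requested."""
    path = tempfile.mkdtemp(prefix="procustes-itest-")
    try:
        yield path
    finally:
        # Assumption: only the literal value "1" enables retention.
        if os.environ.get("PROCUSTES_KEEP_ITEST_TMP") != "1":
            shutil.rmtree(path, ignore_errors=True)
```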

## Development

Use the project dev environment with `uv`:

```bash
uv sync --extra dev
```

Run formatting and linting:

```bash
uv run --extra dev ruff format src tests scripts
uv run --extra dev ruff check src tests scripts
```

Run tests:

```bash
uv run --extra dev pytest -q
```

Run only the TYK2 integration tests:

```bash
uv run --extra dev pytest -q tests/integration
```

## PyPI Release

Releases are tag-based: `hatch-vcs` derives the package version from the git tag, and only a wheel is uploaded (no `sdist`).

Prerequisites:

- clean git working tree
- local branch fully synced with upstream
- `~/.pypirc` configured for `[pypi]` credentials
- `git` and `uv` available on `PATH`

Run:

```bash
python scripts/release_pypi.py X.Y.Z
```

The release script will:

1. validate `X.Y.Z` format
2. verify git cleanliness and upstream sync
3. ensure tag does not already exist locally/remotely
4. create annotated tag `X.Y.Z`
5. build exactly one wheel into `dist/` (no `sdist`)
6. upload only that wheel via `twine` using `~/.pypirc` `pypi` section
7. push the release tag to `origin`
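
Step 1 could be approximated with a regular expression. This sketch assumes the accepted format is exactly three dot-separated numeric components; the real `scripts/release_pypi.py` may be stricter or may accept other forms, and `is_valid_release_version` is a hypothetical name.

```python
import re

# Assumption: versions are exactly MAJOR.MINOR.PATCH with ASCII digits.
VERSION_PATTERN = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")


def is_valid_release_version(version):
    """Return True when version looks like X.Y.Z, with no prefix or suffix."""
    return VERSION_PATTERN.fullmatch(version) is not None
```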
