Metadata-Version: 2.4
Name: cophyloforge
Version: 0.1.2
Summary: CPU inference tools for CoPhyloForge cophylogeny scenario prediction
Author: CoPhyloForge contributors
License-Expression: MIT
Project-URL: Homepage, https://github.com/rajamosai/CoPhyloForge
Project-URL: Repository, https://github.com/rajamosai/CoPhyloForge
Project-URL: Model, https://doi.org/10.5281/zenodo.19656529
Keywords: cophylogeny,bioinformatics,phylogenetics,machine learning,symbiosis
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: torch>=2.1
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: pytest>=7.4; extra == "dev"
Requires-Dist: twine>=4.0; extra == "dev"
Dynamic: license-file

# cophyloforge

`cophyloforge` is a CPU inference package for predicting cophylogeny scenario families from a host tree, a symbiont tree, and observed host-symbiont associations. It packages the end-user prediction workflow separately from the research simulator, dataset builder, and training code in this repository.

The default frozen model is archived on Zenodo:

<https://doi.org/10.5281/zenodo.19656529>

## Installation

```bash
pip install cophyloforge
```

The package uses PyTorch for CPU inference. A GPU is not required.

## Quickstart

```bash
cophyloforge download-model
cophyloforge predict \
  --host-tree host.nwk \
  --sym-tree sym.nwk \
  --associations assoc.tsv \
  --outdir results
```

The prediction command writes:

- `results/prediction.json`
- `results/prediction.tsv`
- `results/prediction_report.txt`

The JSON and TSV outputs include the package version, model name, model version, model DOI/source, timestamp, input filenames, scenario prediction, scenario probabilities, `tracking_score`, `switch_score`, `multi_host_fraction`, `difficulty_score`, and `recommended_abstain`.
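As a rough illustration of consuming these outputs, the sketch below loads `prediction.json` and builds a one-line summary. It assumes the JSON keys match the field names listed above, that `scenario_prediction` and `scenario_probabilities` are spelled as in the Python API example, and that `scenario_probabilities` maps scenario names to probabilities. The helper names are hypothetical and not part of the package.

```python
import json
from pathlib import Path


def load_prediction(path):
    """Read a prediction JSON file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_prediction(record):
    """Return a one-line summary of a prediction record.

    Assumes (not confirmed by the package docs) that
    record["scenario_probabilities"] maps scenario names to probabilities
    and that the predicted scenario is one of its keys.
    """
    label = record["scenario_prediction"]
    prob = record["scenario_probabilities"][label]
    # recommended_abstain is treated as a truthy/falsy flag.
    flag = " (abstain recommended)" if record.get("recommended_abstain") else ""
    return f"{label}: p={prob:.2f}{flag}"
```

A call such as `summarize_prediction(load_prediction("results/prediction.json"))` would then give a compact line for logs or notebooks.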

## Input Files

`cophyloforge` expects:

- a host tree in Newick format
- a symbiont tree in Newick format
- an association table or matrix in TSV or CSV format

The association file describes the observed biological links between host tips and symbiont tips. It usually comes from your interaction data, a spreadsheet, field observations, museum or sequence metadata, or literature curation. Tip names in this file must match the labels in the two Newick trees.
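To spot mismatched names early, a quick check along the following lines can help. This is a minimal sketch, not the package's validator: it extracts tip labels with a regular expression that only handles simple, unquoted Newick labels, and both function names are hypothetical. `cophyloforge validate-input` remains the authoritative check.

```python
import re


def newick_tip_labels(newick):
    """Return the set of tip labels in a simple Newick string.

    A tip label directly follows "(" or ","; internal node labels follow
    ")" and are therefore skipped. Quoted labels are not supported.
    """
    return set(re.findall(r"[(,]\s*([^\s():,;\[\]']+)", newick))


def unmatched_names(newick, names):
    """Return the names that do not appear as tip labels, sorted."""
    return sorted(set(names) - newick_tip_labels(newick))
```

For example, passing the host tree text and the host column of an edge list to `unmatched_names` returns the host names that have no matching tip.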

Two public association formats are supported.

Edge-list TSV, with one observed link per row:

```text
host	symbiont
host_A	sym_1
host_B	sym_1
host_C	sym_2
```

Binary matrix TSV or CSV, with symbionts in the first column and host names in the remaining columns:

```text
symbiont	host_A	host_B	host_C
sym_1	1	1	0
sym_2	0	0	1
sym_3	0	0	0
```

Mark a present association with `1` or `true`; any other nonzero, nonempty value is also treated as present. Mark an absent association with `0`, `false`, or a blank cell.
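The two formats carry the same information, so one can be derived from the other. The sketch below converts an edge list into the binary matrix layout using only the standard library. The function name `edges_to_matrix` is hypothetical, and the conversion only emits symbionts and hosts that appear in at least one link, so unlinked tips (such as `sym_3` above) would need to be added separately.

```python
import csv
import io


def edges_to_matrix(edge_tsv):
    """Convert edge-list TSV text into binary-matrix TSV text."""
    rows = list(csv.DictReader(io.StringIO(edge_tsv), delimiter="\t"))
    hosts = sorted({row["host"] for row in rows})
    symbionts = sorted({row["symbiont"] for row in rows})
    links = {(row["symbiont"], row["host"]) for row in rows}

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Header: symbiont column first, then one column per host.
    writer.writerow(["symbiont", *hosts])
    for sym in symbionts:
        # 1 marks an observed link, 0 marks no association.
        writer.writerow([sym, *(int((sym, host) in links) for host in hosts)])
    return out.getvalue()
```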

Create a template from tree tip labels:

```bash
cophyloforge init-association-template \
  --host-tree host.nwk \
  --sym-tree sym.nwk \
  --format matrix \
  --out association_template.tsv
```

For edge lists, the template contains the required header. For matrices, it contains all symbiont rows and host columns with zero-filled cells.

Validate inputs before prediction:

```bash
cophyloforge validate-input \
  --host-tree host.nwk \
  --sym-tree sym.nwk \
  --associations assoc.tsv
```

## CLI Reference

```bash
cophyloforge --help
cophyloforge version
cophyloforge download-model --model default
cophyloforge validate-input --host-tree host.nwk --sym-tree sym.nwk --associations assoc.tsv
cophyloforge predict --host-tree host.nwk --sym-tree sym.nwk --associations assoc.tsv --outdir results
```

Batch prediction from a manifest:

```bash
cophyloforge batch-predict \
  --manifest cases.csv \
  --outdir batch-results
```

The manifest must contain `host_tree`, `sym_tree`, and `associations` columns. If an optional `case_id` column is present, its values are used as the output folder names.
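A manifest with these columns can be generated programmatically. The following is a minimal sketch that assumes a comma-separated file (suggested by the `cases.csv` name); the helper `write_manifest` and the example paths are hypothetical, and how relative paths are resolved is not specified here.

```python
import csv

# Required columns plus the optional case_id column.
MANIFEST_COLUMNS = ["case_id", "host_tree", "sym_tree", "associations"]


def write_manifest(cases, handle):
    """Write case dicts to an open text handle as a CSV manifest."""
    writer = csv.DictWriter(handle, fieldnames=MANIFEST_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for case in cases:
        writer.writerow(case)
```

Typical use would be `with open("cases.csv", "w", newline="") as fh: write_manifest(cases, fh)`, where each case is a dict with the four keys above.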

Batch prediction from folders:

```bash
cophyloforge batch-predict \
  --input-dir cases \
  --outdir batch-results
```

Each case folder can contain either `host.nwk`, `sym.nwk`, and `assoc.tsv`, or the research dataset layout `trees/host_obs_sampled.nwk`, `trees/sym_obs_sampled.nwk`, and `associations/assoc_obs.tsv`.
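Before running a large batch, it can be useful to check which layout each case folder follows. The sketch below is an illustration only: the layout labels `"simple"` and `"research"` and the function `detect_layout` are hypothetical names, and the package's own discovery logic may differ.

```python
from pathlib import Path

# File sets for the two layouts described above.
SIMPLE_LAYOUT = ("host.nwk", "sym.nwk", "assoc.tsv")
RESEARCH_LAYOUT = (
    "trees/host_obs_sampled.nwk",
    "trees/sym_obs_sampled.nwk",
    "associations/assoc_obs.tsv",
)


def detect_layout(case_dir):
    """Return "simple", "research", or None for a case folder."""
    case_dir = Path(case_dir)
    for name, files in (("simple", SIMPLE_LAYOUT), ("research", RESEARCH_LAYOUT)):
        if all((case_dir / rel).is_file() for rel in files):
            return name
    return None
```

Folders for which `detect_layout` returns `None` are missing at least one expected file and are worth inspecting before the batch run.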

## Python API

```python
from cophyloforge import download_model, predict

download_model()

result = predict(
    host_tree="host.nwk",
    sym_tree="sym.nwk",
    associations="assoc.tsv",
    outdir="results",
)

print(result["scenario_prediction"])
print(result["scenario_probabilities"])
```

## Model Cache

Model artifacts are not embedded in the wheel. `cophyloforge download-model` downloads the default Zenodo checkpoint into the user cache directory:

- Linux: `~/.cache/cophyloforge`
- macOS: `~/Library/Caches/cophyloforge`
- Windows: `%LOCALAPPDATA%\cophyloforge`

Set `COPHYLOFORGE_CACHE_DIR` or pass `--cache-dir` to use a different cache location.
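The lookup described above could be sketched as follows. This is not the package's implementation: the function `resolve_cache_dir` is hypothetical, and the precedence order (explicit `--cache-dir` value first, then `COPHYLOFORGE_CACHE_DIR`, then the platform default) is an assumption, since the relative priority of the two overrides is not stated here.

```python
import os
import sys
from pathlib import Path


def resolve_cache_dir(cli_value=None, env=None, platform=None):
    """Pick a cache directory (assumed precedence: CLI, env var, default)."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform

    if cli_value:
        return Path(cli_value)
    if env.get("COPHYLOFORGE_CACHE_DIR"):
        return Path(env["COPHYLOFORGE_CACHE_DIR"])
    if platform.startswith("win"):
        # %LOCALAPPDATA%\cophyloforge; the fallback path is an assumption.
        base = env.get("LOCALAPPDATA")
        root = Path(base) if base else Path.home() / "AppData" / "Local"
        return root / "cophyloforge"
    if platform == "darwin":
        return Path.home() / "Library" / "Caches" / "cophyloforge"
    # Linux and other POSIX systems.
    return Path.home() / ".cache" / "cophyloforge"
```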

## Citation

If you use the software or model, cite the Zenodo model record:

CoPhyloForge frozen inference model. Zenodo. <https://doi.org/10.5281/zenodo.19656529>

Where appropriate, also cite this repository or the specific release that matches the software version you used.

## Development

Create an environment and install the package in editable mode:

```bash
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
pytest
```

Build and check distributions:

```bash
python -m build
python -m twine check dist/*
```

The installable package uses a `src/` layout and contains only inference-facing code. Research scripts, simulator code, generated datasets, training code, and checkpoint artifacts remain in the repository but are excluded from the package distribution.

For a manual-style guide covering end-user commands, developer checks, and release/publish workflow, see `README_MANUAL.md`.
