Metadata-Version: 2.4
Name: isograph
Version: 0.1.0
Summary: A Python framework for discovering isoform-switch and splicing modules from bulk RNA-seq by combining gene-local compositional modeling with splice-graph-aware latent network inference.
License: Apache License
                                    Version 2.0, January 2004
                                 http://www.apache.org/licenses/
         
            TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
         
            1. Definitions.
         
               "License" shall mean the terms and conditions for use, reproduction,
               and distribution as defined by Sections 1 through 9 of this document.
         
               "Licensor" shall mean the copyright owner or entity authorized by
               the copyright owner that is granting the License.
         
               "Legal Entity" shall mean the union of the acting entity and all
               other entities that control, are controlled by, or are under common
               control with that entity. For the purposes of this definition,
               "control" means (i) the power, direct or indirect, to cause the
               direction or management of such entity, whether by contract or
               otherwise, or (ii) ownership of fifty percent (50%) or more of the
               outstanding shares, or (iii) beneficial ownership of such entity.
         
               "You" (or "Your") shall mean an individual or Legal Entity
               exercising permissions granted by this License.
         
               "Source" form shall mean the preferred form for making modifications,
               including but not limited to software source code, documentation
               source, and configuration files.
         
               "Object" form shall mean any form resulting from mechanical
               transformation or translation of a Source form, including but
               not limited to compiled object code, generated documentation,
               and conversions to other media types.
         
               "Work" shall mean the work of authorship made available under
               the License, as indicated by a copyright notice that is included in
               or attached to the work (an example is provided in the Appendix below).
         
               "Derivative Works" shall mean any work, whether in Source or Object
               form, that is based on (or derived from) the Work and for which the
               editorial revisions, annotations, elaborations, or other modifications
               represent, as a whole, an original work of authorship. For the purposes
               of this License, Derivative Works shall not include works that remain
               separable from, or merely link (or bind by name) to the interfaces of,
               the Work and Derivative Works thereof.
         
               "Contribution" shall mean, as submitted to the Licensor for inclusion
               in the Work by the copyright owner or by an individual or Legal Entity
               authorized to submit on behalf to the Licensor, the copyright owner.
         
               "Contributor" shall mean Licensor and any Legal Entity on behalf of
               whom a Contribution has been received by the Licensor and included
               within the Work.
         
            2. Grant of Copyright License. Subject to the terms and conditions of
               this License, each Contributor hereby grants to You a perpetual,
               worldwide, non-exclusive, no-charge, royalty-free, irrevocable
               copyright license to reproduce, prepare Derivative Works of,
               publicly display, publicly perform, sublicense, and distribute the
               Work and such Derivative Works in Source or Object form.
         
            3. Grant of Patent License. Subject to the terms and conditions of
               this License, each Contributor hereby grants to You a perpetual,
               worldwide, non-exclusive, no-charge, royalty-free, irrevocable
               (except as stated in this section) patent license to make, have made,
               use, offer to sell, sell, import, and otherwise transfer the Work,
               where such license applies only to those patent claims licensable
               by such Contributor that are necessarily infringed by their
               Contribution(s) alone or by the combination of their Contributions
               with the Work to which such Contributions were submitted. If You
               institute patent litigation against any entity (including a
               cross-claim or counterclaim in a lawsuit) alleging that the Work
               or a Contribution incorporated within the Work constitutes direct
               or contributory patent infringement, then any patent licenses
               granted to You under this License for that Work shall terminate
               as of the date such litigation is filed.
         
            4. Redistribution. You may reproduce and distribute copies of the
               Work or Derivative Works thereof in any medium, with or without
               modifications, and in Source or Object form, provided that You
               meet the following conditions:
         
               (a) You must give any other recipients of the Work or Derivative
                   Works a copy of this License; and
         
               (b) You must cause any modified files to carry prominent notices
                   stating that You changed the files; and
         
               (c) You must retain, in the Source form of any Derivative Works
                   that You distribute, all copyright, patent, trademark, and
                   attribution notices from the Source form of the Work,
                   excluding those notices that do not pertain to any part of
                   the Derivative Works; and
         
               (d) If the Work includes a "NOTICE" text file, as part of the
                   distribution, You must include a readable copy of the
                   attribution notices contained within such NOTICE file, in
                   at least one of the following places: within a NOTICE text
                   file distributed as part of the Derivative Works; within
                   the Source form or documentation, if provided along with the
                   Derivative Works; or, within a display generated by the
                   Derivative Works, if and wherever such third-party notices
                   normally appear. The contents of the NOTICE file are for
                   informational purposes only and do not modify the License.
                   You may add Your own attribution notices within Derivative
                   Works that You distribute, alongside or in addition to the
                   NOTICE text from the Work, provided that such additional
                   attribution notices cannot be construed as modifying the License.
         
               You may add Your own license statement for Your modifications and
               may provide additional grant of rights to use, copy, modify, merge,
               publish, distribute, sublicense, and/or sell copies of the
               Contribution.
         
            5. Submission of Contributions. Unless You explicitly state otherwise,
               any Contribution intentionally submitted for inclusion in the Work
               by You to the Licensor shall be under the terms and conditions of
               this License, without any additional terms or conditions.
               Notwithstanding the above, nothing herein shall supersede or modify
               the terms of any separate license agreement you may have executed
               with Licensor regarding such Contributions.
         
            6. Trademarks. This License does not grant permission to use the trade
               names, trademarks, service marks, or product names of the Licensor,
               except as required for reasonable and customary use in describing the
               origin of the Work and reproducing the content of the NOTICE file.
         
            7. Disclaimer of Warranty. Unless required by applicable law or
               agreed to in writing, Licensor provides the Work (and each
               Contributor provides its Contributions) on an "AS IS" BASIS,
               WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
               implied, including, without limitation, any conditions of TITLE,
               NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR
               PURPOSE. You are solely responsible for determining the
               appropriateness of using or reproducing the Work and assume any
               risks associated with Your exercise of permissions under this License.
         
            8. Limitation of Liability. In no event and under no legal theory,
               whether in tort (including negligence), contract, or otherwise,
               unless required by applicable law (such as deliberate and grossly
               negligent acts) or agreed to in writing, shall any Contributor be
               liable to You for damages, including any direct, indirect, special,
               incidental, or exemplary damages of any character arising as a
               result of this License or out of the use or inability to use the
               Work (even if such Contributor has been advised of the possibility
               of such damages).
         
            9. Accepting Warranty or Additional Liability. While redistributing
               the Work or Derivative Works thereof, You may choose to offer,
               and charge a fee for, acceptance of support, warranty, indemnity,
               or other liability obligations and/or rights consistent with this
               License. However, in accepting such obligations, You may offer only
               conditions consistent with this License.
         
            END OF TERMS AND CONDITIONS
         
            Copyright 2024-2026 Kynon J Benjamin
         
            Licensed under the Apache License, Version 2.0 (the "License");
            you may not use this file except in compliance with the License.
            You may obtain a copy of the License at
         
                http://www.apache.org/licenses/LICENSE-2.0
         
            Unless required by applicable law or agreed to in writing, software
            distributed under the License is distributed on an "AS IS" BASIS,
            WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
            See the License for the specific language governing permissions and
            limitations under the License.
License-File: LICENSE
License-File: LICENSE-DATA
Keywords: RNA-seq,isoform switching,splicing,co-expression,network inference,bioinformatics,transcriptomics
Author: Kynon J Benjamin
Author-email: kj.benjamin90@gmail.com
Requires-Python: >=3.11,<3.15
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Provides-Extra: dev
Provides-Extra: docs
Provides-Extra: torch
Requires-Dist: PyYAML (>=6,<7)
Requires-Dist: hydra-core (>=1.3,<2)
Requires-Dist: hypothesis (>=6.112,<7) ; extra == "dev"
Requires-Dist: mlflow-skinny (>=2.16,<3) ; extra == "dev"
Requires-Dist: myst-parser (>=3,<4) ; extra == "docs"
Requires-Dist: networkx (>=3.3,<4)
Requires-Dist: numpy (>=1.26,<3)
Requires-Dist: pandas (>=2.2,<3)
Requires-Dist: pyarrow (>=17,<25)
Requires-Dist: pydantic (>=2.8,<3)
Requires-Dist: pytest (>=8.3,<9) ; extra == "dev"
Requires-Dist: scikit-learn (>=1.5,<2)
Requires-Dist: scipy (>=1.13,<2)
Requires-Dist: sphinx (>=7,<8) ; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints (>=2,<3) ; extra == "docs"
Requires-Dist: sphinx-rtd-theme (>=3,<4) ; extra == "docs"
Requires-Dist: torch (>=2.0) ; extra == "torch"
Project-URL: Bug Tracker, https://github.com/heart-gen/IsoGraph/issues
Project-URL: Documentation, https://isograph.readthedocs.io/en/latest/
Project-URL: Homepage, https://github.com/heart-gen/IsoGraph
Project-URL: Repository, https://github.com/heart-gen/IsoGraph
Description-Content-Type: text/markdown

# IsoGraph

IsoGraph is a Python research software package for discovering isoform-switch and splicing
modules from bulk RNA-seq. It combines gene-local compositional modeling with network
inference so researchers can move from transcript-level counts to gene-module structure,
trait associations, and reproducible benchmark artifacts.

## Status

Development stages 0 through 7 are complete:

- Stage 0: package, CLI, config validation, fixtures, and reproducibility infrastructure
- Stage 1: deterministic baseline network backend
- Stage 2: latent probabilistic backend (sklearn FA + partial correlation) with stability selection
- Stage 3: graph-aware backend
- Stage 4: VAE backend (default production backend)
- Stage 5: WGCNA comparison benchmark on simulated data
- Stage 6: large-scale fixtures (6k–12k genes) and VAE architecture scaling
- Stage 7: GPU-accelerated FA backend (Woodbury identity + BIC component selection)
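
For readers unfamiliar with the Woodbury identity named in Stage 7, the following is a minimal NumPy sketch of how it gives the precision matrix of a factor-analysis covariance `W Wᵀ + diag(ψ)` while inverting only a `k × k` matrix. It is an illustration under stated assumptions, not IsoGraph's implementation: the function names, shapes, and random inputs are hypothetical, and BIC component selection is not shown.

```python
import numpy as np


def woodbury_precision(loadings, noise_var):
    """Invert Sigma = W W^T + diag(psi) using the Woodbury identity.

    loadings: (p, k) factor loading matrix W (hypothetical input).
    noise_var: (p,) per-feature noise variances psi (hypothetical input).
    Only a k x k system is solved; the p x p matrix is never inverted directly.
    """
    psi_inv = 1.0 / noise_var                      # diagonal of diag(psi)^-1
    scaled = loadings * psi_inv[:, None]           # psi^-1 W, shape (p, k)
    k = loadings.shape[1]
    core = np.eye(k) + loadings.T @ scaled         # I + W^T psi^-1 W, shape (k, k)
    return np.diag(psi_inv) - scaled @ np.linalg.solve(core, scaled.T)


def partial_correlation(precision):
    """Convert a precision matrix into a partial-correlation matrix."""
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


rng = np.random.default_rng(0)
W = rng.normal(size=(6, 2))                        # 6 features, 2 latent factors
psi = rng.uniform(0.5, 1.5, size=6)
sigma = W @ W.T + np.diag(psi)
precision = woodbury_precision(W, psi)
print(np.allclose(precision, np.linalg.inv(sigma)))
```

The payoff grows with the feature count: the cost of the solve depends on the number of latent factors `k` rather than on the number of features `p`.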

## Core Capabilities

- Generate and benchmark against the permanent `core_v1` fixture suite and the large-scale
  `scale_v1` suite (6k–12k genes, 25:1–50:1 genes-to-samples ratios).
- Freeze the bundled `real_caudate_aa_v1` real-data fixture from local BrainSeq inputs.
- Fit the deterministic baseline backend from the command line on a prepared dataset bundle.
- Run `baseline`, `latent`, `graph`, `vae`, `wgcna`, or `gpu_latent` backends programmatically
  or through the benchmark runner.
- Export reproducible artifacts, benchmark reports, calibration summaries, and snapshot comparisons.

## Installation

The repository ships with a conda environment that installs IsoGraph in editable mode:

```bash
conda env create -f environment.yml
conda activate isograph
isograph --help
```

If `conda` is not initialized in the current shell, run `eval "$(conda shell.bash hook)"`
first or initialize conda for your shell.

The core package supports Python `3.11` through `3.14`. The bundled environment uses
Python `3.11` as the canonical local development runtime.
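
To confirm that an interpreter falls inside the supported range before installing, a small check along these lines may help. The bounds mirror the `>=3.11,<3.15` constraint in the package metadata; the function name is illustrative and not part of IsoGraph.

```python
import sys

# Supported range taken from the package metadata: >=3.11,<3.15.
MIN_VERSION = (3, 11)
MAX_EXCLUSIVE = (3, 15)


def python_is_supported(version=None):
    """Return True if the (major, minor) version is in the supported range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


print(f"Python {sys.version_info[0]}.{sys.version_info[1]} supported:",
      python_is_supported())
```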

## Quickstart

Run a minimal benchmark on the bundled toy fixture (VAE is the default backend):

```bash
conda activate isograph
isograph benchmark -- \
  fixture_filter=toy_v1 \
  stage_name=readme_smoke
```

This writes benchmark artifacts under `artifacts/benchmarks/readme_smoke/toy_v1/` and
JSON reports under `artifacts/reports/`.
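
After the run finishes, the outputs can be inspected with a few lines of Python. This is a sketch that assumes only the two directories named above; the individual file names and the report schema are not documented here, so the code lists and parses whatever it finds. The helper names are illustrative.

```python
import json
from pathlib import Path


def list_artifacts(run_dir):
    """Return sorted POSIX-style relative paths of every file under run_dir."""
    root = Path(run_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


def load_reports(report_dir):
    """Parse every top-level *.json file in report_dir into {filename: data}."""
    root = Path(report_dir)
    if not root.is_dir():
        return {}
    reports = {}
    for path in sorted(root.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            reports[path.name] = json.load(handle)
    return reports


# Paths taken from the quickstart above; both calls return empty results
# if the benchmark has not been run yet.
for name in list_artifacts("artifacts/benchmarks/readme_smoke/toy_v1"):
    print(name)
for name, report in load_reports("artifacts/reports").items():
    print(name, type(report).__name__)
```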

## Using Your Own Data

IsoGraph expects a prepared dataset bundle containing a `manifest.json`, aligned sample
metadata, feature tables, and dense count matrices. The current command-line path for
custom data is:

```bash
isograph fit \
  --dataset-path path/to/my_dataset_bundle \
  --output-dir artifacts/fits/my_dataset
```

At present, `fit` runs only the deterministic baseline backend. To run the latent, graph,
or VAE backends on your own bundle, use the Python API directly. Detailed walkthroughs live
in the GitHub Wiki, and the formal data model is documented in the Read the Docs source tree.
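
Before calling `isograph fit`, it can be useful to confirm that the bundle directory at least contains a parseable `manifest.json`. The sketch below does only that and then assembles the command shown above; it does not validate the manifest schema, which is defined in the reference docs, and the helper names are illustrative rather than part of the IsoGraph API.

```python
import json
import shlex
from pathlib import Path


def check_bundle(bundle_dir):
    """Return the parsed manifest.json, raising if it is missing or not valid JSON."""
    manifest_path = Path(bundle_dir) / "manifest.json"
    if not manifest_path.is_file():
        raise FileNotFoundError(f"missing manifest: {manifest_path}")
    with manifest_path.open(encoding="utf-8") as handle:
        return json.load(handle)


def fit_command(bundle_dir, output_dir):
    """Build the argument list for the documented `isograph fit` invocation."""
    return [
        "isograph", "fit",
        "--dataset-path", str(bundle_dir),
        "--output-dir", str(output_dir),
    ]


# Print the command for review; pass the list to subprocess.run to execute it.
print(shlex.join(fit_command("path/to/my_dataset_bundle", "artifacts/fits/my_dataset")))
```

Call `check_bundle` on the real bundle path first so a missing or malformed manifest fails early with a clear error.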

## Documentation

- Reference documentation, built for publication on Read the Docs, lives in [docs](docs/index.md).
- Step-by-step tutorials for installation, data preparation, and own-data workflows live
  in the [GitHub Wiki](https://github.com/heart-gen/IsoGraph/wiki).
- Project planning and staged development history remain in
  [docs/staged-roadmap.md](docs/staged-roadmap.md).

## Citation

If you use IsoGraph in research, cite the software repository using the metadata in
[CITATION.cff](CITATION.cff). If a manuscript or preprint becomes available, it can be
added as the preferred citation without changing how the software itself is cited.

## Acknowledgements

IsoGraph is supported by the National Institute on Minority Health and Health Disparities
award `R00 MD0169640` and the Alzheimer's Association award `25AARG-1413315`.

## Reproducibility and Data Provenance

- The benchmark suite is fixture-driven and designed to preserve regression targets across
  development stages.
- The bundled real-data workflow freezes a reproducible `real_caudate_aa_v1` dataset from
  local BrainSeq-derived inputs and caches intermediate selections under
  `benchmarks/cache/real_data/`.
- Benchmark, calibration, runtime, and snapshot artifacts are written into versioned
  directories under `artifacts/` and `snapshots/`.

## Limitations

- The benchmark CLI is optimized for the bundled fixture suite rather than arbitrary
  user-defined suites.
- The `fit` CLI currently exposes only the baseline backend for custom datasets.
- The VAE and GPU-latent backends require PyTorch, which is not a core dependency; the
  package metadata declares it as the optional `torch` extra (`torch>=2.0`).
- The WGCNA backend requires R with the `WGCNA` package installed.
- The bundled `freeze-real` workflow depends on local BrainSeq-style source files and is
  not a generic data-ingestion command for arbitrary cohorts.
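
The optional-dependency limitations above can be checked up front with a short script. This is a sketch, not an IsoGraph utility: it only tests whether `torch` is importable and whether an `Rscript` executable is on the `PATH`. Finding `Rscript` does not prove that the R `WGCNA` package is installed, so treat that result as a necessary condition only.

```python
import importlib.util
import shutil


def optional_backend_status():
    """Report which optional backends have their external prerequisites present."""
    torch_ok = importlib.util.find_spec("torch") is not None
    # Rscript on PATH is necessary for WGCNA but does not confirm the R package.
    rscript_ok = shutil.which("Rscript") is not None
    return {
        "vae": torch_ok,
        "gpu_latent": torch_ok,
        "wgcna": rscript_ok,
    }


for backend, available in optional_backend_status().items():
    print(f"{backend}: {'prerequisite found' if available else 'prerequisite missing'}")
```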

