Metadata-Version: 2.4
Name: velotest
Version: 0.0.1
Summary: A hypothesis test for velocity embeddings.
Author-email: Sebastian Bischoff <sebastian.bischoff@uni-tuebingen.de>
License-Expression: MIT
Project-URL: Repository, https://github.com/mackelab/velocity-hypothesis-test/
Project-URL: Issues, https://github.com/mackelab/velocity-hypothesis-test/issues
Project-URL: Documentation, https://velocity-hypothesis-test.readthedocs.io/en/latest/index.html
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: torch>=2.1.2
Requires-Dist: numpy>=1.26.4
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: tqdm>=4.66.1
Requires-Dist: matplotlib>=3.8.2
Requires-Dist: pandas>=2.1.4
Requires-Dist: scvelo>=0.3.2
Provides-Extra: experiment
Requires-Dist: hydra-core>=1.3.2; extra == "experiment"
Provides-Extra: docs
Requires-Dist: sphinx>=7.4.7; extra == "docs"
Provides-Extra: dev
Requires-Dist: pytest>=8.3.5; extra == "dev"
Requires-Dist: parameterized>=0.9.0; extra == "dev"
Dynamic: license-file

.. image:: https://app.readthedocs.org/projects/velocity-hypothesis-test/badge/?version=latest&style=flat
   :target: https://velocity-hypothesis-test.readthedocs.io/en/latest/
.. image:: https://github.com/mackelab/velocity-hypothesis-test/actions/workflows/test.yml/badge.svg?branch=main
   :target: https://github.com/mackelab/velocity-hypothesis-test/actions/workflows/test.yml

Velotest is a hypothesis test for how well a 2D embedding of positional and velocity data represents
the original high dimensional data. It's purpose is to help practitioners using 2D embeddings
of single cell RNA sequencing data with RNA velocity decide which 2D velocity vectors are faithfully representing
the high-dimensional data.

Installation
------------------
You can simply install the package via pip:

.. code-block:: bash

   pip install velotest

If you want to change bits of the code, install it in editable mode:

.. code-block:: bash

   pip install -e "."

In both cases you'll need additional dependencies to build the docs, run tests, or reproduce the figures from the paper,
which you can install via the extras :code:`docs`, :code:`dev`, or :code:`experiment`, either separately or in combination.
For example, to install the docs extra, run :code:`pip install velotest[docs]`, or to install both the docs and dev extras,
run :code:`pip install velotest[docs,dev]`.
Similarly, if you installed in editable mode, you can run :code:`pip install -e ".[docs]"`.

Usage
----------------

If you have the embeddings and original data as individual arrays/tensor (see below for use with an `anndata` object),
you can use our general interface:

.. code-block:: python

   from velotest.hypothesis_testing import run_hypothesis_test

   uncorrected_p_values, h0_rejected, _, _, _ = run_hypothesis_test(high_d_position, high_d_velocity, low_d_position, low_d_velocity_position)

where low_d_velocity_position is the tip's position of the 2D velocity vector, NOT the velocity vector originating in low_d_position.


An application on single-cell sequencing data (runnable notebook: :code:`notebooks/demo.ipynb`) could look like this (following `scvelo's tutorial <https://scvelo.readthedocs.io/en/stable/VelocityBasics.html>`_):

.. code-block:: python

   from velotest.hypothesis_testing import run_hypothesis_test_on
   import scvelo

   adata = scvelo.datasets.pancreas()
   scvelo.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
   scvelo.pp.moments(adata, n_pcs=30, n_neighbors=30)

   # Compute velocity
   scvelo.tl.velocity(adata)

   # Compute 2D embedding of velocity vectors
   scvelo.tl.velocity_graph(adata)
   scvelo.pl.velocity_embedding(adata)

   # Run test
   uncorrected_p_values, h0_rejected, _ = run_hypothesis_test_on(adata)


For plotting, you can use the :code:`plotting` module. Have a look at :code:`notebooks/demo.ipynb` for an example.
Refer to `Read the Docs <https://velocity-hypothesis-test.readthedocs.io/en/latest/>`_ for a more detailed API documentation.


Details
--------------------
Next, we will briefly summarize how the test works, for details see our paper.
The tests tries to assess how well the 2D velocity vectors represent the high-dimensional velocity vectors.
We quantify this by computing the mean cosine similarity between the high-dimensional velocity vector and
the difference vectors to a set of neighbors in the high-dimensional space.
For a data point :math:`i`, we use the mean cosine similarity between the velocity :math:`v_i` and
the difference vector :math:`x_j-x_i` for all :math:`x_j` in a set of neighbors of :math:`\tilde{x}_i` as the test statistic.
This set of neighbors is chosen based on the points the velocity :math:`\tilde{v}_i` points to in 2D.
:math:`\tilde{v}_i` and :math:`\tilde{x}_i` are the 2D embeddings of :math:`v_i` and :math:`x_i`, respectively.

The null hypothesis is that the visualised 2D velocity vector is no more aligned with the high-dimensional velocity
than a visually distinct random 2D direction.
It is rejected if the number of random neighborhoods with a higher statistic as the statistic
from the velocity-based neighborhood exceeds the level we would expect for a certain significance level.

It was originally developed for the analysis of single cell RNA sequencing data,
but can be applied to any application with positional and velocity data.

Reproducing plots from paper
------------------------------
Make sure that you have the :code:`experiment` extra installed (see Installation section above).

Then, you can reproduce all figures by simply running :code:`make_all_figures.py` in the :code:`experiments` folder:

.. code-block:: bash

   cd experiments
   python make_all_figures.py --multirun=dataset=pancreas_stochastic,pancreas_dynamical,dentateyrus,bonemarrow,covid,gastrulation_erythroid,nystroem,developing_mouse_brain,organogenesis,veloviz

This will create a :code:`fig` folder in the :code:`experiments` folder with all figures based on the configuration in :code:`configs/`.
This uses hydra to manage the configurations, so you can also modify individual configurations using the command line
with :code:`python make_all_figures.py dataset=pancreas_stochastic dataset.number_neighbors_to_sample_from=300`.
