
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/advanced/plot_sca_vs_mi.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_advanced_plot_sca_vs_mi.py>`
        to download the full example code. or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_advanced_plot_sca_vs_mi.py:


=============================
Mutual information versus SCA
=============================

In this example, we are comparing the results of the co-evolution analysis on
serine proteases using SCA and the mutual information.

.. GENERATED FROM PYTHON SOURCE LINES 11-12

Import necessary

.. GENERATED FROM PYTHON SOURCE LINES 12-23

.. code-block:: Python

    from cocoatree.datasets import load_S1A_serine_proteases
    from cocoatree.msa import filter_sequences

    from cocoatree.statistics.position import compute_conservation
    from cocoatree.statistics.pairwise import compute_sca_matrix
    from cocoatree.statistics.pairwise import compute_mutual_information_matrix

    import matplotlib.pyplot as plt

    import numpy as np








.. GENERATED FROM PYTHON SOURCE LINES 24-28

Load the dataset
----------------

We start by importing the dataset.

.. GENERATED FROM PYTHON SOURCE LINES 28-34

.. code-block:: Python


    serine_dataset = load_S1A_serine_proteases(paper='rivoire')
    seq_id = serine_dataset["sequence_ids"]
    sequences = serine_dataset["alignment"]
    n_pos, n_seq = len(sequences[0]), len(sequences)








.. GENERATED FROM PYTHON SOURCE LINES 35-39

Filtering of the multiple sequence alignment
--------------------------------------------

We are going to filter and clean the MSA

.. GENERATED FROM PYTHON SOURCE LINES 39-41

.. code-block:: Python

    seq_kept, seq_id_kept, pos_kept = filter_sequences(sequences, seq_id)








.. GENERATED FROM PYTHON SOURCE LINES 42-44

Compute the SCA matrix
----------------------

.. GENERATED FROM PYTHON SOURCE LINES 44-46

.. code-block:: Python

    SCA_matrix = compute_sca_matrix(seq_kept)





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    computing weight of seq 1/1376  
    computing weight of seq 101/1376        
    computing weight of seq 201/1376        
    computing weight of seq 301/1376        
    computing weight of seq 401/1376        
    computing weight of seq 501/1376        
    computing weight of seq 601/1376        
    computing weight of seq 701/1376        
    computing weight of seq 801/1376        
    computing weight of seq 901/1376        
    computing weight of seq 1001/1376       
    computing weight of seq 1101/1376       
    computing weight of seq 1201/1376       
    computing weight of seq 1301/1376       



.. GENERATED FROM PYTHON SOURCE LINES 47-49

Compute the Mutual information matrix
-------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 49-52

.. code-block:: Python

    normalized_MI = compute_mutual_information_matrix(seq_kept)
    MI = compute_mutual_information_matrix(seq_kept, normalize=False)





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    computing weight of seq 1/1376  
    computing weight of seq 101/1376        
    computing weight of seq 201/1376        
    computing weight of seq 301/1376        
    computing weight of seq 401/1376        
    computing weight of seq 501/1376        
    computing weight of seq 601/1376        
    computing weight of seq 701/1376        
    computing weight of seq 801/1376        
    computing weight of seq 901/1376        
    computing weight of seq 1001/1376       
    computing weight of seq 1101/1376       
    computing weight of seq 1201/1376       
    computing weight of seq 1301/1376       
    computing weight of seq 1/1376  
    computing weight of seq 101/1376        
    computing weight of seq 201/1376        
    computing weight of seq 301/1376        
    computing weight of seq 401/1376        
    computing weight of seq 501/1376        
    computing weight of seq 601/1376        
    computing weight of seq 701/1376        
    computing weight of seq 801/1376        
    computing weight of seq 901/1376        
    computing weight of seq 1001/1376       
    computing weight of seq 1101/1376       
    computing weight of seq 1201/1376       
    computing weight of seq 1301/1376       



.. GENERATED FROM PYTHON SOURCE LINES 53-55

Compare MI versus SCA
---------------------

.. GENERATED FROM PYTHON SOURCE LINES 55-68

.. code-block:: Python

    plt.figure(figsize=(12, 5))

    for ih, (heatmap, title) in \
        enumerate(zip([SCA_matrix, MI, normalized_MI],
                      ['SCA matrix', 'Mutual Information',
                       'Normalized Mutual Information'])):
        plt.subplot(1, 3, ih+1)
        plt.imshow(heatmap, cmap='inferno')
        plt.xlabel('residues', fontsize=10)
        plt.ylabel(None)
        plt.title('%s' % title)
        plt.colorbar(shrink=0.4)




.. image-sg:: /auto_examples/advanced/images/sphx_glr_plot_sca_vs_mi_001.png
   :alt: SCA matrix, Mutual Information, Normalized Mutual Information
   :srcset: /auto_examples/advanced/images/sphx_glr_plot_sca_vs_mi_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 69-71

Adapt heatmap scale
---------------------

.. GENERATED FROM PYTHON SOURCE LINES 71-85

.. code-block:: Python

    plt.figure(figsize=(12, 5))

    for ih, (heatmap, title, vmm) in \
        enumerate(zip([SCA_matrix, MI, normalized_MI],
                      ['SCA matrix', 'Mutual Information',
                       'Normalized Mutual Information'],
                      [[0, 1], [0, 0.8], [0, 0.2]])):
        plt.subplot(1, 3, ih+1)
        plt.imshow(heatmap, vmin=vmm[0], vmax=vmm[1], cmap='inferno')
        plt.xlabel('residues', fontsize=10)
        plt.ylabel(None)
        plt.title('%s' % title)
        plt.colorbar(shrink=0.4)




.. image-sg:: /auto_examples/advanced/images/sphx_glr_plot_sca_vs_mi_002.png
   :alt: SCA matrix, Mutual Information, Normalized Mutual Information
   :srcset: /auto_examples/advanced/images/sphx_glr_plot_sca_vs_mi_002.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 86-88

Relationships between SCA and MI's
----------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 88-105

.. code-block:: Python

    plt.figure(figsize=(12, 3))
    plt.subplot(1, 3, 1)
    plt.plot(np.triu(SCA_matrix, 1).flatten(),
             np.triu(normalized_MI, 1).flatten(), 'o')
    plt.xlabel('SCA matrix')
    plt.ylabel('Normalized Mutual Information')

    plt.subplot(1, 3, 2)
    plt.plot(np.triu(MI, 1).flatten(), np.triu(normalized_MI, 1).flatten(), 'o')
    plt.xlabel('Mutual Information')
    plt.ylabel('Normalized Mutual Information')

    plt.subplot(1, 3, 3)
    plt.plot(np.triu(SCA_matrix, 1).flatten(), np.triu(MI, 1).flatten(), 'o')
    plt.xlabel('SCA matrix')
    plt.ylabel('Mutual Information')




.. image-sg:: /auto_examples/advanced/images/sphx_glr_plot_sca_vs_mi_003.png
   :alt: plot sca vs mi
   :srcset: /auto_examples/advanced/images/sphx_glr_plot_sca_vs_mi_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Text(769.1928104575164, 0.5, 'Mutual Information')



.. GENERATED FROM PYTHON SOURCE LINES 106-108

Relationship between coevolution metrics and conservation
---------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 108-122

.. code-block:: Python

    plt.figure(figsize=(12, 3))

    Di = compute_conservation(seq_kept)

    for ih, (heatmap, title) in enumerate(zip([SCA_matrix, MI, normalized_MI],
                                              ['SCA', 'MI', 'normalized MI'])):
        plt.subplot(1, 3, ih + 1)
        cumul = np.sum(heatmap, axis=0) - np.diag(heatmap)
        plt.plot(Di, cumul, 'o')
        plt.xlabel('conservation')
        if ih == 0:
            plt.ylabel('cumulative score')
        plt.title(title)




.. image-sg:: /auto_examples/advanced/images/sphx_glr_plot_sca_vs_mi_004.png
   :alt: SCA, MI, normalized MI
   :srcset: /auto_examples/advanced/images/sphx_glr_plot_sca_vs_mi_004.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    computing weight of seq 1/1376  
    computing weight of seq 101/1376        
    computing weight of seq 201/1376        
    computing weight of seq 301/1376        
    computing weight of seq 401/1376        
    computing weight of seq 501/1376        
    computing weight of seq 601/1376        
    computing weight of seq 701/1376        
    computing weight of seq 801/1376        
    computing weight of seq 901/1376        
    computing weight of seq 1001/1376       
    computing weight of seq 1101/1376       
    computing weight of seq 1201/1376       
    computing weight of seq 1301/1376       




.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 6.546 seconds)


.. _sphx_glr_download_auto_examples_advanced_plot_sca_vs_mi.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/tree-timc/cocoatree/gh-pages?urlpath=lab/tree/notebooks/auto_examples/advanced/plot_sca_vs_mi.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_sca_vs_mi.ipynb <plot_sca_vs_mi.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_sca_vs_mi.py <plot_sca_vs_mi.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_sca_vs_mi.zip <plot_sca_vs_mi.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
