
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_simple_sca.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_simple_sca.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_simple_sca.py:


==============================
The simplest SCA analysis ever
==============================

This example walks through a complete statistical coupling analysis (SCA),
from data import and MSA filtering to the detection of protein sectors.

.. GENERATED FROM PYTHON SOURCE LINES 9-14

.. code-block:: Python


    import cocoatree.datasets as c_data
    import cocoatree
    import matplotlib.pyplot as plt








.. GENERATED FROM PYTHON SOURCE LINES 15-21

Load the dataset
----------------

We start by importing the dataset. Here, we directly load the S1A serine
protease dataset provided with :mod:`cocoatree`. To work on your own
dataset, you can use the :func:`cocoatree.io.load_msa` function.

.. GENERATED FROM PYTHON SOURCE LINES 21-28

.. code-block:: Python


    serine_dataset = c_data.load_S1A_serine_proteases()
    loaded_seqs = serine_dataset["alignment"]
    loaded_seqs_id = serine_dataset["sequence_ids"]
    n_loaded_pos, n_loaded_seqs = len(loaded_seqs[0]), len(loaded_seqs)
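
The last line relies on every aligned sequence having the same length:
``len(loaded_seqs[0])`` is the number of alignment positions and
``len(loaded_seqs)`` the number of sequences. A minimal sketch of that
bookkeeping with an explicit consistency check, assuming the alignment is a
sequence of equal-length strings (the helper ``msa_shape`` and the toy
alignment are hypothetical, not part of :mod:`cocoatree`):

.. code-block:: Python

    def msa_shape(alignment):
        """Return (n_positions, n_sequences) for a list of aligned sequences.

        Raises ValueError if the alignment is empty or if the rows do not all
        have the same length (i.e. the input is not a proper alignment).
        """
        if len(alignment) == 0:
            raise ValueError("empty alignment")
        n_pos = len(alignment[0])
        for i, seq in enumerate(alignment):
            if len(seq) != n_pos:
                raise ValueError(
                    f"sequence {i} has length {len(seq)}, expected {n_pos}")
        return n_pos, len(alignment)


    # Hypothetical toy alignment with gaps ("-"), not the S1A dataset.
    toy_msa = ["AC-DE", "ACGDE", "-CGD-"]
    print(msa_shape(toy_msa))  # (5, 3)

With the real data, ``msa_shape(loaded_seqs)`` should return
``(n_loaded_pos, n_loaded_seqs)``.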









.. GENERATED FROM PYTHON SOURCE LINES 29-32

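Compute the SCA analysis
------------------------

:func:`cocoatree.perform_sca` returns the coevolution matrix and a dataframe
with one row per position of the original MSA: its index in the filtered MSA
(``NaN`` for positions removed by filtering), its values on each of the
``n_components`` principal components (PCs) and independent components (ICs),
and one boolean ``sector_k`` column per sector.
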

.. GENERATED FROM PYTHON SOURCE LINES 32-37

.. code-block:: Python


    coevol_matrix, results = cocoatree.perform_sca(
        loaded_seqs_id, loaded_seqs, n_components=3)
    print(results.head())





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    computing weight of seq 1/1376  
    computing weight of seq 101/1376        
    computing weight of seq 201/1376        
    computing weight of seq 301/1376        
    computing weight of seq 401/1376        
    computing weight of seq 501/1376        
    computing weight of seq 601/1376        
    computing weight of seq 701/1376        
    computing weight of seq 801/1376        
    computing weight of seq 901/1376        
    computing weight of seq 1001/1376       
    computing weight of seq 1101/1376       
    computing weight of seq 1201/1376       
    computing weight of seq 1301/1376       
       original_msa_pos  filtered_msa_pos  PC1  IC1  sector_1  PC2  IC2  sector_2  PC3  IC3  sector_3
    0                 0               NaN  NaN  NaN     False  NaN  NaN     False  NaN  NaN     False
    1                 1               NaN  NaN  NaN     False  NaN  NaN     False  NaN  NaN     False
    2                 2               NaN  NaN  NaN     False  NaN  NaN     False  NaN  NaN     False
    3                 3               NaN  NaN  NaN     False  NaN  NaN     False  NaN  NaN     False
    4                 4               NaN  NaN  NaN     False  NaN  NaN     False  NaN  NaN     False
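
The ``computing weight of seq`` messages show that sequences are weighted
before the coevolution matrix is computed. In the usual SCA formulation, each
sequence receives a weight of 1 / n, where n is the number of sequences
(itself included) sharing at least a given fraction of identical residues
with it, commonly 0.8. The sketch below assumes that scheme; the function
name, the 0.8 default and the gap handling are illustrative assumptions, and
cocoatree's own implementation may differ in these details.

.. code-block:: Python

    import numpy as np


    def sequence_weights(alignment, threshold=0.8, gap="-"):
        """SCA-style sequence weights: 1 / (number of similar sequences).

        Two sequences are "similar" when the fraction of alignment positions at
        which they carry the same (non-gap) residue is at least `threshold`.
        Each sequence is always counted as similar to itself.
        """
        seqs = np.array([list(s) for s in alignment])       # (n_seq, n_pos)
        n_seq, n_pos = seqs.shape
        residues = sorted(set(seqs.ravel()) - {gap})
        # One-hot encode residues, ignoring gaps: (n_seq, n_pos * n_residues)
        onehot = np.stack([seqs == r for r in residues], axis=2)
        onehot = onehot.reshape(n_seq, -1).astype(float)
        # identity[i, j] = fraction of positions where i and j share a residue
        identity = onehot @ onehot.T / n_pos
        np.fill_diagonal(identity, 1.0)
        n_similar = (identity >= threshold).sum(axis=1)
        return 1.0 / n_similar


    # Hypothetical toy alignment: three near-identical sequences, one outlier.
    toy_msa = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKM", "WWWWWWWWWW"]
    weights = sequence_weights(toy_msa)
    print(weights)        # 1/3 for each similar sequence, 1.0 for the outlier
    print(weights.sum())  # effective number of sequences, about 2.0

The one-hot matrix product avoids building an
``(n_seq, n_seq, n_pos)`` comparison array, which would not fit in memory for
an alignment of about 1,400 sequences.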




.. GENERATED FROM PYTHON SOURCE LINES 38-39

Select the positions belonging to the first sector in the results dataframe.

.. GENERATED FROM PYTHON SOURCE LINES 39-42

.. code-block:: Python


    print(results.loc[results["sector_1"]].head())





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

         original_msa_pos  filtered_msa_pos       PC1       IC1  sector_1       PC2       IC2  sector_2       PC3       IC3  sector_3
    130               130              23.0 -0.090217  0.095792      True -0.047229 -0.000824     False  0.029506  0.045516     False
    133               133              26.0 -0.116480  0.130897      True -0.071281 -0.011032     False  0.044673  0.059013     False
    214               214              35.0 -0.106719  0.133006      True -0.078486  0.072004     False -0.087057 -0.046816     False
    246               246              48.0 -0.130522  0.221918      True -0.181472 -0.010578     False -0.028971 -0.038197     False
    247               247              49.0 -0.110328  0.171331      True -0.133246 -0.042140     False  0.038413  0.023443     False
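
Boolean indexing with ``results["sector_1"]`` keeps only the rows flagged as
belonging to that sector. A small sketch applying the same idea to every
sector at once and returning original MSA positions; the helper
``sector_positions`` is hypothetical, and the toy dataframe only mimics the
column names shown above:

.. code-block:: Python

    import pandas as pd

    # Toy stand-in for the `results` dataframe: one row per original MSA
    # position, one boolean column per sector.
    results_toy = pd.DataFrame({
        "original_msa_pos": [0, 1, 2, 3, 4],
        "sector_1": [False, True, False, True, False],
        "sector_2": [True, False, False, False, False],
    })


    def sector_positions(results, n_sectors):
        """Map each sector number to the original MSA positions it contains."""
        return {
            k: results.loc[results[f"sector_{k}"], "original_msa_pos"].tolist()
            for k in range(1, n_sectors + 1)
        }


    print(sector_positions(results_toy, 2))  # {1: [1, 3], 2: [0]}

On the real output, ``sector_positions(results, 3)`` would list the positions
of the three sectors.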




.. GENERATED FROM PYTHON SOURCE LINES 43-45

Visualizing the sectors on the first two principal components
-------------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 45-64

.. code-block:: Python


    # Plot every position in black
    fig, ax = plt.subplots()
    ax.plot(results.loc[:, "PC1"],
            results.loc[:, "PC2"],
            ".", c="black")

    # Overlay the positions of each sector in its own color
    for isec, color in zip([1, 2, 3], ['r', 'g', 'b']):
        ax.plot(results.loc[results["sector_%d" % isec], "PC1"],
                results.loc[results["sector_%d" % isec], "PC2"],
                ".", c=color, label="Sector %d" % isec)

    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")

    ax.legend()





.. image-sg:: /auto_examples/images/sphx_glr_plot_simple_sca_001.png
   :alt: plot simple sca
   :srcset: /auto_examples/images/sphx_glr_plot_simple_sca_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <matplotlib.legend.Legend object at 0x7f0759afb290>
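
Assuming, as in the standard SCA procedure, that the ``PC1``, ``PC2`` and
``PC3`` columns hold the values of each position on the leading eigenvectors
of the (symmetric) coevolution matrix, such components can be obtained with
:func:`numpy.linalg.eigh`. The sketch below works on a random toy matrix and
is not cocoatree's implementation; the helper name is hypothetical.

.. code-block:: Python

    import numpy as np


    def leading_components(matrix, n_components):
        """Top eigenpairs of a symmetric matrix, largest eigenvalue first.

        Returns (eigenvalues, eigenvectors) with eigenvectors as the columns of
        an (n_positions, n_components) array.
        """
        eigvals, eigvecs = np.linalg.eigh(matrix)          # ascending order
        order = np.argsort(eigvals)[::-1][:n_components]   # largest first
        return eigvals[order], eigvecs[:, order]


    # Hypothetical symmetric, positive semi-definite toy "coevolution" matrix.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 6))
    toy_matrix = x @ x.T
    vals, vecs = leading_components(toy_matrix, n_components=3)
    # Each column v satisfies toy_matrix @ v == lambda * v (up to rounding).
    print(np.allclose(toy_matrix @ vecs, vecs * vals))  # True

Eigenvectors are only defined up to sign, so a PC scatter plot can appear
mirrored from one implementation or library version to another.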



.. GENERATED FROM PYTHON SOURCE LINES 65-67

Visualizing the sectors on the first two independent components
---------------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 67-84

.. code-block:: Python


    # Plot every position in black
    fig, ax = plt.subplots()
    ax.plot(results.loc[:, "IC1"],
            results.loc[:, "IC2"],
            ".", c="black")

    # Overlay the positions of each sector in its own color
    for isec, color in zip([1, 2, 3], ['r', 'g', 'b']):
        ax.plot(results.loc[results["sector_%d" % isec], "IC1"],
                results.loc[results["sector_%d" % isec], "IC2"],
                ".", c=color, label="Sector %d" % isec)

    ax.set_xlabel("IC1")
    ax.set_ylabel("IC2")

    ax.legend()



.. image-sg:: /auto_examples/images/sphx_glr_plot_simple_sca_002.png
   :alt: plot simple sca
   :srcset: /auto_examples/images/sphx_glr_plot_simple_sca_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <matplotlib.legend.Legend object at 0x7f075a18fd90>
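
The two plotting blocks above differ only in the pair of columns they use.
A possible refactoring into a single helper, shown on a hypothetical toy
dataframe that mimics the ``IC`` and ``sector_k`` columns (``plot_sectors``
is an illustrative name, not a cocoatree function):

.. code-block:: Python

    import matplotlib.pyplot as plt
    import pandas as pd


    def plot_sectors(results, x, y, n_sectors=3, colors=("r", "g", "b"), ax=None):
        """Scatter all positions on columns `x` vs `y`, highlighting sectors."""
        if ax is None:
            _, ax = plt.subplots()
        # Background: every position in black
        ax.plot(results[x], results[y], ".", c="black")
        # Foreground: positions flagged in each sector_k column
        for k, color in zip(range(1, n_sectors + 1), colors):
            in_sector = results[f"sector_{k}"]
            ax.plot(results.loc[in_sector, x], results.loc[in_sector, y],
                    ".", c=color, label=f"Sector {k}")
        ax.set_xlabel(x)
        ax.set_ylabel(y)
        ax.legend()
        return ax


    # Hypothetical toy data with the same column layout as `results`.
    toy_results = pd.DataFrame({
        "IC1": [0.1, 0.2, -0.1, 0.3],
        "IC2": [0.0, 0.1, 0.2, -0.2],
        "sector_1": [True, False, False, True],
        "sector_2": [False, True, False, False],
        "sector_3": [False, False, True, False],
    })
    ax = plot_sectors(toy_results, "IC1", "IC2")

With the real output, ``plot_sectors(results, "PC1", "PC2")`` and
``plot_sectors(results, "IC1", "IC2")`` should draw the same kind of figures
as the two blocks above.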




.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.895 seconds)


.. _sphx_glr_download_auto_examples_plot_simple_sca.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/tree-timc/cocoatree/gh-pages?urlpath=lab/tree/notebooks/auto_examples/plot_simple_sca.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_simple_sca.ipynb <plot_simple_sca.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_simple_sca.py <plot_simple_sca.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_simple_sca.zip <plot_simple_sca.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
