
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_map_alignments.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_map_alignments.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_map_alignments.py:


====================================================
Mapping original MSA, filtered MSA, PDB, and sectors
====================================================

In this example, we show how to build a pandas.DataFrame that maps each
position of the original MSA to the corresponding PDB position, named PDB
position, and filtered MSA position, and flags whether it belongs to each
sector.

.. GENERATED FROM PYTHON SOURCE LINES 10-18

.. code-block:: Python


    import numpy as np
    from cocoatree.datasets import load_S1A_serine_proteases
    from cocoatree.msa import filter_sequences
    from cocoatree.msa import map_msa_positions
    import pandas as pd









.. GENERATED FROM PYTHON SOURCE LINES 19-21

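Start by loading the dataset and the relevant information: the MSA, the PDB
positions, and the sector positions. The MSA is then filtered with
``filter_sequences``, which returns the kept sequences, their identifiers, and
the indices of the kept positions.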

.. GENERATED FROM PYTHON SOURCE LINES 21-37

.. code-block:: Python


    serine_dataset = load_S1A_serine_proteases(paper="rivoire")
    seq_id = serine_dataset["sequence_ids"]
    sequences = serine_dataset["alignment"]
    n_pos, n_seq = len(sequences[0]), len(sequences)

    # Store each sector as a list of strings, the same object type as what
    # our extract_sectors_pos returns.
    sectors = [
        [str(i) for i in positions]
        for positions in serine_dataset["sector_positions"].values()]
    pdb_pos = serine_dataset["pdb_positions"]

    seq_kept, seq_id_kept, pos_kept = filter_sequences(sequences, seq_id)








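Converting the sector positions to strings matters because membership tests
compare values exactly: an integer residue number never equals its string
label. The toy sketch below assumes, as the conversion above suggests, that
the named PDB positions are stored as strings (PDB residue numbers may carry
insertion codes such as ``184A``, so they are not always plain integers). The
``sector_positions`` dictionary and labels here are made-up data, not the
dataset's content.

.. code-block:: Python

    # Hypothetical toy data mimicking the shape of
    # serine_dataset["sector_positions"]: sector name -> integer residue numbers.
    sector_positions = {"sector_1": [16, 19, 28], "sector_2": [57, 102]}

    # Named PDB positions, assumed here to be stored as strings.
    pdb_named_positions = ["16", "17", "19", "28", "57"]

    # Without conversion, no integer matches a string label.
    raw_hits = [p for p in pdb_named_positions
                if p in sector_positions["sector_1"]]

    # Convert every sector to a list of strings, as in the loading code.
    sectors = [[str(i) for i in positions]
               for positions in sector_positions.values()]
    str_hits = [p for p in pdb_named_positions if p in sectors[0]]

    print(raw_hits)  # []
    print(str_hits)  # ['16', '19', '28']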

.. GENERATED FROM PYTHON SOURCE LINES 38-43

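Next, we map all of these positions onto a common reference frame: the
positions of the original MSA.

Use ``map_msa_positions`` to obtain the mapping between the positions of the
original MSA and those of the filtered MSA. Positions removed by the filter
have no counterpart in the filtered MSA.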

.. GENERATED FROM PYTHON SOURCE LINES 43-46

.. code-block:: Python

    pos_mapping, _ = map_msa_positions(n_pos, pos_kept)








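To make this mapping concrete, here is a minimal pure-Python sketch of an
original-to-filtered mapping, assuming that ``pos_kept`` holds the indices of
the columns kept by the filter and that removed columns map to ``None``. The
helper ``build_position_mapping`` is hypothetical and is not cocoatree's
implementation of ``map_msa_positions``.

.. code-block:: Python

    def build_position_mapping(n_pos, pos_kept):
        """Hypothetical helper: map each original MSA column to its index in
        the filtered MSA, or to None when the filter removed that column."""
        # Kept columns are renumbered 0, 1, 2, ... in their original order.
        filtered_index = {orig: new for new, orig in enumerate(sorted(pos_kept))}
        return {orig: filtered_index.get(orig) for orig in range(n_pos)}


    # Toy alignment of 6 columns where columns 1 and 4 were filtered out.
    mapping = build_position_mapping(6, [0, 2, 3, 5])
    print(mapping)  # {0: 0, 1: None, 2: 1, 3: 2, 4: None, 5: 3}

Because such a dictionary is keyed by every original column in order, its
values line up with ``np.arange(n_pos)``. If ``map_msa_positions`` returns a
dictionary with the same layout, its values can be used directly as a
DataFrame column, as done below.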

.. GENERATED FROM PYTHON SOURCE LINES 47-49

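Sector positions are given in the PDB reference frame, and the PDB structure
corresponds to the first sequence of the MSA. Each MSA column where this first
sequence has a residue (rather than a gap) therefore maps to a PDB residue: its
0-based index in the PDB sequence is the number of residues up to and
including that column, minus one.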

.. GENERATED FROM PYTHON SOURCE LINES 49-67

.. code-block:: Python

    # Columns where the first (PDB) sequence has a residue rather than a gap.
    is_mapped = np.array([s != "-" for s in sequences[0]])
    # Running residue count minus one: the 0-based index in the PDB sequence.
    pdb_index = is_mapped.cumsum() - 1
    pdb_mapping = [int(idx) if mapped else None
                   for mapped, idx in zip(is_mapped, pdb_index)]
    pdb_pos_mapping = [pdb_pos[idx] if mapped else None
                       for mapped, idx in zip(is_mapped, pdb_index)]

    # One row per column of the original MSA.
    mapping = pd.DataFrame(
        {"original_msa_pos": np.arange(n_pos, dtype=int),
         "pdb_pos": pdb_mapping,
         "pdb_named_pos": pdb_pos_mapping,
         "filtered_msa_pos": list(pos_mapping.values())})

    # Flag the positions that belong to sectors 1, 2, and 3.
    mapping["sector_1"] = np.isin(mapping["pdb_named_pos"], sectors[0])
    mapping["sector_2"] = np.isin(mapping["pdb_named_pos"], sectors[1])
    mapping["sector_3"] = np.isin(mapping["pdb_named_pos"], sectors[2])







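The gap-counting logic can be checked on a toy reference sequence. The sketch
below uses a plain Python counter instead of ``numpy.cumsum``, and its residue
labels are made up; it reproduces the ``pdb_pos`` and ``pdb_named_pos``
columns for a seven-column alignment. The helper ``map_reference_columns`` is
hypothetical.

.. code-block:: Python

    def map_reference_columns(aligned_ref, residue_labels, gap="-"):
        """Map each alignment column to (residue index, residue label) of the
        reference sequence, or to (None, None) when that column is a gap."""
        pdb_index, pdb_named = [], []
        n_residues = 0  # residues of the reference seen so far
        for char in aligned_ref:
            if char == gap:
                pdb_index.append(None)
                pdb_named.append(None)
            else:
                # Same value as is_mapped.cumsum() - 1 at this column.
                pdb_index.append(n_residues)
                pdb_named.append(residue_labels[n_residues])
                n_residues += 1
        return pdb_index, pdb_named


    # Toy reference: 4 residues spread over 7 alignment columns.
    pdb_index, pdb_named = map_reference_columns(
        "I-VG--G", ["16", "17", "18", "19"])
    print(pdb_index)  # [0, None, 1, 2, None, None, 3]
    print(pdb_named)  # ['16', None, '17', '18', None, None, '19']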

.. GENERATED FROM PYTHON SOURCE LINES 68-69

Finally, print the first rows of the table restricted to sector 1, which show
the positions of this sector in each reference frame.

.. GENERATED FROM PYTHON SOURCE LINES 69-71

.. code-block:: Python


    print(mapping.loc[mapping["sector_1"]].head())




.. rst-class:: sphx-glr-script-out

 .. code-block:: none

         original_msa_pos  pdb_pos pdb_named_pos  filtered_msa_pos  sector_1  sector_2  sector_3
    130               130      0.0            16              23.0      True     False     False
    133               133      3.0            19              26.0      True     False     False
    214               214     12.0            28              35.0      True     False     False
    246               246     24.0            42              48.0      True     False     False
    247               247     25.0            43              49.0      True     False     False




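In the printed table, ``pdb_pos`` and ``filtered_msa_pos`` appear as floats:
pandas stores the ``None`` entries of those columns as ``NaN``, which forces a
float dtype (``astype("Int64")`` keeps integers alongside missing values). A
common next step is to read one sector's positions in another reference frame
while skipping columns that have no counterpart there. The pure-Python sketch
below does this on hypothetical rows shaped like those of ``mapping``; the
helper ``sector_positions_in`` and the toy values are illustrative only.

.. code-block:: Python

    def sector_positions_in(rows, sector_column, frame_column):
        """Return the positions of one sector in another reference frame,
        skipping rows that have no counterpart in that frame (None)."""
        return [row[frame_column] for row in rows
                if row[sector_column] and row[frame_column] is not None]


    # Hypothetical rows mimicking some columns of the ``mapping`` DataFrame.
    rows = [
        {"original_msa_pos": 0, "filtered_msa_pos": 0, "sector_1": True},
        {"original_msa_pos": 1, "filtered_msa_pos": None, "sector_1": True},
        {"original_msa_pos": 2, "filtered_msa_pos": 1, "sector_1": False},
        {"original_msa_pos": 3, "filtered_msa_pos": 2, "sector_1": True},
    ]

    print(sector_positions_in(rows, "sector_1", "filtered_msa_pos"))  # [0, 2]
    print(sector_positions_in(rows, "sector_1", "original_msa_pos"))  # [0, 1, 3]

With the DataFrame itself, a comparable selection is
``mapping.loc[mapping["sector_1"], "filtered_msa_pos"].dropna()``.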

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.280 seconds)


.. _sphx_glr_download_auto_examples_plot_map_alignments.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/tree-timc/cocoatree/gh-pages?urlpath=lab/tree/notebooks/auto_examples/plot_map_alignments.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_map_alignments.ipynb <plot_map_alignments.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_map_alignments.py <plot_map_alignments.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_map_alignments.zip <plot_map_alignments.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
