The Transform Module

The transform module of the repytah package holds functions used to transform matrix inputs into different forms that are of use in larger functions from other modules. The functions in the transform module focus mainly on overlapping repeated structures and annotation markers.

The transform module includes the following functions:

  • remove_overlaps: Removes every group of repeat pairs that share a length and annotation marker when at least one pair of repeats in the group overlaps in time

The functions in the repytah package are meant to be used alongside other functions in the package, so many examples use functions from multiple modules. In the examples below, the following functions from the utilities module (https://github.com/smith-tinkerlab/repytah/blob/main/docs/utilities_vignette.ipynb) are called:

  • add_annotations
  • reconstruct_full_block

For more in-depth information on the function calls, an example function pipeline is shown below. Functions from the current module are shown in green.

[Figure: example function pipeline, with functions from the transform module shown in green]

Importing necessary modules

[1]:
# NumPy is used for mathematical calculations
import numpy as np

# Import remove_overlaps from the transform module
from repytah.transform import remove_overlaps

remove_overlaps

remove_overlaps takes a list of annotated repeat pairs and removes every group, defined by a repeat length and an annotation marker, in which at least one pair of repeats overlaps in time.
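The removal criterion can be illustrated without repytah itself. The following is a minimal sketch, not the library's implementation: it groups rows by (length, annotation) and flags a group when any two distinct repeats in it share a time step. The helper names intervals_overlap and groups_with_overlaps are hypothetical, and start and end indices are assumed to be inclusive.

```python
from collections import defaultdict
from itertools import combinations


def intervals_overlap(a, b):
    """Return True if inclusive intervals a=(start, end) and b=(start, end) share a time step."""
    return a[0] <= b[1] and b[0] <= a[1]


def groups_with_overlaps(rows):
    """Return the set of (length, annotation) groups that contain overlapping repeats.

    Each row is (start1, end1, start2, end2, length, annotation).
    """
    repeats = defaultdict(set)
    for s1, e1, s2, e2, length, annotation in rows:
        # Both repeats in a pair belong to the same (length, annotation) group
        repeats[(length, annotation)].update({(s1, e1), (s2, e2)})

    flagged = set()
    for key, intervals in repeats.items():
        # Compare every pair of distinct repeats in the group
        if any(intervals_overlap(a, b) for a, b in combinations(sorted(intervals), 2)):
            flagged.add(key)
    return flagged


# Hypothetical illustration using plain tuples rather than a NumPy array
rows = [(1, 4, 11, 14, 4, 1),
        (4, 7, 14, 17, 4, 1),
        (2, 3, 12, 13, 2, 1)]
print(groups_with_overlaps(rows))  # prints: {(4, 1)}
```

Under these assumptions, the repeats spanning time steps 1 to 4 and 4 to 7 share time step 4, so the whole group with length 4 and annotation 1 is flagged, while the group with length 2 is left alone.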

The inputs for the function are:

  • input_mat (np.ndarray): A list of pairs of repeats with annotations marked. The first two columns refer to the first repeat, the third and fourth columns refer to the second repeat, the fifth column denotes repeat length, and the last column contains the annotation markers.
  • song_length (int): The number of audio shingles in the song.
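To make the column layout concrete, the sketch below unpacks one row into named fields and checks that the stated length matches both repeats. The RepeatPair type and the parse_row helper are hypothetical names introduced for illustration and are not part of repytah. The check assumes inclusive start and end indices, which is consistent with the example rows in this vignette.

```python
from typing import NamedTuple


class RepeatPair(NamedTuple):
    """Hypothetical named view of one six-column row of input_mat."""
    start1: int
    end1: int
    start2: int
    end2: int
    length: int
    annotation: int


def parse_row(row):
    """Unpack a six-column row and check that both repeats span `length` time steps."""
    pair = RepeatPair(*(int(v) for v in row))
    for start, end in ((pair.start1, pair.end1), (pair.start2, pair.end2)):
        # Inclusive indices: a repeat from 1 to 4 covers 4 time steps
        if end - start + 1 != pair.length:
            raise ValueError(f"repeat {start}-{end} does not have length {pair.length}")
    return pair


pair = parse_row([1, 4, 11, 14, 4, 1])
print(pair.length, pair.annotation)  # prints: 4 1
```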

The outputs for the function are:

  • lst_no_overlaps (np.ndarray): A list of pairs of non-overlapping repeats with annotations marked. All the repeats of a given length and with a specific annotation marker do not overlap in time.
  • matrix_no_overlaps (np.ndarray): A matrix representation of lst_no_overlaps where each row corresponds to a group of repeats.
  • key_no_overlaps (np.ndarray): A vector containing the lengths of the repeats in each row of matrix_no_overlaps.
  • annotations_no_overlaps (np.ndarray): A vector containing the annotations of the repeats in each row of matrix_no_overlaps.
  • all_overlap_lst (np.ndarray): A list of the annotated pairs of repeats that were removed from input_mat. For each combination of repeat length and annotation marker in this list, at least one pair of repeats overlaps in time.

[2]:
input_mat = np.array([[1, 4, 11, 14, 4, 1],
                      [4, 7, 14, 17, 4, 1],
                      [2, 3, 12, 13, 2, 1]])
song_length = 20

print("The input array is: \n", input_mat)
print("The number of shingles is:", song_length)
The input array is:
 [[ 1  4 11 14  4  1]
 [ 4  7 14 17  4  1]
 [ 2  3 12 13  2  1]]
The number of shingles is: 20
[3]:
(lst_no_overlaps, matrix_no_overlaps, key_no_overlaps,
 annotations_no_overlaps, all_overlap_lst) = remove_overlaps(input_mat, song_length)

print("The array of the non-overlapping repeats is: \n", lst_no_overlaps)
print("The matrix representation of the non-overlapping repeats is: \n", matrix_no_overlaps)
print("The lengths of the repeats in matrix_no_overlaps are: \n", key_no_overlaps)
print("The annotations from matrix_no_overlaps are: \n", annotations_no_overlaps)
print("The array of overlapping repeats is: \n", all_overlap_lst)
The array of the non-overlapping repeats is:
 [[ 2  3 12 13  2  1]]
The matrix representation of the non-overlapping repeats is:
 [[0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]]
The lengths of the repeats in matrix_no_overlaps are:
 [2]
The annotations from matrix_no_overlaps are:
 [1]
The array of overlapping repeats is:
 [[ 1  4 11 14  4  1]
 [ 4  7 14 17  4  2]]
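
In the output above, the single row of matrix_no_overlaps has a 1 at the positions where the repeats in lst_no_overlaps begin (time steps 2 and 12, stored at zero-based indices 1 and 11). The sketch below reproduces that row from the list form. It is an illustration inferred from this one example, not repytah's implementation, and starts_to_row is a hypothetical helper.

```python
def starts_to_row(rows, song_length):
    """Build a binary row with a 1 at the zero-based start of every repeat in rows."""
    row = [0] * song_length
    for s1, _e1, s2, _e2, _length, _annotation in rows:
        # Time steps are one-based in the list form, so shift by one
        row[s1 - 1] = 1
        row[s2 - 1] = 1
    return row


# The non-overlapping pair from the example, written as a plain tuple
lst_no_overlaps = [(2, 3, 12, 13, 2, 1)]
print(starts_to_row(lst_no_overlaps, 20))
# prints: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```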