The Transform Module
The transform module of the repytah package holds functions that transform matrix inputs into different forms for use in larger functions from other modules. The functions in the transform module focus mainly on overlapping repeated structures and annotation markers.
The transform module includes the following functions:
- remove_overlaps: Removes every group of repeat pairs sharing the same length and annotation marker in which at least one pair of repeats overlaps in time
The functions in the repytah package are meant to be used alongside other functions in the package, so many examples use functions from multiple modules. In the examples below, the following functions from the `utilities <https://github.com/smith-tinkerlab/repytah/blob/main/docs/utilities_vignette.ipynb>`__ module are called:
- add_annotations
- reconstruct_full_block
For more in-depth information on the function calls, an example function pipeline is shown below. Functions from the current module are shown in green.
Importing necessary modules
[1]:
# NumPy is used for mathematical calculations
import numpy as np
# Import transform
from repytah.transform import *
remove_overlaps
remove_overlaps removes every group of repeat pairs, identified by repeat length and annotation marker, in which at least one pair of repeats overlaps in time.
The inputs for the function are:
- input_mat (np.ndarray): A list of pairs of repeats with annotations marked. The first two columns refer to the first repeat, the next two columns refer to the second repeat, the fifth column denotes repeat length, and the last column contains the annotation markers.
- song_length (int): The number of audio shingles in the song
The outputs for the function are:
- lst_no_overlaps (np.ndarray): A list of pairs of non-overlapping repeats with annotations marked. All the repeats of a given length and with a specific annotation marker do not overlap in time.
- matrix_no_overlaps (np.ndarray): A matrix representation of lst_no_overlaps where each row corresponds to a group of repeats
- key_no_overlaps (np.ndarray): A vector containing the lengths of the repeats in each row of matrix_no_overlaps
- annotations_no_overlaps (np.ndarray): A vector containing the annotations of the repeats in each row of matrix_no_overlaps
- all_overlap_lst (np.ndarray): The pairs of repeats, with annotations marked, that were removed from input_mat. For each combination of repeat length and annotation marker in this list, at least one pair of repeats overlaps in time.
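To make the phrase "overlap in time" concrete, the sketch below checks whether any time step is covered by more than one distinct repeat within a group of pairs. It is an illustration only and not the repytah implementation: the helper name repeats_overlap_in_time is hypothetical, and it assumes that start and end indices are 1-based and inclusive, as the example rows in this section suggest.

```python
import numpy as np

def repeats_overlap_in_time(pairs, song_length):
    """Hypothetical helper (not part of repytah).

    Returns True if any time step is covered by more than one distinct
    repeat in `pairs`. Assumes columns 0-1 and 2-3 hold 1-based,
    inclusive start/end indices of the two repeats in each pair.
    """
    # Collect every distinct (start, end) interval so that a repeat
    # appearing in several pairs is only counted once
    intervals = np.unique(np.vstack((pairs[:, 0:2], pairs[:, 2:4])), axis=0)

    # Count how many distinct repeats cover each time step
    coverage = np.zeros(song_length, dtype=int)
    for start, end in intervals:
        coverage[start - 1:end] += 1

    return bool(np.any(coverage > 1))

# Two pairs of length 4 with annotation 1: the repeats 1-4 and 4-7 share step 4
group_a = np.array([[1, 4, 11, 14, 4, 1],
                    [4, 7, 14, 17, 4, 1]])
# One pair of length 2 with annotation 1: nothing to collide with
group_b = np.array([[2, 3, 12, 13, 2, 1]])

print(repeats_overlap_in_time(group_a, 20))
print(repeats_overlap_in_time(group_b, 20))
```

Under these assumptions the first group is flagged as overlapping and the second is not.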
[2]:
input_mat = np.array([[1, 4, 11, 14, 4, 1],
                      [4, 7, 14, 17, 4, 1],
                      [2, 3, 12, 13, 2, 1]])
song_length = 20
print("The input array is: \n", input_mat)
print("The number of shingles is:", song_length)
The input array is:
[[ 1 4 11 14 4 1]
[ 4 7 14 17 4 1]
[ 2 3 12 13 2 1]]
The number of shingles is: 20
[3]:
lst_no_overlaps, matrix_no_overlaps, key_no_overlaps, annotations_no_overlaps, all_overlap_lst = remove_overlaps(input_mat, song_length)
print("The array of the non-overlapping repeats is: \n", lst_no_overlaps)
print("The matrix representation of the non-overlapping repeats is: \n", matrix_no_overlaps)
print("The lengths of the repeats in matrix_no_overlaps are: \n", key_no_overlaps)
print("The annotations from matrix_no_overlaps are: \n", annotations_no_overlaps)
print("The array of overlapping repeats is: \n", all_overlap_lst)
The array of the non-overlapping repeats is:
[[ 2 3 12 13 2 1]]
The matrix representation of the non-overlapping repeats is:
[[0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]]
The lengths of the repeats in matrix_no_overlaps are:
[2]
The annotations from matrix_no_overlaps are:
[1]
The array of overlapping repeats is:
[[ 1 4 11 14 4 1]
[ 4 7 14 17 4 2]]
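The single row of matrix_no_overlaps printed above has ones at the positions where the repeats in lst_no_overlaps begin (time steps 2 and 12). The sketch below rebuilds that row by hand as a way to read the output. It is an assumption inferred from this one example, not the library's code, and the helper name mark_repeat_starts is hypothetical.

```python
import numpy as np

def mark_repeat_starts(pairs, song_length):
    """Hypothetical helper (not part of repytah).

    Builds a binary row of length `song_length` with a 1 at the start of
    every repeat in `pairs`. Assumes columns 0 and 2 hold 1-based starts.
    """
    row = np.zeros(song_length, dtype=int)
    starts = np.concatenate((pairs[:, 0], pairs[:, 2]))
    row[starts - 1] = 1  # shift from 1-based time steps to 0-based indices
    return row

lst_no_overlaps_example = np.array([[2, 3, 12, 13, 2, 1]])
print(mark_repeat_starts(lst_no_overlaps_example, 20))
# prints [0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
```

The printed row matches the matrix representation shown in the output above, with the repeat length (2) and annotation (1) reported separately in key_no_overlaps and annotations_no_overlaps.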