py_crispr_analyser package
Submodules
py_crispr_analyser.align module
- py_crispr_analyser.align.find_off_targets(guides, query_sequence, reverse_query_sequence, offset=numpy.uint64(0))
Find off-targets for a given query sequence using the CPU
- Parameters:
guides (numpy.ndarray) – The array of encoded gRNA sequences
query_sequence (numpy.uint64) – The query sequence
reverse_query_sequence (numpy.uint64) – The reverse complement of the query sequence
offset (numpy.uint64) – The offset of the guides. Default is 0.
- Returns:
A tuple containing a summary and a list of off-target ids
- Return type:
tuple[list[int], list[numpy.uint64]]
Deprecated since version 1.0.4:
find_off_targets_cpu() is far more performant and should be used instead.
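The comparison that find_off_targets performs can be illustrated with a pure-Python sketch. This page does not document the details, so the sketch rests on several assumptions. Each base is packed into 2 bits with A=0, C=1, G=2, T=3. A guide counts as an off-target when it has at most a hypothetical max_mismatches mismatches against either the query or its reverse complement. An off-target ID is the guide's array index plus offset. The helper names are hypothetical too. Unlike the real function, this sketch compares every encoded base, including any PAM bases.

```python
# Sketch only: the 2-bit encoding (A=0, C=1, G=2, T=3), the mismatch limit and the
# "index + offset" ID scheme are assumptions, not documented library behaviour.


def count_mismatches(a: int, b: int, length: int) -> int:
    """Count differing bases between two 2-bit packed sequences of `length` bases."""
    diff = a ^ b
    # A base differs if either of its two bits differs: fold each bit pair onto its low bit.
    low_bits = (4 ** length - 1) // 3  # 0b0101...01, one set bit per base
    per_base = (diff | (diff >> 1)) & low_bits
    return bin(per_base).count("1")


def find_off_targets_sketch(guides, query, reverse_query, length, offset=0, max_mismatches=4):
    """Return (summary, ids), where summary[k] counts guides with exactly k mismatches."""
    summary = [0] * (max_mismatches + 1)
    off_target_ids = []
    for index, guide in enumerate(guides):
        # Keep the better of the forward and reverse-complement comparisons.
        mismatches = min(
            count_mismatches(guide, query, length),
            count_mismatches(guide, reverse_query, length),
        )
        if mismatches <= max_mismatches:
            summary[mismatches] += 1
            off_target_ids.append(index + offset)
    return summary, off_target_ids
```

For example, under this encoding with length=4, ACGT is 0b00011011 and TTTT is 0b11111111. These differ at three of the four positions.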
- py_crispr_analyser.align.find_off_targets_cpu(guides, query_sequence, reverse_query_sequence, summary, off_target_ids_idx, off_target_ids, offset)
Find off-targets for a given query sequence using the CPU in parallel
- Parameters:
guides (numpy.ndarray) – The array of encoded gRNA sequences
query_sequence (numpy.uint64) – The query sequence
reverse_query_sequence (numpy.uint64) – The reverse complement of the query sequence
summary (numpy.ndarray) – The array to store the mismatch count summary
off_target_ids_idx (numpy.ndarray) – Single-element array holding the next write index
off_target_ids (numpy.ndarray) – The array to store the off-target ids
offset (numpy.uint64) – The offset of the guides. Default is 0.
- Returns:
None
- Return type:
None
- py_crispr_analyser.align.find_off_targets_kernel(guides, query_sequence, reverse_query_sequence, summary, off_target_ids_idx, off_target_ids, offset)
Find off-targets for a given query sequence using CUDA
- Parameters:
guides (numpy.ndarray) – The array of encoded gRNA sequences
query_sequence (numpy.uint64) – The query sequence
reverse_query_sequence (numpy.uint64) – The reverse complement of the query sequence
summary (numpy.ndarray) – The array to store the mismatch count summary
off_target_ids_idx (numpy.ndarray) – Single-element array holding the next write index into off_target_ids
off_target_ids (numpy.ndarray) – The array to store the off-target ids
offset (numpy.uint64) – The offset of the guides. Default is 0.
- Returns:
None
- Return type:
None
- py_crispr_analyser.align.print_off_targets(crispr_id, summary, off_target_ids, species_id)
Print the off-targets to the console
- Parameters:
crispr_id (numpy.uint64) – The id of the CRISPR
summary (numpy.ndarray) – The summary of the off-targets
off_target_ids (numpy.ndarray) – The off-target CRISPR ids
species_id (numpy.uint8) – The species id
- Returns:
None
- Return type:
None
- py_crispr_analyser.align.reverse_complement_binary(sequence, size)
Reverse complement a binary sequence
- Parameters:
sequence (numpy.uint64) – The binary sequence to reverse complement
size (int) – The size of the sequence
- Returns:
The reverse complemented binary sequence
- Return type:
numpy.uint64
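Here is a minimal sketch of reverse complementing a 2-bit packed sequence. It assumes an encoding that this page does not state: A=0, C=1, G=2, T=3, with the first base in the most significant bits and size counted in bases. Under that encoding the complement of a base code is the code XOR 0b11. The reversed sequence can therefore be rebuilt by popping bases off the low end. The function name is hypothetical.

```python
# Sketch only: assumes A=0, C=1, G=2, T=3, first base in the highest bits,
# and `size` measured in bases (2 bits each).


def reverse_complement_binary_sketch(sequence: int, size: int) -> int:
    """Reverse complement a 2-bit packed DNA sequence of `size` bases."""
    result = 0
    for _ in range(size):
        base = sequence & 0b11                   # last remaining base
        result = (result << 2) | (base ^ 0b11)   # append its complement
        sequence >>= 2
    return result
```

With this encoding, AAAC (0b00000001) becomes GTTT (0b10111111). Applying the function twice returns the original value.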
py_crispr_analyser.gather module
- py_crispr_analyser.gather.gather(inputfile, outputfile, pam, verbose=False, legacy_mode=False)
Run the CRISPR gatherer.
- Parameters:
inputfile (str) – The input FASTA file containing DNA sequences.
outputfile (str) – The output CSV file to write results to.
pam (str) – The PAM sequence to search for, e.g. “NGG”.
verbose (bool) – A boolean indicating if verbose output is enabled. Default is False.
legacy_mode (bool) – If True, a species ID column (always equal to 1) is added to the CSV file. Default is False.
- Returns:
None
- Return type:
None
- py_crispr_analyser.gather.match_pam(dna_sequence, pam_sequence, pam_on_right, legacy_mode=False)
Check if the DNA sequence has a PAM sequence match.
- Parameters:
dna_sequence (str) – The string DNA sequence to check.
pam_sequence (str) – The string PAM sequence to match.
pam_on_right (bool) – A boolean indicating if PAM sequence is on the right.
legacy_mode (bool) – If True, non-ACGT characters are allowed in the PAM region of the DNA sequence. Default is False.
- Returns:
True if the PAM sequence matches, False otherwise.
- Return type:
bool
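A PAM check of this kind can be sketched as follows, under assumptions this page does not confirm. Only the IUPAC code N is treated as a wildcard. The PAM is passed in the orientation it appears in dna_sequence, for example "CCN" when checking the left end. Non-ACGT characters in the PAM region never match, which appears to correspond to the documented legacy_mode=False behaviour. The function name is hypothetical and legacy_mode is not modelled.

```python
# Sketch only: N is the only wildcard handled, the PAM is assumed to be given in the
# orientation it appears in dna_sequence, and legacy_mode is not modelled.

VALID_BASES = set("ACGT")


def match_pam_sketch(dna_sequence: str, pam_sequence: str, pam_on_right: bool) -> bool:
    """Return True if the PAM-sized region at one end of dna_sequence matches pam_sequence."""
    width = len(pam_sequence)
    if width == 0 or len(dna_sequence) < width:
        return False
    # Take the PAM region from the right or the left end of the candidate sequence.
    region = dna_sequence[-width:] if pam_on_right else dna_sequence[:width]
    for base, pam_base in zip(region.upper(), pam_sequence.upper()):
        if base not in VALID_BASES:
            return False  # reject non-ACGT characters in the PAM region
        if pam_base != "N" and base != pam_base:
            return False
    return True
```

For example, a 23-base candidate ending in "AGG" matches "NGG" with pam_on_right=True. A candidate ending in "NGG" does not, because the DNA itself contains N.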
py_crispr_analyser.index module
- py_crispr_analyser.index.create_metadata(number_of_sequences, sequence_length, offset, species_id, species_name, assembly)
Create metadata for version 3 of the binary output file
- Parameters:
number_of_sequences (numpy.uint64) – Number of sequences in the file
sequence_length (numpy.uint64) – Length of each sequence (guide + PAM)
offset (numpy.uint64) – Offset of the first sequence
species_id (numpy.uint8) – ID of the species, e.g. 1
species_name (str) – Name of the species, e.g. ‘Human’
assembly (str) – Genome assembly used, e.g. ‘GRCh38’
- Returns:
A bytes object containing the metadata
- Return type:
bytes
- py_crispr_analyser.index.index(inputfiles, outputfile, species, assembly, offset, species_id, guide_length=20, pam_length=3, verbose=False)
Run the CRISPR indexer.
- Parameters:
inputfiles (list[str]) – The input CSV files, e.g. [‘input1.csv’, ‘input2.csv’].
outputfile (str) – The name of the output binary file to be generated.
species (str) – The species name, e.g. ‘Human’.
assembly (str) – The assembly name, e.g. ‘GRCh38’.
offset (int) – The offset after which to start numbering IDs.
species_id (int) – The species ID, e.g. 1.
guide_length (int) – The length of the guide sequence (CRISPR excluding PAM). Default is 20.
pam_length (int) – The length of the PAM sequence (CRISPR excluding guide). Default is 3.
verbose (bool) – A boolean indicating if verbose output is enabled. Default is False.
- Returns:
None
- Return type:
None
- py_crispr_analyser.index.parse_record(record, guide_length, pam_length)
Parse a line from the input CSV file
- Parameters:
record (str) – A line from the input CSV file
guide_length (int) – The length of the guide sequence (CRISPR excluding PAM)
pam_length (int) – The length of the PAM sequence (CRISPR excluding guide)
- Returns:
A tuple containing the guide sequence and PAM right flag
- Return type:
tuple[str, int]
py_crispr_analyser.search module
- py_crispr_analyser.search.run(argv=sys.argv[1:])
Run the search command from the command line.
- Parameters:
argv – The command line arguments
- Returns:
None
- Return type:
None
- py_crispr_analyser.search.search(guides, sequence)
Search for a sequence in an indexed binary file
- Parameters:
guides (numpy.ndarray) – The numpy uint64 array of guides
sequence (str) – The query sequence to search for
verbose – A boolean to print verbose output
- Returns:
A list of indices where the sequence is found
- Return type:
list[int]
py_crispr_analyser.utils module
- class py_crispr_analyser.utils.Metadata(number_of_sequences, sequence_length, offset, species_id, species_name, assembly)
Bases: object
A dataclass to hold metadata information from the guides file.
- Parameters:
number_of_sequences (numpy.uint64)
sequence_length (numpy.uint64)
offset (numpy.uint64)
species_id (numpy.uint8)
species_name (str)
assembly (str)
- assembly: str
- number_of_sequences: numpy.uint64
- offset: numpy.uint64
- sequence_length: numpy.uint64
- species_id: numpy.uint8
- species_name: str
- py_crispr_analyser.utils.check_file_header(bytes)
Check the header of the file from binary data.
- Parameters:
bytes (bytes) – The binary data to check
- Raises:
ValueError – If the file header version is not supported
ValueError – If the header length is not correct
- Returns:
None
- Return type:
None
- py_crispr_analyser.utils.get_file_metadata(bytes)
Parse the metadata from binary data.
- Parameters:
bytes (bytes) – The binary data to parse
- Raises:
ValueError – If the metadata length is not correct
- Returns:
A Metadata object containing the parsed metadata
- Return type:
Metadata
- py_crispr_analyser.utils.get_guides(guidesfile_handle, verbose=False)
Get an array of guides from the binary guides file.
- Parameters:
guidesfile_handle (BinaryIO) – The file handle of the guides file
verbose (bool) – A boolean to print verbose output
- Returns:
A numpy array of guides
- Return type:
numpy.ndarray
- py_crispr_analyser.utils.parse_c_string(data)
Parse a null-terminated C string from a byte array.
- Parameters:
data (bytes) – The byte array to parse
- Returns:
The parsed string
- Return type:
str
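Parsing a null-terminated C string amounts to cutting the bytes at the first NUL and decoding the rest. The sketch below assumes ASCII text, because this page does not state the encoding. The function name is hypothetical.

```python
# Sketch only: the ASCII decoding is an assumption.


def parse_c_string_sketch(data: bytes) -> str:
    """Decode bytes up to (but not including) the first NUL byte."""
    end = data.find(b"\x00")
    if end != -1:
        data = data[:end]  # drop the terminator and any padding after it
    return data.decode("ascii")
```

This is the kind of helper needed when fixed-width fields such as a species name or assembly are padded with NUL bytes.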
- py_crispr_analyser.utils.print_metadata(metadata)
Print the metadata information to STDERR.
- Parameters:
metadata (Metadata) – The Metadata object to print
- Returns:
None
- Return type:
None
- py_crispr_analyser.utils.reverse_complement(sequence)
Return the reverse complement of a DNA sequence.
- Parameters:
sequence (str) – The string DNA sequence to reverse complement.
- Returns:
The reverse complemented string DNA sequence
- Return type:
str
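A string reverse complement can be sketched with str.translate followed by a reversal. How the library treats lowercase bases and N is not documented here. Mapping them as below is an assumption, and the function name is hypothetical.

```python
# Sketch only: the handling of lowercase (soft-masked) bases and N is an assumption.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_sketch(sequence: str) -> str:
    """Complement each base, then reverse the string."""
    return sequence.translate(_COMPLEMENT)[::-1]
```

For example, the reverse complement of the PAM "NGG" is "CCN", which is how that PAM reads on the opposite strand.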
- py_crispr_analyser.utils.sequence_to_binary_encoding(sequence, pam_right)
Convert a DNA sequence string to a 64-bit binary encoding, accounting for whether the PAM is on the right or left.
- Parameters:
sequence (str) – The string DNA sequence to convert
pam_right (int) – An integer indicating if PAM is on the right (1) or left (0)
- Returns:
A 64-bit unsigned integer representing the sequence
- Return type:
numpy.uint64
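Packing a DNA string into a 64-bit integer can be sketched as 2 bits per base. The base codes (A=0, C=1, G=2, T=3) and the most-significant-first order are assumptions. The sketch also does not reproduce how sequence_to_binary_encoding uses pam_right. The function name is hypothetical.

```python
# Sketch only: base codes and packing order are assumptions, and the pam_right
# handling of sequence_to_binary_encoding is not reproduced.

BASE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_2bit_sketch(sequence: str) -> int:
    """Pack a DNA string into an int, 2 bits per base, first base in the highest bits."""
    if len(sequence) > 32:
        raise ValueError("at most 32 bases fit in 64 bits")
    value = 0
    for base in sequence.upper():
        if base not in BASE_CODES:
            raise ValueError(f"cannot encode base {base!r}")
        value = (value << 2) | BASE_CODES[base]
    return value
```

At 2 bits per base, a default 23-base CRISPR (20-base guide plus 3-base PAM) needs 46 bits, so it fits in a numpy.uint64.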