py_crispr_analyser package

Submodules

py_crispr_analyser.align module

py_crispr_analyser.align.find_off_targets(guides, query_sequence, reverse_query_sequence, offset=0)

Find off-targets for a given query sequence using the CPU

Parameters:
  • guides (numpy.ndarray) – The array of encoded gRNA sequences

  • query_sequence (numpy.uint64) – The query sequence

  • reverse_query_sequence (numpy.uint64) – The reverse complement of the query sequence

  • offset (numpy.uint64) – The offset of the guides. Default is 0.

Returns:

A tuple containing a summary and a list of off-target ids

Return type:

tuple[list[int], list[numpy.uint64]]
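
The comparison behind an off-target search can be illustrated with a minimal sketch. This is not the package's implementation: it assumes a 2-bit-per-base encoding (A=0, C=1, G=2, T=3), a mismatch limit of 4, and only the forward strand. The helpers `encode`, `count_mismatches` and `summarise` are hypothetical names. Mismatches are counted by XOR-ing two encoded sequences and counting the 2-bit slots that differ.

```python
# Hypothetical 2-bit encoding; the package's real bit layout may differ.
BASE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
EVEN_BITS = 0x5555555555555555  # selects the low bit of every 2-bit slot


def encode(sequence: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in sequence:
        value = (value << 2) | BASE_CODES[base]
    return value


def count_mismatches(a: int, b: int) -> int:
    """Count the base positions at which two encoded sequences differ."""
    diff = a ^ b
    # Fold each slot's high bit onto its low bit, so any difference sets the low bit.
    folded = (diff | (diff >> 1)) & EVEN_BITS
    return bin(folded).count("1")


def summarise(guides, query, max_mismatches=4):
    """Return (summary, ids), where summary[n] counts guides with n mismatches."""
    summary = [0] * (max_mismatches + 1)
    ids = []
    for index, guide in enumerate(guides):
        mismatches = count_mismatches(guide, query)
        if mismatches <= max_mismatches:
            summary[mismatches] += 1
            ids.append(index)
    return summary, ids
```

The returned pair mirrors the shape described above (a per-mismatch-count summary plus a list of matching ids), although the real function also takes the reverse complement of the query and an id offset.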

py_crispr_analyser.align.find_off_targets_kernel(guides, query_sequence, reverse_query_sequence, summary, off_target_ids_idx, off_target_ids, offset)

Find off-targets for a given query sequence using CUDA

Parameters:
  • guides (numpy.ndarray) – The array of encoded gRNA sequences

  • query_sequence (numpy.uint64) – The query sequence

  • reverse_query_sequence (numpy.uint64) – The reverse complement of the query sequence

  • summary (numpy.ndarray) – The array to store the results

  • off_target_ids_idx (numpy.ndarray) – The index into the off_target_ids array

  • off_target_ids (numpy.ndarray) – The array to store the off-target ids

  • offset (numpy.uint64) – The offset of the guides

Returns:

None

Return type:

None

py_crispr_analyser.align.print_off_targets(crispr_id, summary, off_target_ids, species_id)

Print the off-targets to the console

Parameters:
  • crispr_id (numpy.uint64) – The id of the CRISPR

  • summary (numpy.ndarray) – The summary of the off-targets

  • off_target_ids (numpy.ndarray) – The off-target CRISPR ids

  • species_id (numpy.uint8) – The species id

Returns:

None

Return type:

None

py_crispr_analyser.align.reverse_complement_binary(sequence, size)

Reverse complement a binary sequence

Parameters:
  • sequence (numpy.uint64) – The binary sequence to reverse complement

  • size (int) – The size of the sequence

Returns:

The reverse complemented binary sequence

Return type:

numpy.uint64
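
As a minimal sketch of the idea, assuming a 2-bit encoding with A=0, C=1, G=2, T=3 (so that the complement of a code is `3 - code`), a binary reverse complement can be built by peeling bases off the low end and appending their complements. The function name `reverse_complement_2bit` is hypothetical and the package's own bit layout may differ.

```python
def reverse_complement_2bit(sequence: int, size: int) -> int:
    """Reverse complement a 2-bit-encoded sequence of `size` bases.

    Assumes A=0, C=1, G=2, T=3, so the complement of a code is 3 - code.
    """
    result = 0
    for _ in range(size):
        base = sequence & 0b11               # take the last remaining base
        result = (result << 2) | (3 - base)  # append its complement
        sequence >>= 2                       # drop the base just consumed
    return result
```

With this encoding, "AACG" is 6 and its reverse complement "CGTT" is 111.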

py_crispr_analyser.align.run(argv=sys.argv[1:])

Run the align command to find off-targets for CRISPRs

Return type:

None

py_crispr_analyser.gather module

py_crispr_analyser.gather.gather(inputfile, outputfile, pam, verbose=False, legacy_mode=False)

Run the CRISPR gatherer.

Parameters:
  • inputfile (str) – The input FASTA file containing DNA sequences.

  • outputfile (str) – The output CSV file to write results to.

  • pam (str) – The PAM sequence to search for, e.g. “NGG”.

  • verbose (bool) – A boolean indicating whether verbose output is enabled. Default is False.

  • legacy_mode (bool) – A boolean indicating whether a species ID column (always equal to 1) is added to the CSV file. Default is False.

Returns:

None

Return type:

None

py_crispr_analyser.gather.match_pam(dna_sequence, pam_sequence, pam_on_right, legacy_mode=False)

Check if the DNA sequence has a PAM sequence match.

Parameters:
  • dna_sequence (str) – The string DNA sequence to check.

  • pam_sequence (str) – The string PAM sequence to match.

  • pam_on_right (bool) – A boolean indicating if PAM sequence is on the right.

  • legacy_mode (bool) – A boolean indicating whether non-ACGT characters are allowed in the PAM region of the DNA sequence. Default is False.

Returns:

True if the PAM sequence matches, False otherwise.

Return type:

bool
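
To show what a PAM match involves, here is a minimal sketch under stated assumptions. It is not the package's code: `pam_matches` is a hypothetical helper, it treats the PAM as IUPAC nucleotide codes (so “N” accepts any of A, C, G, T), it assumes the caller passes a PAM already oriented for the end being checked, and it ignores legacy_mode.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(dna_sequence: str, pam_sequence: str, pam_on_right: bool) -> bool:
    """Return True if the PAM end of dna_sequence satisfies pam_sequence."""
    pam_length = len(pam_sequence)
    if pam_length == 0 or len(dna_sequence) < pam_length:
        return False
    # Pick the PAM region from the right or left end of the sequence.
    if pam_on_right:
        region = dna_sequence[-pam_length:]
    else:
        region = dna_sequence[:pam_length]
    return all(
        base in IUPAC[code]
        for base, code in zip(region.upper(), pam_sequence.upper())
    )
```

For example, a 23-base sequence ending in "AGG" matches the PAM "NGG" when the PAM is on the right, while one ending in "AGC" does not.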

py_crispr_analyser.gather.run(argv=sys.argv[1:])

Run the CRISPR gatherer from the command line.

py_crispr_analyser.index module

py_crispr_analyser.index.create_metadata(number_of_sequences, sequence_length, offset, species_id, species_name, assembly)

Create metadata for version 3 of the binary output file

Parameters:
  • number_of_sequences (numpy.uint64) – Number of sequences in the file

  • sequence_length (numpy.uint64) – Length of each sequence (guide + PAM)

  • offset (numpy.uint64) – Offset of the first sequence

  • species_id (numpy.uint8) – ID of the species e.g. 1

  • species_name (str) – Name of the species e.g. ‘Human’

  • assembly (str) – Genome assembly used e.g. ‘GRCh38’

Returns:

A bytes object containing the metadata

Return type:

bytes
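
How a fixed-size binary metadata block can be packed and read back is sketched below. The byte layout is entirely hypothetical (little-endian, three uint64 fields, one uint8, two 30-byte NUL-padded strings); the real version 3 layout is not specified here, and `pack_metadata` and `unpack_metadata` are illustrative names only.

```python
import struct

# Hypothetical layout, NOT the real file format:
# three uint64, one uint8, two 30-byte NUL-padded ASCII strings.
METADATA_FORMAT = "<QQQB30s30s"


def pack_metadata(number_of_sequences, sequence_length, offset,
                  species_id, species_name, assembly) -> bytes:
    """Pack the metadata fields into a fixed-size bytes object."""
    return struct.pack(
        METADATA_FORMAT,
        number_of_sequences,
        sequence_length,
        offset,
        species_id,
        species_name.encode("ascii"),  # struct pads short strings with NUL bytes
        assembly.encode("ascii"),
    )


def unpack_metadata(data: bytes) -> tuple:
    """Unpack the fields, raising ValueError if the length is wrong."""
    if len(data) != struct.calcsize(METADATA_FORMAT):
        raise ValueError("metadata length is not correct")
    count, length, offset, species_id, name, assembly = struct.unpack(
        METADATA_FORMAT, data
    )
    return (
        count, length, offset, species_id,
        name.rstrip(b"\x00").decode("ascii"),
        assembly.rstrip(b"\x00").decode("ascii"),
    )
```

Fixing the size of each field lets a reader validate the block length before parsing, which is the kind of check that get_file_metadata reports with ValueError.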

py_crispr_analyser.index.index(inputfiles, outputfile, species, assembly, offset, species_id, guide_length=20, pam_length=3, verbose=False)

Run the CRISPR indexer.

Parameters:
  • inputfiles (list[str]) – The input CSV files e.g. [‘input1.csv’, ‘input2.csv’]

  • outputfile (str) – The name of the output binary file to be generated.

  • species (str) – The species name e.g. ‘Human’.

  • assembly (str) – The assembly name e.g. ‘GRCh38’.

  • offset (int) – The integer offset after which ID numbering starts.

  • species_id (int) – The integer of the species ID e.g. 1.

  • guide_length (int) – The length of the guide sequence (CRISPR excluding PAM). Default is 20.

  • pam_length (int) – The length of the PAM sequence (CRISPR excluding guide). Default is 3.

  • verbose (bool) – A boolean indicating if verbose output is enabled. Default is False.

Returns:

None

Return type:

None

py_crispr_analyser.index.parse_record(record, guide_length, pam_length)

Parse a line from the input CSV file

Parameters:
  • record (str) – A line from the input CSV file

  • guide_length (int) – The length of the guide sequence (CRISPR excluding PAM)

  • pam_length (int) – The length of the PAM sequence (CRISPR excluding guide)

Returns:

A tuple containing the guide sequence and PAM right flag

Return type:

tuple[str, int]

py_crispr_analyser.index.run(argv=sys.argv[1:])

Run the CRISPR indexer from the command line.

Parameters:

argv – The command line arguments.

Returns:

None

Return type:

None

py_crispr_analyser.search module

py_crispr_analyser.search.run(argv=sys.argv[1:])

Run the search command from the command line.

Parameters:

argv – The command line arguments

Returns:

None

Return type:

None

py_crispr_analyser.search.search(guides, sequence, verbose=False)

Search for a sequence in an indexed binary file

Parameters:
  • guides (numpy.ndarray) – The numpy uint64 array of guides

  • sequence (str) – The query sequence to search for

  • verbose (bool) – A boolean to print verbose output

Returns:

A list of indices where the sequence is found

Return type:

list[int]

py_crispr_analyser.utils module

class py_crispr_analyser.utils.Metadata(number_of_sequences, sequence_length, offset, species_id, species_name, assembly)

Bases: object

A dataclass to hold metadata information from the guides file.

Parameters:
  • number_of_sequences (numpy.uint64)

  • sequence_length (numpy.uint64)

  • offset (numpy.uint64)

  • species_id (numpy.uint8)

  • species_name (str)

  • assembly (str)

assembly: str
number_of_sequences: numpy.uint64
offset: numpy.uint64
sequence_length: numpy.uint64
species_id: numpy.uint8
species_name: str

py_crispr_analyser.utils.check_file_header(bytes)

Check the header of the file from binary data.

Parameters:

bytes (bytes) – The binary data to check

Raises:
  • ValueError – If the header file version is not supported

  • ValueError – If the header length is not correct

Returns:

None

Return type:

None

py_crispr_analyser.utils.get_file_metadata(bytes)

Parse the metadata from binary data.

Parameters:

bytes (bytes) – The binary data to parse

Raises:

ValueError – If the metadata length is not correct

Returns:

A Metadata object containing the parsed metadata

Return type:

Metadata

py_crispr_analyser.utils.get_guides(guidesfile_handle, verbose=False)

Get array of guides from the binary guides file.

Parameters:
  • guidesfile_handle (BinaryIO) – The file handle of the guides file

  • verbose (bool) – A boolean to print verbose output

Returns:

A numpy array of guides

Return type:

numpy.ndarray

py_crispr_analyser.utils.parse_c_string(data)

Parse a null-terminated C string from a byte array.

Parameters:

data (bytes) – The byte array to parse

Returns:

The parsed string

Return type:

str
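
Parsing a null-terminated string amounts to decoding everything before the first NUL byte. The sketch below shows one way to do that; the name `parse_null_terminated`, the UTF-8 default, and the handling of a missing terminator are assumptions rather than the package's documented behaviour.

```python
def parse_null_terminated(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes up to, but not including, the first NUL byte."""
    end = data.find(b"\x00")
    if end == -1:
        # No terminator found: treat the whole buffer as the string.
        end = len(data)
    return data[:end].decode(encoding)
```

This is useful for fixed-width fields such as a species name stored as `b"Human\x00\x00\x00"`, where the padding after the first NUL should be discarded.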

py_crispr_analyser.utils.print_metadata(metadata)

Print the metadata information to STDERR.

Parameters:

metadata (Metadata) – The Metadata object to print

Returns:

None

Return type:

None

py_crispr_analyser.utils.reverse_complement(sequence)

Return the reverse complement of a DNA sequence.

Parameters:

sequence (str) – The string DNA sequence to reverse complement.

Returns:

The reverse complemented string DNA sequence

Return type:

str
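
For reference, a string reverse complement can be written in two steps: swap each base for its complement, then reverse the result. The sketch below is a standalone illustration with a hypothetical name, `reverse_complement_str`; it leaves characters other than A, C, G, T (such as N) unchanged, which may or may not match the package's handling.

```python
# Translation table swapping each base with its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement_str(sequence: str) -> str:
    """Complement every base, then reverse the string."""
    return sequence.translate(_COMPLEMENT)[::-1]
```

For example, "AACG" becomes "CGTT" and the PAM "NGG" becomes "CCN".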

py_crispr_analyser.utils.sequence_to_binary_encoding(sequence, pam_right)

Convert a string DNA sequence to a binary encoding, accounting for whether the PAM is on the right or left.

Parameters:
  • sequence (str) – The string DNA sequence to convert

  • pam_right (int) – An integer indicating if PAM is on the right (1) or left (0)

Returns:

A 64-bit unsigned integer representing the sequence

Return type:

numpy.uint64
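
One way such an encoding could work is sketched below, purely as an illustration. The bit layout is an assumption: two bits per base (A=0, C=1, G=2, T=3) with the PAM-right flag stored in the bit just above the packed bases. `encode_with_pam_flag` is a hypothetical name and the package's actual layout may differ.

```python
# Hypothetical 2-bit codes; the package's real mapping may differ.
BASE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
UINT64_MASK = 0xFFFFFFFFFFFFFFFF


def encode_with_pam_flag(sequence: str, pam_right: int) -> int:
    """Pack a DNA string two bits per base, with a PAM-side flag on top."""
    bits = pam_right & 1  # the flag ends up just above the packed bases
    for base in sequence.upper():
        bits = (bits << 2) | BASE_CODES[base]
    return bits & UINT64_MASK  # keep the value within 64 bits
```

Under this layout a 20-base guide uses 40 bits plus one flag bit, so the result fits comfortably in a 64-bit unsigned integer.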
