attc

integron_finder.attc.find_attc_max(integrons, replicon, distance_threshold, model_attc_path, max_attc_size, min_attc_size, evalue_attc=1.0, circular=True, out_dir='.', cmsearch_bin='cmsearch', cpu=1)

Look for attC sites using the cmsearch --max option, which removes all heuristic filters. Because this option makes the search much slower, it is only run in the region around a hit. This mode is called local_max or eagle_eyes.

Default hit

                 attC
__________________-->____-->_________-->_____________
______<--------______________________________________
         intI
              ^-------------------------------------^
             Search-space with --local_max

Updated hit

                 attC          ***         ***
__________________-->____-->___-->___-->___-->_______
______<--------______________________________________
         intI
Parameters
  • integrons (list of Integron objects) – the integrons, which may or may not contain attC sites or intI.

  • replicon (Bio.Seq.SeqRecord object) – the replicon in which the integrons were found (genomic fasta file).

  • distance_threshold (int) – the maximal distance between two elements for them to be aggregated.

  • evalue_attc (float) – e-value threshold; hits with an e-value above it are filtered out.

  • model_attc_path (str) – path to the attC model (covariance model).

  • max_attc_size (int) – maximum value for the attC size.

  • min_attc_size (int) – minimum value for the attC size.

  • circular (bool) – True if the replicon is circular, False otherwise.

  • out_dir (str) – the directory where results are written; used indirectly by some of the called functions, such as infernal.local_max() or infernal.expand.

  • cmsearch_bin (str) – the path to the cmsearch binary to use.

  • cpu (int) – the number of CPUs to use when calling local_max.

Returns

a table of attC sites

Return type

pandas.DataFrame object with a monotonic index
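
The roles of evalue_attc, min_attc_size and max_attc_size can be illustrated with a minimal sketch. This is not the code of find_attc_max: the helper name filter_attc_hits and the column names pos_beg, pos_end and evalue are assumptions made for the example, and the cmsearch call itself is left out.

```python
import pandas as pd


def filter_attc_hits(hits, evalue_attc, min_attc_size, max_attc_size):
    """Keep hits that pass the e-value and size thresholds (illustrative only).

    `hits` is assumed to have the hypothetical columns
    'pos_beg', 'pos_end' and 'evalue'.
    """
    # Size of each hit in base pairs, whatever the orientation of the coordinates.
    size = (hits["pos_end"] - hits["pos_beg"]).abs() + 1
    keep = (hits["evalue"] <= evalue_attc) & size.between(min_attc_size, max_attc_size)
    # Sort by start position and renumber the rows so the index is monotonic.
    return hits[keep].sort_values("pos_beg").reset_index(drop=True)


hits = pd.DataFrame({
    "pos_beg": [500, 100, 300],
    "pos_end": [560, 1100, 360],
    "evalue": [0.01, 0.001, 5.0],
})
filtered = filter_attc_hits(hits, evalue_attc=1.0, min_attc_size=40, max_attc_size=200)
```

In this example only the hit starting at position 500 is kept: the second hit is too long and the third has an e-value above the threshold.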

integron_finder.attc.search_attc(attc_df, keep_palindromes, dist_threshold, replicon_size, rep_topology)

Parse the attC data set (sorted by start position) for the given replicon and return a list of arrays. Each array is composed of attC sites that are on the same strand and separated by a distance less than dist_threshold.

Parameters
  • attc_df (pandas.DataFrame) – the attC sites found on the replicon, sorted by start position.

  • keep_palindromes (bool) – True if palindromes must be kept in the attC results, False otherwise.

  • dist_threshold (int) – the maximal distance between two elements for them to be aggregated.

  • replicon_size (int) – the size of the replicon, in base pairs.

  • rep_topology (str) – the replicon topology, either ‘lin’ or ‘circ’.

Returns

a list of attC sites found on the replicon

Return type

list of pandas.DataFrame objects
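
The grouping described above (same strand, gaps below dist_threshold) can be sketched as follows. This is an illustration under stated assumptions, not the implementation of search_attc: the function name cluster_attc and the column names pos_beg, pos_end and strand are hypothetical, the distance is taken as the gap between the end of one site and the start of the next, and the keep_palindromes handling is not covered.

```python
import pandas as pd


def cluster_attc(attc_df, dist_threshold, replicon_size, rep_topology):
    """Group attC sites by strand and proximity (illustrative only).

    `attc_df` is assumed to have the hypothetical columns
    'pos_beg', 'pos_end' and 'strand'.
    """
    clusters = []
    for _, strand_df in attc_df.groupby("strand", sort=False):
        strand_df = strand_df.sort_values("pos_beg")
        # Gap between each site and the previous one on the same strand.
        gap = strand_df["pos_beg"] - strand_df["pos_end"].shift()
        # A new cluster starts whenever the gap reaches the threshold
        # (the first gap is NaN, which compares as False).
        cluster_id = (gap >= dist_threshold).cumsum()
        groups = [g for _, g in strand_df.groupby(cluster_id)]
        # On a circular replicon the last cluster may continue into the first.
        if rep_topology == "circ" and len(groups) > 1:
            first_beg = groups[0]["pos_beg"].iloc[0]
            last_end = groups[-1]["pos_end"].iloc[-1]
            if first_beg + replicon_size - last_end < dist_threshold:
                last = groups.pop()
                groups[0] = pd.concat([last, groups[0]])
        clusters.extend(groups)
    return clusters


attc = pd.DataFrame({
    "pos_beg": [100, 300, 5000, 200],
    "pos_end": [160, 360, 5060, 260],
    "strand": [1, 1, 1, -1],
})
linear = cluster_attc(attc, dist_threshold=1000, replicon_size=5500, rep_topology="lin")
circular = cluster_attc(attc, dist_threshold=1000, replicon_size=5500, rep_topology="circ")
```

With the linear topology the site at position 5000 forms its own cluster; with the circular topology it is merged with the sites at 100 and 300, because the gap across the origin (540 bp) is below the threshold.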