HMM¶
The HMM method assesses essentiality across the entire genome, site by site, rather than only at the gene level as the other methods do. In addition to the more common categories of essential and non-essential, it can identify regions that have unusually high or unusually low read counts (i.e. growth-advantage or growth-defect regions).
Note
Intended only for Himar1 datasets.
How does it work?¶
The HMM method automatically estimates the internal statistical parameters it needs from the datasets, adjusting for global saturation and read counts.
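As a rough illustration of the two dataset-level quantities mentioned above, the following sketch computes saturation (the fraction of TA sites with at least one insertion) and the mean read count over non-zero sites. The function name `summarize_counts` and the example counts are hypothetical, and the sketch does not show how TRANSIT itself turns these quantities into HMM parameters.

```python
def summarize_counts(counts):
    """Return (saturation, mean_nonzero) for per-TA-site read counts.

    saturation   : fraction of TA sites with a non-zero read count
    mean_nonzero : mean read count over the non-zero sites only
    """
    total_sites = len(counts)
    nonzero = [c for c in counts if c > 0]
    saturation = len(nonzero) / total_sites if total_sites else 0.0
    mean_nonzero = sum(nonzero) / len(nonzero) if nonzero else 0.0
    return saturation, mean_nonzero


# Made-up read counts for eight TA sites (illustration only).
example_counts = [0, 0, 12, 0, 85, 40, 0, 3]
saturation, mean_nonzero = summarize_counts(example_counts)
print(f"saturation={saturation:.2f}, mean non-zero count={mean_nonzero:.1f}")
```

A sparse, low-saturation dataset and a dense one can show very different raw counts for the same biology, which is why these quantities need to be accounted for before states are assigned.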
Usage¶
> python3 transit.py hmm <combined_wig_file> <metadata_file> <annotation_file> <condition_to_analyze> <output_file> [Optional Arguments]
Optional Arguments:
--r <string> := How to handle replicates. Sum, Mean. Default: --r Mean
--n <string> := Normalization method. Default: --n TTR
-l := Perform LOESS Correction; Helps remove possible genomic position bias. Default: Off.
--iN <float> := Ignore TAs occurring at given percentage (as integer) of the N terminus. Default: --iN 0
--iC <float> := Ignore TAs occurring at given percentage (as integer) of the C terminus. Default: --iC 0
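The sketch below assembles an HMM command line from the positional arguments and optional flags listed above, which may be convenient when scripting several runs. The helper `build_hmm_command` and all file and condition names are hypothetical placeholders; only the flags documented above are used, and the command is printed rather than executed.

```python
import shlex


def build_hmm_command(combined_wig, metadata, annotation, condition, output,
                      replicates="Mean", normalization="TTR", loess=False,
                      ignore_n=0, ignore_c=0):
    """Build the argument list for `transit.py hmm` using the documented flags."""
    cmd = [
        "python3", "transit.py", "hmm",
        combined_wig, metadata, annotation, condition, output,
        "--r", replicates,         # how to handle replicates: Sum or Mean
        "--n", normalization,      # normalization method
        "--iN", str(ignore_n),     # percentage of the N terminus to ignore
        "--iC", str(ignore_c),     # percentage of the C terminus to ignore
    ]
    if loess:
        cmd.append("-l")           # LOESS correction is off unless requested
    return cmd


# Hypothetical input and output names, for illustration only.
cmd = build_hmm_command(
    "combined.wig", "metadata.txt", "annotation.prot_table",
    "control", "hmm_output.txt",
    replicates="Sum", ignore_n=5, ignore_c=5,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the TRANSIT source directory once the placeholder names are replaced with real files.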
Parameters¶
You can change how the method handles replicate datasets:
Replicates: Determines how the HMM handles replicate datasets, either by averaging or by summing read counts across datasets. For regular datasets (i.e. mean read count > 100), the recommended setting is to average the read counts. For sparse datasets, summing read counts may produce more accurate results.
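The difference between the two replicate settings can be sketched as follows. This is a minimal illustration that assumes each replicate is a list of read counts over the same TA sites in the same order; `combine_replicates` is a hypothetical helper, not TRANSIT's internal function, and the counts are made up.

```python
def combine_replicates(replicates, how="Mean"):
    """Combine per-site read counts across replicates by 'Mean' or 'Sum'."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    n_sites = len(replicates[0])
    if any(len(rep) != n_sites for rep in replicates):
        raise ValueError("all replicates must cover the same TA sites")

    # Sum the counts observed at each site across all replicates.
    totals = [sum(site_counts) for site_counts in zip(*replicates)]
    if how == "Sum":
        return totals
    if how == "Mean":
        return [total / len(replicates) for total in totals]
    raise ValueError(f"unknown replicate handling: {how!r}")


# Two made-up replicates over the same four TA sites.
rep_a = [0, 10, 200, 0]
rep_b = [0, 30, 100, 4]
mean_counts = combine_replicates([rep_a, rep_b], how="Mean")
sum_counts = combine_replicates([rep_a, rep_b], how="Sum")
print(mean_counts)
print(sum_counts)
```

Summing keeps more signal at sites that are hit in only some replicates, which is the reason it can help with sparse datasets.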
GUI Mode¶
The HMM analysis method can be selected from the “Method” tab in the Menu Bar.

The parameters entered through the parameter panel are equivalent to those of the command-line usage (see the parameter descriptions above for full detail).

The method is run using the combined wig, metadata, and annotation uploaded into TRANSIT.
Output and Diagnostics¶
The HMM method produces two output files, a sites file and a genes file. The first file provides the most likely assignment of states for all the TA sites in the genome. Sites can belong to one of the following states: “ES” (Essential), “GD” (Growth-Defect), “NE” (Non-Essential), or “GA” (Growth-Advantage). In addition, the output includes the probability of the site belonging to each state. The columns of these files are defined as follows:
Sites Output File:¶
| Column Header | Column Definition |
|---|---|
| Location | Coordinate of TA site |
| Read Count | Observed Read Counts |
| Probability ES | Probability for ES state |
| Probability GD | Probability for GD state |
| Probability NE | Probability for NE state |
| Probability GA | Probability for GA state |
| State | State Classification (ES = Essential, GD = Growth-Defect, NE = Non-Essential, GA = Growth-Advantage) |
| Gene | Gene(s) that share(s) the TA site. |
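Because the sites file labels every TA site, consecutive sites with the same state can be merged into regions. The sketch below shows one way to do this. It assumes the file is tab-separated, that header or comment lines start with `#`, and that the columns appear in the order listed above; check these assumptions against your own output file. The helpers `parse_sites` and `state_regions` are hypothetical, and the rows are made up.

```python
from itertools import groupby

SITE_COLUMNS = [
    "Location", "Read Count", "Probability ES", "Probability GD",
    "Probability NE", "Probability GA", "State", "Gene",
]


def parse_sites(lines):
    """Parse tab-separated site rows, skipping blank and '#' comment lines."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        row = dict(zip(SITE_COLUMNS, line.split("\t")))
        row["Location"] = int(row["Location"])
        rows.append(row)
    return rows


def state_regions(rows):
    """Merge consecutive sites sharing a state into (state, start, end, n_sites)."""
    regions = []
    for state, group in groupby(rows, key=lambda r: r["State"]):
        sites = list(group)
        regions.append((state, sites[0]["Location"], sites[-1]["Location"], len(sites)))
    return regions


# Made-up rows in the assumed layout (illustration only).
example_lines = [
    "#Location\tRead Count\tProbability ES\tProbability GD\tProbability NE\tProbability GA\tState\tGene",
    "60\t0\t0.95\t0.04\t0.01\t0.00\tES\tgeneA",
    "72\t0\t0.93\t0.05\t0.02\t0.00\tES\tgeneA",
    "102\t54\t0.01\t0.09\t0.88\t0.02\tNE\tgeneA",
    "188\t120\t0.00\t0.02\t0.90\t0.08\tNE\tgeneB",
    "246\t0\t0.91\t0.06\t0.03\t0.00\tES\tgeneB",
]
regions = state_regions(parse_sites(example_lines))
for region in regions:
    print(region)
```

In practice, `example_lines` would be replaced by an open file handle for the sites output file.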
Genes Output File:¶
| Column Header | Column Definition |
|---|---|
| Orf | Gene ID |
| Gene Name | Gene Name |
| Description | Gene Description |
| Total Sites | Number of TA sites |
| ES Count | Number of sites labeled ES (Essential) |
| GD Count | Number of sites labeled GD (Growth-Defect) |
| NE Count | Number of sites labeled NE (Non-Essential) |
| GA Count | Number of sites labeled GA (Growth-Advantage) |
| Mean Insertions | Mean insertion rate within the gene |
| Mean Reads | Mean read count within the gene |
| State Call | State Classification (ES = Essential, GD = Growth-Defect, NE = Non-Essential, GA = Growth-Advantage) |
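A common follow-up step is to tally genes by their State Call and pull out those called essential. The sketch below assumes the genes file is tab-separated, that header or comment lines start with `#`, and that the columns appear in the order listed above; verify this against your own output. The helper `read_genes` is hypothetical, and the gene rows are made up.

```python
import csv
import io
from collections import Counter

GENE_COLUMNS = [
    "Orf", "Gene Name", "Description", "Total Sites",
    "ES Count", "GD Count", "NE Count", "GA Count",
    "Mean Insertions", "Mean Reads", "State Call",
]


def read_genes(handle):
    """Read tab-separated gene rows, skipping blank and '#' comment lines."""
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(data_lines, delimiter="\t")
    return [dict(zip(GENE_COLUMNS, fields)) for fields in reader if fields]


# Made-up rows in the assumed layout (illustration only).
example_file = io.StringIO(
    "#Orf\tGene Name\tDescription\tTotal Sites\tES Count\tGD Count\tNE Count\tGA Count\tMean Insertions\tMean Reads\tState Call\n"
    "geneA\tabcA\thypothetical protein\t10\t9\t0\t1\t0\t0.1\t5.0\tES\n"
    "geneB\tabcB\thypothetical protein\t20\t0\t2\t18\t0\t0.8\t60.2\tNE\n"
    "geneC\tabcC\thypothetical protein\t8\t0\t0\t8\t0\t0.9\t75.0\tNE\n"
)
genes = read_genes(example_file)
call_counts = Counter(gene["State Call"] for gene in genes)
essential_orfs = [gene["Orf"] for gene in genes if gene["State Call"] == "ES"]
print(call_counts)
print(essential_orfs)
```

To use it on real output, replace `example_file` with `open(path, newline="")` for the genes output file.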