pytransit package


Submodules

pytransit.norm_tools module

class pytransit.norm_tools.AdaptiveBGCNorm[source]

Bases: pytransit.norm_tools.NormMethod

cleaninfgeom(rho)[source]

Returns a ‘clean’ output from the geometric distribution.

ecdf(x)[source]

Calculates an empirical CDF of the given data.

name = 'aBGC'
static normalize(data, wigList=[], annotationPath='', doTotReads=True, bgsamples=200000)[source]

Returns the normalized data using the aBGC method.

Parameters
  • data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

  • doTotReads (bool) – Boolean specifying whether to do TTR normalization as well.

  • bgsamples (int) – Integeer specifying how many samples to take.

Returns

Array with the normalized data.

Return type

numpy array

Example
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> normdata = norm_tools.aBGC_norm(data)
>>> print(normdata)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])

See also

normalize_data

class pytransit.norm_tools.BetaGeomNorm[source]

Bases: pytransit.norm_tools.NormMethod

cleaninfgeom(rho)[source]

Returns a ‘clean’ output from the geometric distribution.

ecdf(x)[source]

Calculates an empirical CDF of the given data.

name = 'betageom'
static normalize(data, wigList=[], annotationPath='', doTTR=True, bgsamples=200000)[source]

Returns normalized data according to the BGC method.

Parameters
  • data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

  • doTTR (bool) – Boolean specifying whether to do TTR norm as well.

  • bgsamples (int) – Integer specifying how many samples to take.

Returns

Array with the data normalized using the betageom method.

Return type

numpy array

Example
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> normdata = norm_tools.betageom_norm(data)
>>> print(normdata)
[[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]

See also

normalize_data

class pytransit.norm_tools.EmpHistNorm[source]

Bases: pytransit.norm_tools.NormMethod

static Fzinfnb(params, args)[source]

Objective function for the zero-inflated NB method.

name = 'emphist'
static normalize(data, wigList=[], annotationPath='')[source]

Returns the normalized data, using the empirical hist method.

Parameters
  • wigList (list) – List of paths to wig formatted datasets.

  • annotationPath (str) – Path to annotation in .prot_table or GFF3 format.

Returns

Array with the normalization factors for the emphist method.

Return type

numpy array

Example
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.emphist_factors(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"], "transit/genomes/H37Rv.prot_table")
>>> print(factors)
array([[ 1.        ],
       [ 0.63464722]])

See also

normalize_data

pytransit.norm_tools.Fzinfnb(params, args)[source]

Objective function for the zero-inflated NB method.

class pytransit.norm_tools.NZMeanNorm[source]

Bases: pytransit.norm_tools.NormMethod

name = 'nzmean'
static normalize(data, wigList=[], annotationPath='')[source]

Returns the normalization factors for the data, using the NZMean method.

Parameters

data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

Returns

Array with the normalization factors for the nzmean method.

Return type

numpy array

Example
>>> import pytransit._tools.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.nzmean_factors(data)
>>> print(factors)
array([[ 1.14836149],
       [ 0.88558737]])

See also

normalize_data

class pytransit.norm_tools.NoNorm[source]

Bases: pytransit.norm_tools.NormMethod

name = 'nonorm'
static normalize(data, wigList=[], annotationPath='')[source]
class pytransit.norm_tools.NormMethod[source]

Bases: object

name = 'undefined'
static normalize()[source]
class pytransit.norm_tools.QMgeomNorm[source]

Bases: pytransit.norm_tools.NormMethod

name = 'QMgeom'
static normalize(data, wigList=[], annotationPath='')[source]
quantile_matching()[source]
class pytransit.norm_tools.QuantileNorm[source]

Bases: pytransit.norm_tools.NormMethod

name = 'quantile'
static normalize(data, wigList=[], annotationPath='')[source]

Performs Quantile Normalization as described by Bolstad et al. 2003

Parameters

data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

Returns

Array with the data normalized by the quantile normalization method.

Return type

numpy array

Example
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> normdata = norm_tools.quantile_norm(data)
>>> print(normdata)

See also

normalize_data

class pytransit.norm_tools.TTRNorm[source]

Bases: pytransit.norm_tools.NormMethod

empirical_theta()[source]

Calculates the observed density of the data.

This is used as an estimate insertion density by some normalization methods. May be improved by more sophisticated ways later on.

Parameters

data (numpy array) –

  1. numpy array defining read-counts at N sites.

Returns

Density of the given dataset.

Return type

float

Example
>>> import pytransit.tnseq_tools as tnseq_tools
>>> import pytransit.norm_tools as norm_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> theta = norm_tools.empirical_theta(data)
>>> print(theta)
0.467133570136

See also

TTR_factors

name = 'emphist'
static normalize(data, wigList=[], annotationPath='', thetaEst=<function TTRNorm.empirical_theta>, muEst=<function TTRNorm.trimmed_empirical_mu>, target=100.0)[source]

Returns the normalization factors for the data, using the TTR method.

Parameters
  • data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

  • thetaEst (function) – Function used to estimate density. Should take a list of counts as input.

  • muEst (function) – Function used to estimate mean count. Should take a list of counts as input.

Returns

Array with the normalization factors for the TTR method.

Return type

numpy array

Example
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.TTR_factors(data)
>>> print(factors)
array([[ 1.        ],
       [ 0.62862886]])

See also

normalize_data

trimmed_empirical_mu(t=0.05)[source]

Estimates the trimmed mean of the data.

This is used as an estimate of mean count by some normalization methods. May be improved by more sophisticated ways later on.

Parameters
  • data (numpy array) –

    1. numpy array defining read-counts at N sites.

  • t (float) – Float specifying fraction of start and end to trim.

Returns

(Trimmed) Mean of the given dataset.

Return type

float

Example
>>> import pytransit.tnseq_tools as tnseq_tools
>>> import pytransit.norm_tools as norm_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> mu = norm_tools.trimmed_empirical_mu(data)
>>> print(mu)
120.73077107

See also

TTR_factors

class pytransit.norm_tools.TotReadsNorm[source]

Bases: pytransit.norm_tools.NormMethod

name = 'totreads'
static normalize(data, wigList=[], annotationPath='')[source]

Returns the normalization factors for the data, using the total reads method.

Parameters

data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

Returns

Array with the normalization factors for the totreads method.

Return type

numpy array

Example
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.totreads_factors(data)
>>> print(factors)
array([[ 1.2988762],
       [ 0.8129396]])

See also

normalize_data

class pytransit.norm_tools.ZeroInflatedNBNorm[source]

Bases: pytransit.norm_tools.NormMethod

name = 'zinfb'
static normalize(data, wigList=[], annotationPath='')[source]

Returns the normalization factors for the data using the zero-inflated negative binomial method.

Parameters

data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

Returns

Array with the normalization factors for the zinfnb method.

Return type

numpy array

Example
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.zinfnb_factors(data)
>>> print(factors)
[[ 0.0121883 ]
 [ 0.00747111]]

See also

normalize_data

pytransit.norm_tools.cleaninfgeom(x, rho)[source]

Returns a ‘clean’ output from the geometric distribution.

pytransit.norm_tools.ecdf(S, x)[source]

Calculates an empirical CDF of the given data.

pytransit.norm_tools.empirical_theta(X)[source]

Calculates the observed density of the data.

This is used as an estimate insertion density by some normalization methods. May be improved by more sophisticated ways later on.

Parameters

data (numpy array) –

  1. numpy array defining read-counts at N sites.

Returns

Density of the given dataset.

Return type

float

Example
>>> import pytransit.tnseq_tools as tnseq_tools
>>> import pytransit.norm_tools as norm_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> theta = norm_tools.empirical_theta(data)
>>> print(theta)
0.467133570136

See also

TTR_factors

pytransit.norm_tools.norm_to_target(data, target)[source]

Returns factors to normalize the data to the given target value.

Parameters
  • data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

  • target (float) – Floating point specifying the target for the mean of the data/

Returns

Array with the factors necessary to normalize mean to target.

Return type

numpy array

Example
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.norm_to_target(data, 100)
>>> print(factors)
[[ 1.8548104 ]
 [ 1.16088726]]

See also

normalize_data

pytransit.norm_tools.normalize_data(data, method='nonorm', wigList=[], annotationPath='')[source]

Normalizes the numpy array by the given normalization method.

Parameters
  • data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

  • method (str) – Name of the desired normalization method.

  • wigList (list) – List of paths for the desired wig-formatted datasets.

  • annotationPath (str) – Path to the prot_table annotation file.

Returns

Array with the normalized data. list: List containing the normalization factors. Empty if not used.

Return type

numpy array

Example
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
(normdata, normfactors) = norm_tools.normalize_data(data, "TTR")   # Some methods require annotation and path to wig files.
>>> print(normfactors)
array([[ 1.        ],
       [ 0.62862886]])
>> print(normdata)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])

Note

Some normalization methods require the wigList and annotationPath arguments.

pytransit.norm_tools.trimmed_empirical_mu(X, t=0.05)[source]

Estimates the trimmed mean of the data.

This is used as an estimate of mean count by some normalization methods. May be improved by more sophisticated ways later on.

Parameters
  • data (numpy array) –

    1. numpy array defining read-counts at N sites.

  • t (float) – Float specifying fraction of start and end to trim.

Returns

(Trimmed) Mean of the given dataset.

Return type

float

Example
>>> import pytransit.tnseq_tools as tnseq_tools
>>> import pytransit.norm_tools as norm_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> mu = norm_tools.trimmed_empirical_mu(data)
>>> print(mu)
120.73077107

See also

TTR_factors

pytransit.norm_tools.zinfnb_factors(data)[source]

Returns the normalization factors for the data using the zero-inflated negative binomial method.

Parameters

data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.

Returns

Array with the normalization factors for the zinfnb method.

Return type

numpy array

Example
>>> import pytransit.norm_tools as norm_tools
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> factors = norm_tools.zinfnb_factors(data)
>>> print(factors)
[[ 0.0121883 ]
 [ 0.00747111]]

See also

normalize_data

pytransit.stat_tools module

pytransit.stat_tools.BH_fdr_correction(X)[source]

Adjusts p-values using the Benjamini Hochberg procedure

pytransit.stat_tools.FWER_Bayes(X)[source]
pytransit.stat_tools.F_mean_diff_dict(*args, **kwargs)[source]
pytransit.stat_tools.F_mean_diff_flat(*args, **kwargs)[source]
pytransit.stat_tools.F_shuffle_dict_libraries(*args, **kwargs)[source]
pytransit.stat_tools.F_shuffle_flat(*args, **kwargs)[source]
pytransit.stat_tools.F_sum_diff_dict(*args, **kwargs)[source]
pytransit.stat_tools.F_sum_diff_flat(*args, **kwargs)[source]
pytransit.stat_tools.HDI_from_MCMC(posterior_samples, credible_mass=0.95)[source]
pytransit.stat_tools.bFDR(X)[source]
pytransit.stat_tools.bayesian_ess_thresholds(Z_raw, ALPHA=0.05)[source]

Returns Essentiality Thresholds using a BH-like procedure

pytransit.stat_tools.binom(k, n, p)[source]

Binomial distribution. Uses Normal approximation for large ‘n’

pytransit.stat_tools.binom_cdf(k, n, p)[source]

CDF of the binomial distribution

pytransit.stat_tools.binom_test(k, n, p, type='two-sided')[source]

Does a binomial test given success, trials and probability.

pytransit.stat_tools.boxcoxTable(X, minlambda, maxlambda, dellambda)[source]

Returns a table of (loglik function, lambda) pairs for the data.

pytransit.stat_tools.boxcoxtransform(x, lambdax)[source]

Performs a box-cox transformation to data vector X. WARNING: elements of X should be all positive! Fixed: ‘>’ has changed to ‘<’

pytransit.stat_tools.comb(n, k)[source]
pytransit.stat_tools.comb1(n, k)[source]
pytransit.stat_tools.combine_lib_dicts(L1, L2)[source]
pytransit.stat_tools.cumulative_average(new_x, n, prev_avg)[source]
pytransit.stat_tools.dberndiff(d, peq, p01, p10)[source]
pytransit.stat_tools.dbinomdiff(d, n, P)[source]
pytransit.stat_tools.fact(n)[source]
pytransit.stat_tools.get_lib_data_dict(data1, ctrl_lib_str, data2, exp_lib_str, nTAs)[source]
pytransit.stat_tools.isEven(x)[source]
pytransit.stat_tools.loess(X, Y, h=10000)[source]
pytransit.stat_tools.loess_correction(X, Y, h=10000, window=100)[source]
pytransit.stat_tools.log_fac(n)[source]
pytransit.stat_tools.loglik(X, lambdax)[source]

Computes the log-likelihood function for a transformed vector Xtransform.

pytransit.stat_tools.multinomial(K, P)[source]
pytransit.stat_tools.my_perm(d, n)[source]
pytransit.stat_tools.norm(x, mu, sigma)[source]

Normal distribution

pytransit.stat_tools.parse_lib_index(nData, libstr, nTAs)[source]
pytransit.stat_tools.phi_coefficient(X, Y)[source]

Calculates the phi-coefficient for two bool arrays

pytransit.stat_tools.qberndiff(d, peq, p01, p10)[source]
pytransit.stat_tools.qbinomdiff(d, n, peq, p01, p10)[source]
pytransit.stat_tools.regress(X, Y)[source]

Performs linear regression given two vectors, X, Y.

pytransit.stat_tools.resampling(data1, data2, S=10000, testFunc=<function F_mean_diff_flat>, permFunc=<function F_shuffle_flat>, adaptive=False, lib_str1='', lib_str2='', PC=1, site_restricted=False)[source]

Does a permutation test on two sets of data.

Performs the resampling / permutation test given two sets of data using a function defining the test statistic and a function defining how to permute the data.

Parameters
  • data1 – List or numpy array with the first set of observations.

  • data2 – List or numpy array with the second set of observations.

  • S – Number of permutation tests (or samples) to obtain.

  • testFunc – Function defining the desired test statistic. Should accept two lists as arguments. Default is difference in means between the observations.

  • permFunc – Function defining the way to permute the data. Should accept one argument, the combined set of data. Default is random shuffle.

  • adaptive – Cuts-off resampling early depending on significance.

Data arrays: (data1 and data2)
Regular resampling used to take 1D arrays of counts pooled (flattened) over replicates.

Now 2D arrays are passed in and flatten them. Uses F_shuffle_flat() and F_sum_diff_flat().

If using library strings, then inputs are 2D arrays of counts for each sample.

Character in lib_str indicates which lib it is in. Make a dict out of these to pass to permFunc. Uses F_shuffle_dict_libraries() and F_sum_diff_dict_libraries().

If site_restricted, keep input arrays as 2D and pass to site_restricted_permutation() and F_sum_diff_flat().

Returns

Tuple with described values
  • test_obs – Test statistic of observation.

  • mean1 – Arithmetic mean of first set of data.

  • mean2 – Arithmetic mean of second set of data.

  • log2FC – Normalized log2FC the means.

  • pval_ltail – Lower tail p-value.

  • pval_utail – Upper tail p-value.

  • pval_2tail – Two-tailed p-value.

  • test_sample – List of samples of the test statistic.

Example
>>> import pytransit.stat_tools as stat_tools
>>> import numpy
>>> X = numpy.random.random(100)
>>> Y = numpy.random.random(100)
>>> (test_obs, mean1, mean2, log2fc, pval_ltail, pval_utail, pval_2tail, test_sample) = stat_tools.resampling(X,Y)
>>> pval_2tail
0.2167
>>> test_sample[:3]
[0.076213992904990535, -0.0052513291091412784, -0.0038425140184765172]
pytransit.stat_tools.sample_trunc_norm_post(data, S, mu0, s20, k0, nu0)[source]
pytransit.stat_tools.site_restricted_permutation(data)[source]
pytransit.stat_tools.text_histogram(X, nBins=20, resolution=200, obs=None)[source]
pytransit.stat_tools.transformToRange(X, new_min, new_max, old_min=None, old_max=None)[source]
pytransit.stat_tools.tricoeff(N, S)[source]
pytransit.stat_tools.tricube(X)[source]

pytransit.tnseq_tools module

pytransit.tnseq_tools.ExpectedRuns(n, pnon)[source]

Expected value of the run of non=insertions (Schilling, 1990):

ER_n = log(1/p)(nq) + gamma/ln(1/p) -1/2 + r1(n) + E1(n)

Parameters
  • n (int) – Integer representing the number of sites.

  • pins (float) – Floating point number representing the probability of non-insertion.

Returns

Size of the expected maximum run.

Return type

float

class pytransit.tnseq_tools.Gene(orf, name, desc, reads, position, start=0, end=0, strand='')[source]

Bases: object

Class defining a gene with useful attributes for TnSeq analysis.

This class helps define a “gene” with attributes that facilitate TnSeq analysis. Here “gene” can be defined to be any genomic region. The Genes class (with an s) can be used to define list of Gene objects with more useful operations on the “genome” level.

orf

A string defining the ID of the gene.

name

A string with the human readable name of the gene.

desc

A string with the description of the gene.

reads

List of lists of read-counts in possible site replicate dataset.

position

List of coordinates of the possible sites.

start

An integer defining the start coordinate for the gene.

end

An integer defining the end coordinate for the gene.

strand

A string defining the strand of the gene.

Example
>>> import pytransit.tnseq_tools as tnseq_tools
>>> G = tnseq_tools.Gene("Rv0001", "dnaA", "DNA Replication A", [[0,0,0,0,1,3,0,1]],  [1,21,32,37,45,58,66,130], strand="+" )
>>> print(G)
Rv0001  (dnaA)  k=3 n=8 r=4 theta=0.37500
>>> print(G.phi())
0.625
>>> print(G.tosses)
array([ 0.,  0.,  0.,  0.,  1.,  1.,  0.,  1.])

See also

Genes

__eq__(other)[source]

Compares against other gene object.

Returns

True if the gene objects have same orf id.

Return type

bool

__getitem__(i)[source]

Return read-counts at position i.

Parameters

i (int) – integer of the index of the desired site.

Returns

Reads at position i.

Return type

list

__lt__(other)[source]

Compares against other gene object.

Returns

True if the gene object id is less than the other.

Return type

bool

__str__()[source]

Return a string representation of the object.

Returns

Human readable string with some of the attributes.

Return type

str

get_gap_span()[source]

Returns the span of the maxrun of the gene (i.e. number of nucleotides).

Returns

Number of nucleotides spanned by the max run.

Return type

int

get_gene_span()[source]

Returns the number of nucleotides spanned by the gene.

Returns

Number of nucleotides spanned by the gene’s sites.

Return type

int

phi()[source]

Return the non-insertion density (“phi”) for the gene.

Returns

Non-insertion density (i.e. 1 - theta)

Return type

float

theta()[source]

Return the insertion density (“theta”) for the gene.

Returns

Density of the gene (i.e. k/n )

Return type

float

total_reads()[source]

Return the total reads for the gene.

Returns

Total sum of read-counts.

Return type

float

class pytransit.tnseq_tools.Genes(wigList, annotation, norm='nonorm', reps='All', minread=1, ignoreCodon=True, nterm=0.0, cterm=0.0, include_nc=False, data=[], position=[], genome='', transposon='himar1')[source]

Bases: object

Class defining a list of Gene objects with useful attributes for TnSeq analysis.

This class helps define a list of Gene objects with attributes that facilitate TnSeq analysis. Includes methods that calculate useful statistics and even rudamentary analysis of essentiality.

wigList

List of paths to datasets in .wig format.

protTable

String with path to annotation in .prot_table format.

norm

String with the normalization used/

reps

String with information on how replicates were handled.

minread

Integer with the minimum magnitude of read-count considered.

ignoreCodon

Boolean defining whether to ignore the start/stop codon.

nterm

Float number of the fraction of the N-terminus to ignore.

cterm

Float number of the fraction of the C-terminus to ignore.

include_nc

Boolean determining whether to include non-coding areas.

orf2index

Dictionary of orf id to index in the genes list.

genes

List of the Gene objects.

Example
>>> import pytransit.tnseq_tools as tnseq_tools
>>> G = tnseq_tools.Genes(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"], "transit/genomes/H37Rv.prot_table", norm="TTR")
>>> print(G)
Genes Object (N=3990)
>>> print(G.global_theta())
0.40853707222816626
>>> print(G["Rv0001"]   # Lookup like dictionary)
Rv0001  (dnaA)  k=0 n=31    r=31    theta=0.00000
>>> print(G[2]          # Lookup like list)
Rv0003  (recF)  k=5 n=35    r=14    theta=0.14286
>>> print(G[2].reads)
[[  62.            0.            0.            0.            0.            0.
 0.            0.            0.            0.            0.            0.
 0.            0.           63.            0.            0.           13.
46.            0.            1.            0.            0.            0.
 0.            0.            0.            0.            0.            0.
 0.            0.            0.            0.            0.        ]
 [   3.14314432   67.26328843    0.            0.            0.            0.
 0.            0.            0.           35.20321637    0.            0.
 0.            0.           30.80281433    0.          101.20924707
 0.           23.25926796    0.           16.97297932    8.17217523
 0.            0.            2.51451546    3.77177318    0.62862886
 0.            0.           69.14917502    0.            0.            0.
 0.            0.        ]]

See also

Gene

__contains__(item)[source]

Defines __contains__ to check if gene exists in the list.

Parameters

item (str) – String with the id of the gene.

Returns

Boolean with True if item is in the list.

Return type

bool

__getitem__(i)[source]

Defines __getitem__ method so that it works as dictionary and list.

Parameters

i (int) – Integer or string defining index or orf ID desired.

Returns

A gene with the index or ID equal to i.

Return type

Gene

__len__()[source]

Defines __len__ returning number of genes.

Returns

Number of genes in the list.

Return type

int

__str__()[source]

Defines __str__ to print(a generic str with the size of the list.)

Returns

Human readable string with number of genes in object.

Return type

str

global_insertion()[source]

Returns total number of insertions, i.e. sum of ‘k’ over all genes.

Returns

Total sum of reads across all genes.

Return type

float

global_phi()[source]

Returns global non-insertion frequency, of the library.

Returns

Complement of global theta i.e. 1.0-theta

Return type

float

global_reads()[source]

Returns the reads among the library.

Returns

List of all the data.

Return type

list

global_run()[source]

Returns the run assuming all genes were concatenated together.

Returns

Max run across all genes.

Return type

int

global_sites()[source]

Returns total number of sites, i.e. sum of ‘n’ over all genes.

Returns

Total number of sites across all genes.

Return type

int

global_theta()[source]

Returns global insertion frequency, of the library.

Returns

Total sites with insertions divided by total sites.

Return type

float

local_gap_span()[source]

Returns numpy array with the span of nucleotides of the largest gap, ‘s’, for each gene.

Returns

Numpy array with the span of gap for all genes.

Return type

narray

local_gene_span()[source]

Returns numpy array with the span of nucleotides of the gene, ‘t’, for each gene.

Returns

Numpy array with the span of gene for all genes.

Return type

narray

local_insertions()[source]

Returns numpy array with the number of insertions, ‘k’, for each gene.

Returns

Numpy array with the number of insertions for all genes.

Return type

narray

local_phis()[source]

Returns numpy array of non-insertion frequency, ‘phi’, for each gene.

Returns

Numpy array with the complement of density for all genes.

Return type

narray

local_reads()[source]

Returns numpy array of lists containing the read counts for each gene.

Returns

Numpy array with the list of reads for all genes.

Return type

narray

local_runs()[source]

Returns numpy array with maximum run of non-insertions, ‘r’, for each gene.

Returns

Numpy array with the max run of non-insertions for all genes.

Return type

narray

local_sites()[source]

Returns numpy array with total number of TA sites, ‘n’, for each gene.

Returns

Numpy array with the number of sites for all genes.

Return type

narray

local_thetas()[source]

Returns numpy array of insertion frequencies, ‘theta’, for each gene.

Returns

Numpy array with the density for all genes.

Return type

narray

tosses()[source]

Returns list of bernoulli trials, ‘tosses’, representing insertions in the gene.

Returns

Sites represented as bernoulli trials with insertions as true.

Return type

list

total_reads()[source]

Returns total reads among the library.

Returns

Total sum of read-counts accross all genes.

Return type

float

pytransit.tnseq_tools.GumbelCDF(x, u, B)[source]

CDF of the Gumbel distribution:

e^(-e^( (u-x)/B))

Parameters
  • x (int) – Length of the max run.

  • u (float) – Location parameter of the Gumbel dist.

  • B (float) – Scale parameter of the Gumbel dist.

Returns

Cumulative probability o the Gumbel distribution.

Return type

float

pytransit.tnseq_tools.VarR(n, pnon)[source]

Variance of the expected run of non-insertons (Schilling, 1990):

\[VarR_n = (pi^2)/(6*ln(1/p)^2) + 1/12 + r2(n) + E2(n)\]
Parameters
  • n (int) – Integer representing the number of sites.

  • pnon (float) – Floating point number representing the probability of non-insertion.

Returns

Variance of the length of the maximum run.

Return type

float

pytransit.tnseq_tools.check_wig_includes_zeros(wig_list)[source]

Returns boolean list showing whether the given files include empty sites (zero) or not.

Parameters

wig_list (list) – List of paths to wig files.

Returns

List of boolean values.

Return type

list

pytransit.tnseq_tools.combine_replicates(data, method='Sum')[source]

Returns list of data merged together.

Parameters
  • data (list) – List of numeric (replicate) data to be merged.

  • method (str) – How to combine the replicate dataset.

Returns

List of numeric dataset now merged together.

Return type

list

pytransit.tnseq_tools.getE1(n)[source]

Small Correction term. Defaults to 0.01 for now

pytransit.tnseq_tools.getE2(n)[source]

Small Correction term. Defaults to 0.01 for now

pytransit.tnseq_tools.getGamma()[source]

Euler-Mascheroni constant ~ 0.577215664901

pytransit.tnseq_tools.getR1(n)[source]

Small Correction term. Defaults to 0.000016 for now

pytransit.tnseq_tools.getR2(n)[source]

Small Correction term. Defaults to 0.00006 for now

pytransit.tnseq_tools.get_coordinate_map(galign_path, reverse=False)[source]

Attempts to get mapping of coordinates from galign file.

Parameters
  • path (str) – Path to .galign file.

  • reverse (bool) – Boolean specifying whether to do A to B or B to A.

Returns

Dictionary of coordinate in one file to another file.

Return type

dict

pytransit.tnseq_tools.get_data(wig_list)[source]
Returns a tuple of (data, position) containing a matrix of raw read-counts

, and list of coordinates.

Parameters

wig_list (list) – List of paths to wig files.

Returns

Two lists containing data and positions of the wig files given.

Return type

tuple

Example
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_data(["data/glycerol_H37Rv_rep1.wig", "data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
pytransit.tnseq_tools.get_data_stats(reads)[source]
pytransit.tnseq_tools.get_data_w_genome(wig_list, genome)[source]
pytransit.tnseq_tools.get_data_zero_fill(wig_list)[source]
Returns a tuple of (data, position) containing a matrix of raw read counts,

and list of coordinates. Positions that are missing are filled in as zero.

Parameters

wig_list (list) – List of paths to wig files.

Returns

Two lists containing data and positions of the wig files given.

Return type

tuple

pytransit.tnseq_tools.get_extended_pos_hash_gff(path, N=None)[source]
pytransit.tnseq_tools.get_extended_pos_hash_pt(path, N=None)[source]
pytransit.tnseq_tools.get_file_types(wig_list)[source]

Returns the transposon type (himar1/tn5) of the list of wig files.

Parameters

wig_list (list) – List of paths to wig files.

Returns

List of transposon type (“himar1” or “tn5”).

Return type

list

pytransit.tnseq_tools.get_gene_info(path)[source]

Returns a dictionary that maps gene id to gene information.

Parameters

path (str) – Path to annotation in .prot_table or GFF3 format.

Returns

Dictionary of gene id to tuple of information:
  • name

  • description

  • start coordinate

  • end coordinate

  • strand

Return type

dict

pytransit.tnseq_tools.get_gene_info_gff(path)[source]

Returns a dictionary that maps gene id to gene information.

Parameters

path (str) – Path to annotation in GFF3 format.

Returns

Dictionary of gene id to tuple of information:
  • name

  • description

  • start coordinate

  • end coordinate

  • strand

Return type

dict

pytransit.tnseq_tools.get_gene_info_pt(path)[source]

Returns a dictionary that maps gene id to gene information.

Parameters

path (str) – Path to annotation in .prot_table format.

Returns

Dictionary of gene id to tuple of information:
  • name

  • description

  • start coordinate

  • end coordinate

  • strand

Return type

dict

pytransit.tnseq_tools.get_genes_in_range(pos_hash, start, end)[source]

Returns list of genes that occur in a given range of coordinates.

Parameters
  • pos_hash (dict) – Dictionary of position to list of genes.

  • start (int) – Start coordinate of the desired range.

  • end (int) – End coordinate of the desired range.

Returns

List of genes that fall within range.

Return type

list

pytransit.tnseq_tools.get_pos_hash(path)[source]

Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.

Parameters

path (str) – Path to annotation in .prot_table or GFF3 format.

Returns

Dictionary of position to list of genes that share that position.

Return type

dict

pytransit.tnseq_tools.get_pos_hash_gff(path)[source]

Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.

Parameters

path (str) – Path to annotation in GFF3 format.

Returns

Dictionary of position to list of genes that share that position.

Return type

dict

pytransit.tnseq_tools.get_pos_hash_pt(path)[source]

Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.

Parameters

path (str) – Path to annotation in .prot_table format.

Returns

Dictionary of position to list of genes that share that position.

Return type

dict

pytransit.tnseq_tools.get_unknown_file_types(wig_list, transposons)[source]
pytransit.tnseq_tools.get_wig_stats(path)[source]

Returns statistics for the given wig file with read-counts.

Parameters

path (str) – String with the path to the wig file of interest.

Returns

Tuple with the following statistical measures:
  • density

  • mean read

  • non-zero mean

  • non-zero median

  • max read

  • total reads

  • skew

  • kurtosis

Return type

tuple

pytransit.tnseq_tools.griffin_analysis(genes_obj, pins)[source]

Implements the basic Gumbel analysis of runs of non-insertion, described in Griffin et al. 2011.

This analysis method calculates a p-value of observing the maximun run of TA sites without insertions in a row (i.e. a “run”, r). Unusually long runs are indicative of an essential gene or protein domain. Assumes that there is a constant, global probability of observing an insertion (tantamount to a Bernoulli probability of success).

Parameters
  • genes_obj (Genes) – An object of the Genes class defining the genes.

  • pins (float) – The probability of insertion.

Returns

List of lists with results and information for the genes. The elements of the list are as follows:
  • ORF ID.

  • Gene Name.

  • Gene Description.

  • Number of TA sites with insertions.

  • Number of TA sites.

  • Length of largest run of non-insertion.

  • Expected run for a gene this size.

  • p-value of the observed run.

Return type

list

pytransit.tnseq_tools.maxrun(lst, item=0)[source]

Returns the length of the maximum run an item in a given list.

Parameters
  • lst (list) – List of numeric items.

  • item (float) – Number to look for consecutive runs of.

Returns

Length of the maximum run of consecutive instances of item.

Return type

int

pytransit.tnseq_tools.read_combined_wig(fname)[source]

Read the combined wig-file generated by Transit :: Filename -> Tuple([Site], [WigData], [Filename]) Site :: Integer WigData :: [Number] Filename :: String

pytransit.tnseq_tools.read_genes(fname, descriptions=False)[source]

(Filename, Options) -> [Gene] Gene :: {start, end, rv, gene, strand}

pytransit.tnseq_tools.read_genome(path)[source]

Reads in FASTA formatted genome file.

Parameters

path (str) – Path to .galign file.

Returns

String with the genomic sequence.

Return type

string

pytransit.tnseq_tools.read_samples_metadata(metadata_file, covarsToRead=[], interactionsToRead=[], condition_name='Condition')[source]

Filename -> ConditionMap ConditionMap :: {Filename: Condition}, [{Filename: Covar}], [{Filename: Interaction}] Condition :: String Covar :: String Interaction :: String

pytransit.tnseq_tools.runindex(runs)[source]

Returns a list of the indexes of the start of the runs; complements runs().

Parameters

runs (list) – List of numeric data.

Returns

List of the index of the runs of non-insertions. Non-zero sites are treated as runs of zero.

Return type

list

pytransit.tnseq_tools.runs(data)[source]

Return list of all the runs of consecutive non-insertions.

Parameters

data (list) – List of numeric data.

Returns

List of the length of the runs of non-insertions. Non-zero sites are treated as runs of zero.

Return type

list

pytransit.tnseq_tools.runs_w_info(data)[source]

Return list of all the runs of consecutive non-insertions with the start and end locations.

Parameters

data (list) – List of numeric data to check for runs.

Returns

List of dictionary from run to length and position information of the tun.

Return type

list

pytransit.tnseq_tools.rv_siteindexes_map(genes, TASiteindexMap, nterm=0.0, cterm=0.0)[source]

([Gene], {TAsite: Siteindex}) -> {Rv: Siteindex}

pytransit.tnseq_tools.tossify(data)[source]

Reduces the data into Bernoulli trials (or ‘tosses’) based on whether counts were observed or not.

Parameters

data (list) – List of numeric data.

Returns

Data represented as bernoulli trials with >0 as true.

Return type

list

pytransit.transit_tools module

class pytransit.transit_tools.AssumeZerosDialog(*args, **kw)[source]

Bases: wx._core.Dialog

OnClose(event)[source]
pytransit.transit_tools.ShowAskWarning(MSG='')[source]
pytransit.transit_tools.ShowError(MSG='')[source]
pytransit.transit_tools.ShowMessage(MSG='')[source]
pytransit.transit_tools.aton(aa)[source]
pytransit.transit_tools.basename(filepath)[source]
pytransit.transit_tools.cleanargs(rawargs)[source]

Returns a list and a dictionary with positional and keyword arguments.

-This function assumes flags must start with a “-” and and cannot be a

number (but can include them).

-Flags should either be followed by the value they want to be associated

with (i.e. -p 5) or will be assigned a value of True in the dictionary.

-The dictionary will map flags to the name given minus ONE “-” sign in

front. If you use TWO minus signs in the flag name (i.e. –verbose), the dictionary key will be the name with ONE minus sign in front (i.e. {“-verbose”:True}).

Parameters

rawargs (list) – List of positional/keyword arguments. As obtained from sys.argv.

Returns

List of positional arguments (i.e. arguments without flags),

in order provided.

dict: Dictionary mapping flag (key is flag minus the first “-“) and

their values.

Return type

list

pytransit.transit_tools.convertToCombinedWig(dataset_list, annotationPath, outputPath, normchoice='nonorm')[source]

Normalizes the input datasets and outputs the result in CombinedWig format.

Parameters
  • dataset_list (list) – List of paths to datasets in .wig format

  • annotationPath (str) – Path to annotation in .prot_table or GFF3 format.

  • outputPath (str) – Desired output path.

  • normchoice (str) – Choice for normalization method.

pytransit.transit_tools.convertToGeneCountSummary(dataset_list, annotationPath, outputPath, normchoice='nonorm')[source]

Normalizes the input datasets and outputs the result in CombinedWig format.

Parameters
  • dataset_list (list) – List of paths to datasets in .wig format

  • annotationPath (str) – Path to annotation in .prot_table or GFF3 format.

  • outputPath (str) – Desired output path.

  • normchoice (str) – Choice for normalization method.

pytransit.transit_tools.convertToIGV(self, dataset_list, annotationPath, path, normchoice=None)[source]
pytransit.transit_tools.dirname(filepath)[source]
pytransit.transit_tools.fetch_name(filepath)[source]
pytransit.transit_tools.getTabTableData(path, colnames)[source]
pytransit.transit_tools.get_extended_pos_hash(path)[source]

Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.

Parameters

path (str) – Path to annotation in .prot_table or GFF3 format.

Returns

Dictionary of position to list of genes that share that position.

Return type

dict

pytransit.transit_tools.get_gene_info(path)[source]

Returns a dictionary that maps gene id to gene information.

Parameters

path (str) – Path to annotation in .prot_table or GFF3 format.

Returns

Dictionary of gene id to tuple of information:
  • name

  • description

  • start coordinate

  • end coordinate

  • strand

Return type

dict

pytransit.transit_tools.get_pos_hash(path)[source]

Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.

Parameters

path (str) – Path to annotation in .prot_table or GFF3 format.

Returns

Dictionary of position to list of genes that share that position.

Return type

dict

pytransit.transit_tools.get_validated_data(wig_list, wxobj=None)[source]
Returns a tuple of (data, position) containing a matrix of raw read-counts

, and list of coordinates.

Parameters
  • wig_list (list) – List of paths to wig files.

  • wxobj (object) – wxPython GUI object for warnings

Returns

Two lists containing data and positions of the wig files given.

Return type

tuple

Example
>>> import pytransit.tnseq_tools as tnseq_tools
>>> (data, position) = tnseq_tools.get_validated_data(["data/glycerol_H37Rv_rep1.wig", "data/glycerol_H37Rv_rep2.wig"])
>>> print(data)
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])

See also

get_file_types combine_replicates get_data_zero_fill pytransit.norm_tools.normalize_data

pytransit.transit_tools.parseCoords(strand, aa_start, aa_end, start, end)[source]
pytransit.transit_tools.transit_error(text)[source]
pytransit.transit_tools.transit_message(msg='', prefix='')[source]
pytransit.transit_tools.validate_annotation(annotation)[source]
pytransit.transit_tools.validate_both_datasets(ctrldata, expdata)[source]
pytransit.transit_tools.validate_control_datasets(ctrldata)[source]
pytransit.transit_tools.validate_filetypes(datasets, transposons, justWarn=True)[source]
pytransit.transit_tools.validate_transposons_used(datasets, transposons, justWarn=True)[source]
pytransit.transit_tools.validate_wig_format(wig_list, wxobj=None)[source]

Module contents