{% extends "wide-layout.html" %} {% block title %}Scan statistics{% endblock %} {% block body %} {% if num_sites %}

Sites | Scores | Positions | Sequences | Interactions


These scan statistics are for a scan of {{ num_seqs }} sequences from {{ dataset_name }} averaging {{ '%.1f' % (num_bases / num_seqs) }} base pairs in length. There are {{ num_sites }} predicted sites for {{ num_motifs }} motifs averaging 1 binding site every {{ '%d' % (num_bases / num_sites) }} base pairs.

Each binding site has a Z-score associated with it. The Z-score is an estimate of how confidently STEME predicts this site is a binding site. A Z-score of 1 means STEME is sure that the site is a binding site. A Z-score of 0.5 means STEME thinks the site is equally likely to be a binding site or background genomic sequence. The Z-score is calculated from how well the site matches the motif and how well it fits STEME's background model.


Number of sites

Scan number of sites by motif image is missing!
The number of sites for each motif. The number of sites above the Z-score threshold is shown. Also shown is the fraction of the number of sites that is the sum of the Z-scores for all the sites. This is the expectation of the number of sites under STEME's model.


Z-scores

Scan scores image is missing! Legend scores image is missing!
The Z-scores for the predicted sites segregated by motif. The sites are sorted by Z-score. This plot allows us to compare how strong each motif's predicted binding sites are. Vaguer motifs with lower information contents tend to have difficulty achieving high Z-scores. In these cases a perfect match may not produce a high Z-score.


Positions

Scan positions image is missing! Legend scores image is missing!
The positions of the predicted sites in the sequences. Each marker represents a site. The y-axis represents how close the site is to the start or end of the sequence it is in. The sites are sorted in the x-axis according to their y-value. This plot allows us to see if particular motifs have sites that cluster in the centre or beginning or end of the sequences. For example, suppose a motif's scatter plot has a flat region in the centre. This would allow us to see that this motif's sites have a bias towards the centre of the sequences. A scatter plot for uniformly distributed sites would have a near constant gradient.


Sequence coverage

Scan sequence coverage image is missing! Legend scores image is missing!
The sequence coverage by motif: how many sequences have at least one site at any given Z-score.


Interactions

Scan collinearity image is missing!
The second plot shows a statistic measuring the collinearity of a pair of motifs across the sequences. If \(Z_{m,s}\) is the best Z-score for motif \(m\) in sequence \(s\), then the statistic for the pair \(m_1, m_2\) is \[ \frac{\sqrt{\langle Z_{m_1}, Z_{m_2} \rangle} }{|| Z_{m_2} ||} \] The motifs are ordered according to a hierarchical clustering (not shown).


Scan best Z image is missing!
This plot shows the strength of the best hit for each motif (y-axis) in each sequence (x-axis). The motifs are hierarchically clustered on the basis of which sequences they have strong hits in. The sequences have been K-means clustered into {{ num_seq_clusters }} clusters. {% else %}


{{ num_seqs }} sequences from {{ dataset_name }} averaging {{ '%.1f' % (num_bases / num_seqs) }} base pairs in length were scanned for sites. No sites were predicted above the threshold for any of the motifs. {% endif %}


{% endblock %}