These scan statistics are for a scan of {{ num_seqs }} sequences from {{ dataset_name }}
averaging {{ '%.1f' % (num_bases / num_seqs) }} base pairs in length. There are
{{ num_sites }} predicted sites for {{ num_motifs }} motifs, an average of one predicted site every
{{ '%d' % (num_bases / num_sites) }} base pairs.
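The two averages above come straight from the template expressions. Here is a minimal Python sketch of the same arithmetic. It assumes the hypothetical inputs `seq_lengths` (a list of sequence lengths) and `num_sites` (the total number of predicted sites), and `scan_summary` is illustrative only. Note that `'%d'` truncates rather than rounds.

```python
def scan_summary(seq_lengths, num_sites):
    """Reproduce the template's average-length and bases-per-site figures.

    seq_lengths and num_sites are hypothetical inputs for illustration.
    """
    num_seqs = len(seq_lengths)
    num_bases = sum(seq_lengths)
    avg_length = '%.1f' % (num_bases / num_seqs)
    # '%d' truncates the float towards zero, as in the template.
    # Guard against zero sites; the template covers that case in its else branch.
    bases_per_site = '%d' % (num_bases / num_sites) if num_sites else None
    return avg_length, bases_per_site


print(scan_summary([100, 200, 301], 4))
```

With 601 bases in three sequences and four sites, this reports an average length of 200.3 base pairs and one site every 150 base pairs.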
Scores
The scores of the predicted sites, grouped by motif. Each marker (point) represents one predicted site.
The sites are sorted by Z-score. The
Z-score is an estimate of how confident STEME is that the site is a binding site.
A Z-score of 1 would mean STEME is almost sure that the site is a binding site. The Z-score is calculated from how well the
site matches the motif and how well it fits STEME's background model.
The plot allows us to compare the strength of each motif's predicted binding sites. Vaguer motifs with lower information
content tend to have difficulty achieving high Z-scores; for these motifs even a perfect match may not produce a high Z-score.
The plot also gives a rough feel for how many binding sites each motif has, from the density of each
motif's markers along the x-axis.
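STEME's exact Z-score computation is not spelled out here. Purely as an illustration, the sketch below assumes a MEME-style posterior: the probability that a site was generated by the motif rather than by a zero-order background, given an assumed prior probability of a site. The names `site_z` and `column`, the uniform background and the prior of 0.001 are all hypothetical. The sketch shows why a perfect match to a vague, low-information motif can still score low.

```python
import math

BASES = "ACGT"


def column(consensus, p):
    """Hypothetical motif column: probability p for the consensus base, the rest shared equally."""
    return {b: (p if b == consensus else (1.0 - p) / 3) for b in BASES}


def site_z(site, pwm, background, prior):
    """Posterior probability that `site` came from the motif rather than the background.

    Assumes a MEME-style two-component mixture with a zero-order background;
    STEME's actual background model and scoring may differ.
    """
    log_motif = sum(math.log(col[b]) for col, b in zip(pwm, site))
    log_bg = sum(math.log(background[b]) for b in site)
    a = math.log(prior) + log_motif
    c = math.log(1.0 - prior) + log_bg
    m = max(a, c)  # subtract the max before exponentiating to avoid underflow
    return math.exp(a - m) / (math.exp(a - m) + math.exp(c - m))


site = "ACGTACGT"
background = {b: 0.25 for b in BASES}
sharp = [column(b, 0.97) for b in site]  # high information content
vague = [column(b, 0.40) for b in site]  # low information content
print(round(site_z(site, sharp, background, 1e-3), 3))  # perfect match, sharp motif
print(round(site_z(site, vague, background, 1e-3), 3))  # perfect match, vague motif
```

Both motifs match the site perfectly, but only the sharp motif's likelihood ratio is large enough to overcome the small prior.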
Positions
The positions of the predicted sites in the sequences. Each marker represents a site. The y-axis shows how
close the site is to the start or end of the sequence it is in. The sites are sorted along the x-axis by
their y-value. This plot shows whether a particular motif's sites cluster in the centre, at the beginning
or at the end of the sequences. For example, if a motif's scatter plot has a flat region in the centre, many of its
sites share similar central positions, which tells us that the motif's sites are biased towards the centre of the sequences. A scatter plot for uniformly
distributed sites would have a near constant gradient.
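To make the gradient argument concrete, here is a minimal sketch. It assumes the y-value is the site's centre divided by its sequence length (0 at the start, 1 at the end); STEME's exact position measure may differ, and `sorted_relative_positions` and `steps` are hypothetical helpers. Sorting the values and plotting them against their rank gives the curve described above: evenly spread sites give equal steps, while sites clustered in the centre give small steps, a flat region, in the middle.

```python
def sorted_relative_positions(sites):
    """Sort sites by relative position, as along the plot's x-axis.

    sites: iterable of (start, width, seq_length) tuples (a hypothetical format).
    Returns relative positions of the site centres in [0, 1], in ascending order.
    """
    return sorted((start + width / 2) / seq_length for start, width, seq_length in sites)


def steps(values):
    """Differences between consecutive sorted values, i.e. the curve's local gradient."""
    return [b - a for a, b in zip(values, values[1:])]


uniform = [(start, 10, 100) for start in range(0, 100, 10)]
centred = [(start, 10, 100) for start in (5, 40, 42, 44, 45, 46, 48, 85)]
print(steps(sorted_relative_positions(uniform)))  # roughly equal steps
print(steps(sorted_relative_positions(centred)))  # small steps in the middle
```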
Sequences
The top plot shows the density of predicted sites in each sequence. Each sequence is represented by one x-value, and the
sequences appear in the same order as in the FASTA file originally input to STEME. Each
marker represents the density of sites for a particular motif in that sequence, that is, the number
of predicted sites per base pair of the sequence. This
plot shows whether certain sequences have a concentration of a particular motif's sites. We use
the density of sites rather than a count to give a fair comparison between sequences of different lengths. The
bottom plot shows the sequence lengths. The lengths are scatter plotted in the same order as in the top plot (and in the
original input FASTA file). They are also plotted in order of size as a green line.
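The density calculation is simple arithmetic. Here is a minimal sketch, assuming sites are given as hypothetical `(seq_index, motif_name)` pairs and lengths are listed in FASTA input order; `site_densities` is illustrative, not part of STEME.

```python
from collections import defaultdict


def site_densities(seq_lengths, sites):
    """Predicted sites per base pair for each motif in each sequence.

    seq_lengths: lengths in FASTA input order.
    sites: iterable of (seq_index, motif_name) pairs (a hypothetical format).
    Returns {motif_name: [density for each sequence, in input order]}.
    """
    counts = defaultdict(lambda: [0] * len(seq_lengths))
    for seq_index, motif in sites:
        counts[motif][seq_index] += 1
    return {
        motif: [n / length for n, length in zip(per_seq, seq_lengths)]
        for motif, per_seq in counts.items()
    }


seq_lengths = [100, 400, 250]
sites = [(0, "m1"), (0, "m1"), (1, "m1"), (1, "m2")]
print(site_densities(seq_lengths, sites))
print(sorted(seq_lengths))  # the order used for the green line in the bottom plot
```

Here the first sequence has only twice as many `m1` sites as the second, but its `m1` density is eight times higher because the second sequence is four times longer.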
Number of occurrences
The number of occurrences for each motif.
{% else %}
{{ num_seqs }} sequences from {{ dataset_name }}
averaging {{ '%.1f' % (num_bases / num_seqs) }} base pairs in length were scanned for sites.
No sites were predicted above the threshold for any of the motifs.
{% endif %}