These statistics summarise a scan of {{ num_seqs }} sequences from {{ dataset_name }}
averaging {{ '%.1f' % (num_bases / num_seqs) }} base pairs in length. STEME predicted
{{ num_sites }} sites for {{ num_motifs }} motifs, an average of one binding site every
{{ '%d' % (num_bases / num_sites) }} base pairs.
Z-scores
The Z-scores for the predicted sites, separated by motif.
The sites are sorted by Z-score. The Z-score is an estimate of how confidently
STEME predicts that a site is a binding site: a Z-score of 1 would mean STEME is
sure that the site is a binding site.
The Z-score is calculated from how well the
site matches the motif and how well it fits STEME's background model.
The plot allows us to compare the strength of each motif's predicted binding sites.
Vaguer motifs with lower information content tend to have difficulty achieving
high Z-scores. For such motifs even a perfect match may not produce a high Z-score.
Number of occurrences
The number of predicted sites for each motif.
Positions
The positions of the predicted sites in the sequences. Each marker represents a site.
The y-axis represents how close the site is to the start or end of the sequence it
is in. The sites are sorted along the x-axis according to their y-value. This plot
shows whether a particular motif's sites cluster in the centre, at the beginning
or at the end of the sequences. For example, a flat region in the centre of a motif's
scatter plot indicates that this motif's sites are biased towards the centre of the
sequences. A scatter plot for uniformly distributed sites would have a near-constant
gradient.
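The reasoning above can be illustrated with a minimal Python sketch. It assumes the y-value is the site's relative position in its sequence (0 for the start, 1 for the end). That convention, the function name and the example sites are all hypothetical and are not taken from STEME itself.

```python
def relative_positions(sites):
    """Return sorted relative positions for (site_start, seq_length) pairs.

    Assumed convention: 0 is the start of the sequence and 1 is the end.
    """
    return sorted(start / float(length) for start, length in sites)


# Hypothetical sites: (start position, length of the containing sequence)
uniform = relative_positions([(i * 10, 100) for i in range(10)])
central = relative_positions(
    [(45, 100), (48, 100), (50, 100), (52, 100), (10, 100), (90, 100)]
)

# Step between consecutive sorted points: roughly constant for uniformly
# distributed sites, close to zero (a flat region) where sites cluster.
uniform_steps = [b - a for a, b in zip(uniform, uniform[1:])]
central_steps = [b - a for a, b in zip(central, central[1:])]
```

In this sketch every step in `uniform_steps` is about 0.1, while `central_steps` contains several much smaller steps around the centre. Those small steps are what would appear as the flat region in the plot.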
Sequences
The sequence coverage by motif: how many sequences have at least one site as a function
of Z-score threshold.
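As a rough illustration of how such a coverage curve could be computed, the following Python sketch counts the sequences that have at least one site at or above each threshold. The function name, the data layout and the example values are assumptions made for illustration, not STEME's own code.

```python
def sequence_coverage(sites, threshold):
    """Count distinct sequences with at least one site scoring >= threshold.

    sites is an iterable of (sequence_id, z_score) pairs (hypothetical layout).
    """
    return len(set(seq for seq, z in sites if z >= threshold))


# Hypothetical predicted sites for a single motif
sites = [("seq1", 0.9), ("seq1", 0.4), ("seq2", 0.6), ("seq3", 0.3)]

# Coverage at a few example thresholds; it can only fall as the threshold rises.
coverage = [(t, sequence_coverage(sites, t)) for t in (0.2, 0.5, 0.8)]
```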
This plot shows the density of predicted sites for each sequence. Each sequence is
represented as one x-value. The sequences are presented in the same order that they
are in the original STEME FASTA input file. Each marker represents the density of
sites for a particular motif in that sequence. The density of sites is the number
of predicted sites per base pair in the sequence. This plot allows us to see if there
are certain sequences which have a concentration of a particular motif's sites. We use
the density of sites rather than a count to give a fair comparison between sequences
of different lengths.
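A small Python sketch of the density calculation shows why a density is fairer than a raw count. The function name and the numbers are hypothetical.

```python
def site_density(num_sites, seq_length):
    """Return the number of predicted sites per base pair of a sequence."""
    return num_sites / float(seq_length)


# Two hypothetical sequences with the same site count but different lengths
short_density = site_density(3, 150)
long_density = site_density(3, 1500)
```

Both sequences contain three sites, but the shorter one has ten times the density, which a plot of raw counts would hide.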
This plot shows the sequence lengths. The lengths are scatter plotted in the
same order as in the plot above (and in the original input FASTA file). They are
also line plotted in green in order of size.
Co-occurrences
The first plot shows the strength of the best hit for each motif (y-axis) in
each sequence (x-axis). The motifs are hierarchically clustered on the basis
of which sequences they have strong hits in.
The second plot shows a statistic measuring the co-occurrence of a pair of motifs
across the sequences.
The motifs are again ordered according to a hierarchical clustering (not shown).
{% else %}
{{ num_seqs }} sequences from {{ dataset_name }}
averaging {{ '%.1f' % (num_bases / num_seqs) }} base pairs in length were scanned for sites.
No sites were predicted above the threshold for any of the motifs.
{% endif %}