{% if stats.is_opml %}
Percentages describe this OPML feed-list report, not the entire Web. Metrics reflect the feeds listed in the OPML file and any HTML autodiscovery checks from outline url or htmlUrl values.
{% else %}
Percentages describe this Common Crawl result set, not the entire Web. Metrics reflect what Common Crawl fetched, what sites allowed it to fetch, and the Tranco list and sample limits configured for this run. Site counts and TOP_N scoping use the Tranco {{ stats.tranco_list_label }} list, normalized to registrable sites with the Public Suffix List, including private suffixes for hosted sub-sites.
{% endif %}
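{#
The note above says hosts are normalized to registrable sites with the Public Suffix List, including private suffixes. A minimal Python sketch of that idea follows. It is not the report's actual code: the function name `registrable_site` and the two tiny suffix sets are hypothetical, and the real list also has wildcard and exception rules that this sketch ignores.

```python
# Hypothetical, tiny subset of Public Suffix List rules (illustration only).
ICANN_SUFFIXES = {"com", "org", "io", "uk", "co.uk"}
PRIVATE_SUFFIXES = {"github.io", "blogspot.com"}


def registrable_site(host: str, include_private: bool = True) -> str:
    """Return the longest matching public suffix plus one more label."""
    suffixes = set(ICANN_SUFFIXES)
    if include_private:
        suffixes |= PRIVATE_SUFFIXES
    labels = host.lower().rstrip(".").split(".")
    # Candidates run from the longest suffix to the shortest,
    # so the first hit is the longest matching rule.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return candidate  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    # No rule matched: treat the last label as the suffix.
    return ".".join(labels[-2:])


print(registrable_site("www.example.co.uk"))
print(registrable_site("blog.alice.github.io"))
print(registrable_site("blog.alice.github.io", include_private=False))
```

With private suffixes included, each hosted sub-site such as `alice.github.io` counts as its own site; without them, every such host collapses into `github.io`.
#}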
Responses Processed
{{ total_pages_f }}
{% if stats.is_opml %}
Fetched OPML feed URLs and, when available, associated HTML pages checked for autodiscovery.
{% else %}
All responses after Tranco filtering; {{ stats.pages_seen|comma }} had HTML, feed, XML, or sniffable Content-Type values and were analyzed further.
{% endif %}
HTML Pages Processed
{{ stats.html_responses|comma }}
HTML/XHTML responses; used as the denominator for autodiscovery page rates.
Unique Sites
{{ total_sites_f }}
{% if stats.is_opml %}
Distinct registrable sites among fetched OPML feed and HTML URLs.
{% else %}
Distinct registrable sites among processed responses (HyperLogLog estimate).
{% endif %}
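{#
The Common Crawl site count above is described as a HyperLogLog estimate. The Python sketch below shows the general technique only, under assumptions: the class name, the choice of SHA-1 as the hash, and the precision `p=12` are all hypothetical and may differ from what the report generator uses.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter using 2**p small registers."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash value
        idx = h >> (64 - self.p)  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank is the position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for n in range(20000):
    hll.add(f"site{n}.example")
    hll.add(f"site{n}.example")  # duplicates do not change the estimate
print(round(hll.estimate()))
```

With 4096 registers the typical relative error is roughly 1.6%, so a site count produced this way is approximate rather than exact.
#}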
Feed URLs Checked
{{ stats.feed_results_count|comma }}
{% if stats.is_opml %}
Feed URLs listed in OPML xmlUrl attributes and fetched for parsing.
{% else %}
{{ stats.feeds_sniffed|comma }} generic XML, text/plain, or application/octet-stream responses were sniffed as feeds; the rest had exact RSS/Atom media types.
{% endif %}
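{#
The Common Crawl branch above counts responses with generic media types as feeds only when sniffing finds a feed. One plausible way to sniff is to inspect the XML root element, sketched below in Python. The function name `sniff_feed_format` is hypothetical and the report's own sniffer may use different rules; in particular, this sketch does not model the "Other parsed (sniffed)" outcome.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tags as "{namespace}localname".
ATOM_FEED_TAG = "{http://www.w3.org/2005/Atom}feed"


def sniff_feed_format(body: bytes):
    """Return "rss", "atom", or None based on the XML root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not well-formed XML, so not treated as a feed here
    if root.tag == "rss":
        return "rss"
    if root.tag == ATOM_FEED_TAG:
        return "atom"
    return None


print(sniff_feed_format(b'<rss version="2.0"><channel/></rss>'))
print(sniff_feed_format(b'<feed xmlns="http://www.w3.org/2005/Atom"/>'))
print(sniff_feed_format(b"just some plain text"))
```

Parsing the whole body is simple but costly for large responses; a production sniffer would more likely look only at the first bytes.
#}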
Analyzed Response Content-Type Distribution
HTTP Content-Type header values among responses that passed the analysis prefilter. “Other XML” is XML content without an exact RSS/Atom media type, such as application/xml or text/xml. “Other Non-XML” is sniffable non-XML content such as text/plain or application/octet-stream. Either is only counted as a feed URL if sniffing finds RSS/Atom. Parenthetical Content-Type percentages use all analyzed responses as the denominator; sniffed outcome percentages use sniffed feeds as the denominator.
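{#
The description above uses two different denominators. The Python sketch below works through both with made-up counts; the dictionaries and the `percent` helper are hypothetical stand-ins for `stats.content_types_collapsed` and `stats.sniffed_format_counts`, not real data from any run.

```python
# Made-up counts for illustration only.
content_types = {
    "HTML": 9000,
    "RSS": 600,
    "Atom": 200,
    "Other XML": 150,
    "Other Non-XML": 50,
}
sniffed = {"rss": 30, "atom": 15, "other": 5}


def percent(count: int, denominator: int) -> float:
    """Guarded percentage, mirroring the template's `if total else 0`."""
    return (count / denominator * 100) if denominator else 0.0


total = sum(content_types.values())  # all analyzed responses
sniffed_total = sum(sniffed.values())  # only responses sniffed as feeds

# Content-Type rows: the denominator is every analyzed response.
other_xml_pct = percent(content_types["Other XML"], total)
# Sniffed outcome rows: the denominator is the sniffed feeds.
rss_sniffed_pct = percent(sniffed["rss"], sniffed_total)

print("Other XML: %.4f%% of analyzed responses" % other_xml_pct)
print("RSS (sniffed): %.1f%% of sniffed feeds" % rss_sniffed_pct)
```

Because the denominators differ, a sniffed outcome percentage cannot be compared directly with a Content-Type percentage.
#}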
{% set total_ct = stats.content_types_collapsed.values()|sum %}
{% set html_count = stats.content_types_collapsed.get("HTML", 0) %}
{% set html_pct = (html_count / total_ct * 100) if total_ct else 0 %}
{% set non_html_total = total_ct - html_count %}
{% set sniffed_total = stats.sniffed_format_counts.values()|sum %}
{% set ct_colors = {"HTML": "#94a3b8", "Atom": "#6366f1", "RSS": "#10b981", "Other XML": "#8b5cf6", "Other Non-XML": "#64748b", "Atom (sniffed)": "#818cf8", "RSS (sniffed)": "#34d399", "Other parsed (sniffed)": "#94a3b8"} %}
HTML
{{ html_count|comma }}
{{ '%.2f'|format(html_pct) }}% of total
Non-HTML types (bars scaled relative to each other):
{% for label, count in stats.content_types_collapsed.items() %}
{% if label != "HTML" %}
{% set pct_of_nonhtml = (count / non_html_total * 100) if non_html_total else 0 %}
{% set pct_of_total = (count / total_ct * 100) if total_ct else 0 %}
{{ label }}
{{ count|comma }} ({{ '%.4f'|format(pct_of_total) }}%)
{% endif %}
{% endfor %}
{% if sniffed_total %}
Sniffed feed outcomes (bars scaled relative to sniffed feeds):
{% for label, count in [("RSS (sniffed)", stats.sniffed_format_counts.get("rss", 0)), ("Atom (sniffed)", stats.sniffed_format_counts.get("atom", 0)), ("Other parsed (sniffed)", stats.sniffed_format_counts.get("other", 0))] %}
{% set pct_of_sniffed = (count / sniffed_total * 100) if sniffed_total else 0 %}
{{ label }}
{{ count|comma }} ({{ '%.1f'|format(pct_of_sniffed) }}%)
{% endfor %}
{% endif %}