Web Feed Survey

{{ stats.source_label }}: {{ crawl_id }}

{% if stats.run_limit %}
⚠️ Limited test run: LIMIT={{ stats.run_limit|comma }}
{% endif %} {% if stats.max_crawl_time %}
📅 Crawl ended {{ stats.max_crawl_time.strftime('%Y-%m-%d') }}
{% endif %} {% if stats.is_opml %}
📄 OPML report: local feed list
{% elif stats.top_n %}
⚠️ Filtered to top {{ stats.top_n|comma }} entries (Tranco {{ stats.tranco_list_label }}, by registrable site)
{% else %}
🌐 Full crawl — no domain filter applied
{% endif %}

1. {% if stats.is_opml %}The Feed List{% else %}The Crawl{% endif %}

{% if stats.is_opml %}
Percentages describe this OPML feed-list report, not the entire Web. Metrics reflect the feeds listed in the OPML file and any HTML autodiscovery checks from outline url or htmlUrl values.
{% else %}
Percentages describe this Common Crawl result set, not the entire Web. Metrics reflect what Common Crawl fetched, what sites allowed it to fetch, and the configured Tranco list/sample limits for this run. Site counts and TOP_N scoping use the Tranco {{ stats.tranco_list_label }} list, normalized to registrable sites with the Public Suffix List, including private suffixes for hosted sub-sites.
{% endif %}
Responses Processed
{{ total_pages_f }}
{% if stats.is_opml %}
Fetched OPML feed URLs and, when available, associated HTML pages checked for autodiscovery.
{% else %}
All responses after Tranco filtering; {{ stats.pages_seen|comma }} had HTML, feed, XML, or sniffable Content-Type values and were analyzed further.
{% endif %}
HTML Pages Processed
{{ stats.html_responses|comma }}
HTML/XHTML responses; used as the denominator for autodiscovery page rates.
Unique Sites
{{ total_sites_f }}
{% if stats.is_opml %}
Distinct registrable sites among fetched OPML feed and HTML URLs.
{% else %}
Distinct registrable sites among processed responses (HyperLogLog estimate).
{% endif %}
Feed URLs Checked
{{ stats.feed_results_count|comma }}
{% if stats.is_opml %}
Feed URLs listed in OPML xmlUrl attributes and fetched for parsing.
{% else %}
{{ stats.feeds_sniffed|comma }} generic XML/text/plain/octet-stream responses sniffed as feeds; the rest had exact RSS/Atom media types.
{% endif %}
Analyzed Response Content-Type Distribution
HTTP Content-Type header values among responses that passed the analysis prefilter. “Other XML” is XML content without an exact RSS/Atom media type, such as application/xml or text/xml. “Other Non-XML” is sniffable non-XML content such as text/plain or application/octet-stream. Either is only counted as a feed URL if sniffing finds RSS/Atom. Parenthetical Content-Type percentages use all analyzed responses as the denominator; sniffed outcome percentages use sniffed feeds as the denominator.
{% set total_ct = stats.content_types_collapsed.values()|sum %}
{% set html_count = stats.content_types_collapsed.get("HTML", 0) %}
{% set html_pct = (html_count / total_ct * 100) if total_ct else 0 %}
{% set non_html_total = total_ct - html_count %}
{% set sniffed_total = stats.sniffed_format_counts.values()|sum %}
{% set ct_colors = {"HTML": "#94a3b8", "Atom": "#6366f1", "RSS": "#10b981", "Other XML": "#8b5cf6", "Other Non-XML": "#64748b", "Atom (sniffed)": "#818cf8", "RSS (sniffed)": "#34d399", "Other parsed (sniffed)": "#94a3b8"} %}
HTML
{{ html_count|comma }}
{{ '%.2f'|format(html_pct) }}% of total
Non-HTML types (bars scaled relative to each other):
{% for label, count in stats.content_types_collapsed.items() %} {% if label != "HTML" %} {% set pct_of_nonhtml = (count / non_html_total * 100) if non_html_total else 0 %} {% set pct_of_total = (count / total_ct * 100) if total_ct else 0 %}
{{ label }}
{{ count|comma }}  ({{ '%.4f'|format(pct_of_total) }}%)
{% endif %} {% endfor %} {% if sniffed_total %}
Sniffed feed outcomes (bars scaled relative to sniffed feeds):
{% for label, count in [("RSS (sniffed)", stats.sniffed_format_counts.rss), ("Atom (sniffed)", stats.sniffed_format_counts.atom), ("Other parsed (sniffed)", stats.sniffed_format_counts.other)] %} {% set pct_of_sniffed = (count / sniffed_total * 100) if sniffed_total else 0 %}
{{ label }}
{{ count|comma }}  ({{ '%.1f'|format(pct_of_sniffed) }}%)
{% endfor %} {% endif %}
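
The Content-Type buckets described above can be illustrated with a minimal Python sketch. The function names and the exact media-type sets are assumptions based only on the examples named in this section; the pipeline that produced this report may recognize more types.

```python
from typing import Optional

# Assumed media-type sets, taken from the examples named in the report text.
HTML_TYPES = {"text/html", "application/xhtml+xml"}
RSS_TYPES = {"application/rss+xml"}
ATOM_TYPES = {"application/atom+xml"}
OTHER_XML_TYPES = {"application/xml", "text/xml"}
OTHER_NON_XML_TYPES = {"text/plain", "application/octet-stream"}


def collapse_content_type(header_value: Optional[str]) -> Optional[str]:
    """Map a raw Content-Type header to a report bucket.

    Returns None when the response would not pass the analysis prefilter
    (hypothetical behavior for this sketch).
    """
    if not header_value:
        return None
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type in HTML_TYPES:
        return "HTML"
    if media_type in RSS_TYPES:
        return "RSS"
    if media_type in ATOM_TYPES:
        return "Atom"
    if media_type in OTHER_XML_TYPES:
        return "Other XML"
    if media_type in OTHER_NON_XML_TYPES:
        return "Other Non-XML"
    return None


def needs_sniffing(bucket: Optional[str]) -> bool:
    """Only the two generic buckets need body sniffing to count as feeds."""
    return bucket in ("Other XML", "Other Non-XML")
```

Under these assumptions, a response served as `text/xml` lands in "Other XML" and is counted as a feed URL only if sniffing its body finds RSS or Atom.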

2. Feed Autodiscovery

Autodiscovery Coverage
Pages with feed links
{{ pages_with_auto_f }}
{{ "%.2f"|format(stats.pages_with_autodiscovery / stats.html_responses * 100 if stats.html_responses else 0) }}% of analyzed HTML responses
Sites with feed links
{{ sites_with_auto_f }}
{{ "%.2f"|format(stats.sites_with_autodiscovery / stats.sites_seen * 100 if stats.sites_seen else 0) }}% of analyzed registrable sites
Sites with feed links is the autodiscovery subset: unique registrable sites exposing at least one RSS/Atom link in HTML.
Discovery link relations
Pages with feed rel="alternate" {{ stats.discovery_rel_alternate|comma }}
Pages with feed rel="feed" {{ stats.discovery_rel_feed|comma }}
Pages using both relations {{ stats.discovery_rel_both_page|comma }}
Pages with a multi-rel link {% if stats.discovery_link_rel_both_page_known %}{{ stats.discovery_link_rel_both_page|comma }}{% else %}not recorded{% endif %}
Multi-rel links {{ stats.discovery_link_rel_both|comma }}
These counts include only RSS/Atom autodiscovery links: HTML <link> elements whose type is RSS, Atom, or RDF feed XML. Other uses of rel="alternate" are not counted. Pages using both relations includes pages with separate alternate and feed links, plus pages with a single link whose rel contains both. Multi-rel links counts those single link elements.
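
As a rough illustration of the counting rules above, the following Python sketch tallies feed link relations on a single page with the standard-library `html.parser`. The class name, its attributes, and the exact feed media types are assumptions for this example, not the survey's actual implementation.

```python
from html.parser import HTMLParser

# Assumed media types for "RSS, Atom, or RDF feed XML".
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
}


class FeedLinkCounter(HTMLParser):
    """Count feed autodiscovery <link> elements by rel token (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.alternate_links = 0
        self.feed_links = 0
        self.multi_rel_links = 0  # single links whose rel has both tokens

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        link_type = (attr.get("type") or "").split(";", 1)[0].strip().lower()
        if link_type not in FEED_TYPES:
            return  # other uses of rel="alternate" are not counted
        rels = set((attr.get("rel") or "").lower().split())
        has_alternate = "alternate" in rels
        has_feed = "feed" in rels
        if has_alternate:
            self.alternate_links += 1
        if has_feed:
            self.feed_links += 1
        if has_alternate and has_feed:
            self.multi_rel_links += 1

    @property
    def uses_both_relations(self):
        # True for separate alternate and feed links, or one multi-rel link.
        return self.alternate_links > 0 and self.feed_links > 0
```

Feeding one page's HTML to a fresh `FeedLinkCounter` yields the per-page flags; summing those flags across pages would give totals comparable to the rows above.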
Autodiscovery Links per Page

Unique feed URLs per HTML page with at least one feed link. Analyzed HTML responses with zero feed links omitted: {{ zero_pages_f }}.

Autodiscovery Links per Site

Unique feed URLs per registrable site. Sites with zero links omitted: {{ zero_sites_f }}.

Pages with Duplicate Feeds
{{ pages_with_duplicates|comma }}
{{ "%.1f"|format(duplicate_prevalence_pct) }}% of {{ multi_feed_pages_total|comma }} multi-feed pages

Pages that link to redundant format variants of the same feed (same internal title and link after light normalization).

{% if duplicate_format_pairs %}
Coincident formats | Pages
{% for pair, count in duplicate_format_pairs %}
{{ pair }} | {{ count|comma }}
{% endfor %}
{% endif %}
Quality: Autodiscovered vs. Not

Distribution of quality scores among successfully parsed feeds (% of each group per decile). With autodiscovery: {{ autodiscovery_quality.n|comma }} feeds, mean {{ "%.3f"|format(autodiscovery_quality.mean) }}. Without: {{ no_autodiscovery_quality.n|comma }} feeds, mean {{ "%.3f"|format(no_autodiscovery_quality.mean) }}.

Autodiscovery Usage by HTML Platform
Known page-side platform hints among analyzed HTML responses, plus an unknown row for pages with no recognized fingerprint. Parenthetical percentages use HTML pages in that row as the denominator and show how often those pages expose RSS/Atom links. Detection is conservative and based on generator metadata plus common asset markers.
Fingerprint | HTML pages | With autodiscovery
{% for row in stats.html_fingerprints %}
{{ row.fingerprint }} | {{ row.html_pages|comma }} | {{ row.autodiscovery_pages|comma }} ({{ "%.1f"|format(row.autodiscovery_pct) }}%)
{% endfor %}

3. Feeds

Successfully Parsed Feeds
{{ stats.parsed_feeds|comma }}
{% set total_parsed = stats.parsed_feeds %}
Percentages use feed URLs checked as the denominator for parse results, and successfully parsed feeds as the denominator for autodiscovery rows.
Parse success rate {{ "%.1f"|format(stats.parse_success_pct) }}%
Broken/unparseable {{ stats.unparsed_feeds|comma }} ({{ "%.1f"|format(stats.unparsed_pct) }}%)
With autodiscovery link {{ stats.feeds_with_autodiscovery|comma }} ({{ "%.1f"|format(stats.feeds_with_autodiscovery / total_parsed * 100 if total_parsed else 0) }}%)
Without autodiscovery link {{ stats.feeds_without_autodiscovery|comma }} ({{ "%.1f"|format(stats.feeds_without_autodiscovery / total_parsed * 100 if total_parsed else 0) }}%)
Sites with Feeds Found
{{ sites_with_feeds_f }}
Unique registrable sites that host at least one successfully parsed feed URL.
Share of analyzed sites {{ "%.2f"|format(stats.sites_with_feeds_found / stats.sites_seen * 100 if stats.sites_seen else 0) }}%
Analyzed sites {{ stats.sites_seen|comma }}
Feeds with Entry Content
{{ (stats.feeds_with_content + stats.feeds_with_summary)|comma }}
Coverage rate {{ "%.1f"|format((stats.feeds_with_content + stats.feeds_with_summary) / total_parsed * 100 if total_parsed else 0) }}%
Full content {{ stats.feeds_with_content|comma }} ({{ "%.1f"|format(stats.feeds_with_content / total_parsed * 100 if total_parsed else 0) }}%)
Summary only {{ stats.feeds_with_summary|comma }} ({{ "%.1f"|format(stats.feeds_with_summary / total_parsed * 100 if total_parsed else 0) }}%)
Neither {{ stats.feeds_with_neither|comma }} ({{ "%.1f"|format(stats.feeds_with_neither / total_parsed * 100 if total_parsed else 0) }}%)
Denominator: successfully parsed feeds.
Entries Found
{{ stats.total_entries|comma }}
Percentages use successfully parsed feeds as the denominator, except the dated-entries row, which uses parsed feeds with entries.
Mean entries per parsed feed {{ "%.1f"|format(stats.total_entries / total_parsed if total_parsed else 0) }}
Parsed feeds with entries {{ stats.feeds_with_entries|comma }} ({{ "%.1f"|format(stats.feeds_with_entries / total_parsed * 100 if total_parsed else 0) }}%)
Feeds with dated entries {{ stats.feeds_with_entry_dates|comma }} ({{ "%.1f"|format(stats.feeds_with_entry_dates / stats.feeds_with_entries * 100 if stats.feeds_with_entries else 0) }}%)
Feeds with feed-level dates {{ stats.feeds_with_updated_date|comma }} ({{ "%.1f"|format(stats.feeds_with_updated_date / total_parsed * 100 if total_parsed else 0) }}%)
Feeds with repeated entry titles {{ stats.feeds_with_repeated_entry_titles|comma }}
Feeds with default entry titles {{ stats.feeds_with_default_entry_titles|comma }}
Feeds with repeated entry links {{ stats.feeds_with_repeated_entry_links|comma }}
Default entry titles are generic placeholder-like values such as “Default Title”.
{% if errors %}
{{ stats.total_errors|comma }}
feed parse errors
{% if stats.feed_results_count %}
{{ "%.1f"|format(stats.total_errors / stats.feed_results_count * 100) }}% of {{ stats.feed_results_count|comma }} feed URLs checked
{% endif %}
Error-row percentages use total parse errors as the denominator.
{% for row in stats.error_column_rows %}
{{ row[0] }} {{ row[1]|comma }} {{ "%.1f"|format(row[1] / stats.total_errors * 100 if stats.total_errors else 0) }}% {{ row[2] }} {{ row[3]|comma if row[3] else "" }} {% if row[3] %}{{ "%.1f"|format(row[3] / stats.total_errors * 100 if stats.total_errors else 0) }}%{% endif %}
{% endfor %}
{% endif %}
Feed Availability and Freshness
Step-down view from every feed URL checked to parsed feeds with entries that appear fresh. Fresh means the newest entry date, or feed-level updated date when no entry date exists, is within {{ stats.inactive_quality.cutoff_days|comma }} days of the crawl response time. Funnel percentages use feed URLs checked as the denominator.
Feed URLs checked
{{ stats.feed_results_count|comma }}
Parsed RSS/Atom
{{ stats.parsed_feeds|comma }}
Freshness signal within cutoff
{{ stats.active_quality.n|comma }}
Active with entries
{{ stats.active_quality.with_entries|comma }}
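
The freshness rule used in this funnel can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, dates are timezone-aware `datetime` values, and a date later than the crawl time is treated as fresh, which the report text does not specify.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def freshness_signal(
    entry_dates: Sequence[Optional[datetime]],
    feed_updated: Optional[datetime],
) -> Optional[datetime]:
    """Newest entry date, or the feed-level date when no entry is dated."""
    dated = [d for d in entry_dates if d is not None]
    if dated:
        return max(dated)
    return feed_updated


def is_fresh(
    entry_dates: Sequence[Optional[datetime]],
    feed_updated: Optional[datetime],
    crawl_time: datetime,
    cutoff_days: int,
) -> bool:
    """True when the freshness signal is within cutoff_days of crawl_time."""
    signal = freshness_signal(entry_dates, feed_updated)
    if signal is None:
        return False  # undated feeds have no freshness signal
    # Assumption: future-dated signals give a negative age and count as fresh.
    return crawl_time - signal <= timedelta(days=cutoff_days)
```

Note that in this sketch a feed-level date is consulted only when no entry carries a date, so a recently updated feed whose newest entry is old still counts as stale.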
Operational Quality
{{ "%.3f"|format(stats.mean_quality) }}
mean score, 0-1, across successfully parsed feeds
Parenthetical percentages use successfully parsed feeds as the denominator.
Quality > {{ "%.1f"|format(stats.quality_split_threshold) }} {{ stats.quality_split_count|comma }} ({{ "%.1f"|format(stats.quality_split_count / stats.parsed_feeds * 100 if stats.parsed_feeds else 0) }}%)
Freshness signal within cutoff {{ stats.active_quality.n|comma }} ({{ "%.1f"|format(stats.active_quality.pct) }}%)
Mean among those feeds {{ "%.3f"|format(stats.active_quality.mean) }}
Active with entries {{ stats.active_quality.with_entries|comma }} ({{ "%.1f"|format(stats.active_quality.with_entries / stats.parsed_feeds * 100 if stats.parsed_feeds else 0) }}%)
Undated or stale {{ stats.inactive_quality.n|comma }}
Feed Quality Distribution
Operational, non-editorial score 0–1 among successfully parsed feeds. In split tables, “quality > {{ "%.1f"|format(stats.quality_split_threshold) }}” means a feed has a usable freshness signal and enough basic entry/feed metadata to be consumed by a reader; lower-scoring feeds remain in the all-feeds columns so abandoned or sparse feeds still count. Feeds with no usable date, or no freshness signal within {{ stats.inactive_quality.cutoff_days|comma }} days, score 0 here; the card above also shows the mean after excluding those feeds. Excluded from the freshness-filtered mean: {{ stats.inactive_quality.undated|comma }} undated and {{ stats.inactive_quality.stale|comma }} stale feeds.
Quality Score Components
Mean component scores across successfully parsed feeds. Bar labels show each component's weight in the composite score. Repeated/default-looking entry titles and repeated entry links reduce the entry metadata component and can cap the final score when severe.
Quality by Format
% of feeds in each quality tier per format, ordered by mean score (↓). Dot = mean score. Normalized for population size so rare formats compare fairly against common ones.
Feed Formats
Successfully parsed feeds only. RSS-family feeds: {{ stats.rss_count|comma }}; Atom feeds: {{ stats.atom_count|comma }}. Parenthetical percentages use feeds in that format as the denominator and show the share with operational quality > {{ "%.1f"|format(stats.quality_split_threshold) }}.
Format | Count | Quality > {{ "%.1f"|format(stats.quality_split_threshold) }} | Mean quality
{% for row in format_quality_rows %}
{{ row.fmt }} | {{ row.count|comma }} | {{ row.quality_count|comma }}/{{ row.quality_denominator|comma }} ({{ "%.1f"|format(row.quality_pct) }}%) | {{ "%.3f"|format(row.mean) }}
{% endfor %}
Charset per Format
Source: HTTP Content-Type header only.
Format | Charset | Count
{% for fmt, charsets in charsets_per_format.items() %}
{% for charset, count in charsets.items()|sort(attribute='1', reverse=True) %}
{{ fmt }} | {{ charset or "unknown" }} | {{ count|comma }}
{% endfor %}
{% endfor %}
Entry Content Profile
Content type classification based on Atom type attribute and RSS element semantics. Parenthetical percentages in the first count column use all successfully parsed feeds as the denominator. High-quality feeds have operational quality > {{ "%.1f"|format(stats.quality_split_threshold) }}.
Profile | All parsed feeds | Among {{ stats.quality_split_count|comma }} high-quality feeds
{% for row in stats.content_profile_prevalence %}
{{ row.profile }} | {{ row.all_count|comma }} ({{ "%.1f"|format(row.all_pct) }}%) | {{ row.quality_count|comma }} ({{ "%.1f"|format(row.quality_pct) }}%)
{% endfor %}
Entry Counts per Feed
Feed Languages
Top feed language declarations across successfully parsed feeds. Parenthetical percentages in the first count column use all successfully parsed feeds as the denominator. High-quality feeds have operational quality > {{ "%.1f"|format(stats.quality_split_threshold) }}.
Language | All parsed feeds | Among {{ stats.quality_split_count|comma }} high-quality feeds
{% for row in stats.language_prevalence %}
{{ row.language }} | {{ row.all_count|comma }} ({{ "%.1f"|format(row.all_pct) }}%) | {{ row.quality_count|comma }} ({{ "%.1f"|format(row.quality_pct) }}%)
{% endfor %}
Language Tagging
Feed-internal language signals are read from xml:lang attributes, RSS <language>, Dublin Core dc:language, and Atom link hreflang. Counts are successfully parsed feeds; categories can overlap except the no-language row. HTTP/feed mismatches are only counted when an HTTP language conflicts with a feed-level language.
No language information {{ stats.lang_no_info|comma }} ({{ "%.1f"|format(stats.lang_no_info / stats.parsed_feeds * 100 if stats.parsed_feeds else 0) }}%)
HTTP Content-Language {{ stats.lang_src_http|comma }} ({{ "%.1f"|format(stats.lang_src_http / stats.parsed_feeds * 100 if stats.parsed_feeds else 0) }}%)
Feed-level language {{ stats.lang_src_feed|comma }} ({{ "%.1f"|format(stats.lang_src_feed / stats.parsed_feeds * 100 if stats.parsed_feeds else 0) }}%)
Entry-level language {{ stats.lang_src_entry|comma }} ({{ "%.1f"|format(stats.lang_src_entry / stats.parsed_feeds * 100 if stats.parsed_feeds else 0) }}%)
Both HTTP and feed-level language {{ stats.lang_http_feed|comma }} ({{ "%.1f"|format(stats.lang_http_feed / stats.parsed_feeds * 100 if stats.parsed_feeds else 0) }}%)
Mismatching HTTP and feed-level language {{ stats.lang_mismatches|comma }} ({{ "%.1f"|format(stats.lang_mismatches / stats.parsed_feeds * 100 if stats.parsed_feeds else 0) }}%)
Multiple entry languages {{ stats.lang_multiple_entry_languages|comma }} ({{ "%.1f"|format(stats.lang_multiple_entry_languages / stats.parsed_feeds * 100 if stats.parsed_feeds else 0) }}%)
Uses hreflang {{ stats.lang_hreflang|comma }} ({{ "%.1f"|format(stats.lang_hreflang / stats.parsed_feeds * 100 if stats.parsed_feeds else 0) }}%)
Multiple entry languages includes direct entry tags plus feed/HTTP languages that untagged entries inherit.
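
The HTTP/feed mismatch check described in this section might look like the Python sketch below. It is only an illustration: the helper names are hypothetical, and comparing primary language subtags (so that `en-US` and `en-GB` agree) is an assumption, since the report does not state how strictly tags are compared.

```python
from typing import Optional


def primary_subtag(tag: str) -> str:
    """Return the lowercase primary subtag, e.g. 'en' for 'en-US' or 'en_US'."""
    return tag.strip().lower().replace("_", "-").split("-", 1)[0]


def http_feed_mismatch(
    content_language: Optional[str],
    feed_language: Optional[str],
) -> bool:
    """True only when both sources declare a language and they conflict."""
    if not content_language or not feed_language:
        return False  # a missing side is not a mismatch
    # Content-Language may carry a comma-separated list of tags.
    http_langs = {
        primary_subtag(t) for t in content_language.split(",") if t.strip()
    }
    if not http_langs:
        return False
    return primary_subtag(feed_language) not in http_langs
```

With this rule a feed declaring `fr-CA` behind a `Content-Language: en, fr` header is not a mismatch, while `de` against `en` is.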
Feed Recency (CDF)
% of feeds with a last-update date whose update falls within the given age.{% if feed_recency_cdf.no_date %} {{ feed_recency_cdf.no_date|comma }} feeds with no date excluded from denominator.{% endif %}
Newest Entry Recency (CDF)
% of feeds with at least one entry whose most recent entry falls within the given age. {{ n_zero_entry|comma }} zero-entry feeds excluded from denominator.{% if entry_recency_cdf.no_date %} {{ entry_recency_cdf.no_date|comma }} have no entry date.{% endif %}
Feed History Depth (CDF of Oldest Entry Age)
% of feeds with at least one entry whose oldest entry is at most the given age; that is, how far back the feed's window reaches. A client polling every N days can miss entries in feeds whose oldest entry is less than N days old when the poll fires, because items published since the previous poll may already have rotated out of the feed. {{ n_zero_entry|comma }} zero-entry feeds excluded from denominator.{% if oldest_entry_cdf.no_date %} {{ oldest_entry_cdf.no_date|comma }} have no oldest-entry date.{% endif %}
Inferred Update Cadence (CDF)
% of feeds with enough dated entries whose average interval between entries is at most the given time. Cadence is inferred from the span between oldest and newest entry dates divided by entry count. {% if stats.update_cadence_cdf.no_cadence %}{{ stats.update_cadence_cdf.no_cadence|comma }} feeds lack enough dated entries to infer cadence.{% endif %}
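
The cadence inference described above can be sketched in Python as follows. The function name and the two-entry minimum are assumptions; the division by entry count follows the description literally, although dividing by the number of intervals (entry count minus one) would be another reasonable convention.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def inferred_cadence(
    entry_dates: Sequence[Optional[datetime]],
    min_dated_entries: int = 2,  # assumed threshold for "enough dated entries"
) -> Optional[timedelta]:
    """Average interval between entries, or None when it cannot be inferred."""
    dated = sorted(d for d in entry_dates if d is not None)
    if len(dated) < max(min_dated_entries, 1):
        return None
    span = dated[-1] - dated[0]  # newest minus oldest entry date
    return span / len(dated)
```

For example, three entries dated ten days apart from first to last give a cadence of ten days divided by three, a little over three days.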
Entry Content Length
Distribution of entry body or summary text lengths, measured after parsing successfully parsed feeds. This helps distinguish feeds that carry full content from feeds that mostly expose short summaries, links, or empty entries.
Feed Link Relations
Feed-level Atom link relations, including Atom links embedded in RSS channels. Self/canonical means rel=self; hub covers WebSub/PubSubHubbub discovery; paging and archive cover feed navigation and archived-feed links. Parenthetical percentages use all successfully parsed feeds as the denominator.
Signal | All parsed feeds | Among {{ stats.quality_split_count|comma }} high-quality feeds
{% for row in stats.feed_link_signals %}
{{ row.signal }} | {{ row.all_count|comma }} ({{ "%.1f"|format(row.all_pct) }}%) | {{ row.quality_count|comma }} ({{ "%.1f"|format(row.quality_pct) }}%)
{% endfor %}
Top Feed Extensions
Non-core namespace elements encountered across successfully parsed feeds. Parenthetical percentages in the first count column use all successfully parsed feeds as the denominator. High-quality feeds have operational quality > {{ "%.1f"|format(stats.quality_split_threshold) }}.
Element | All parsed feeds | Among {{ stats.quality_split_count|comma }} high-quality feeds
{% for row in extension_prevalence %}
{% if row.extension_href %}{{ row.extension_prefix }}:{{ row.extension_local }}{% else %}{{ row.extension }}{% endif %} | {{ row.all_count|comma }} ({{ "%.1f"|format(row.all_pct) }}%) | {{ row.quality_count|comma }} ({{ "%.1f"|format(row.quality_pct) }}%)
{% endfor %}
Platform Fingerprints
Known feed generators or platform headers observed on parsed feeds. This is intentionally conservative; absent fingerprints mean “not identified,” not “custom-built.” Parenthetical percentages in the first count column use all successfully parsed feeds as the denominator. High-quality feeds have operational quality > {{ "%.1f"|format(stats.quality_split_threshold) }}; the final column shows the share of feeds in that fingerprint row that clear the threshold.
Fingerprint | All parsed feeds | Among {{ stats.quality_split_count|comma }} high-quality feeds | Quality within fingerprint
{% for row in stats.fingerprint_prevalence %}
{{ row.fingerprint }} | {{ row.all_count|comma }} ({{ "%.1f"|format(row.all_pct) }}%) | {{ row.quality_count|comma }} ({{ "%.1f"|format(row.quality_pct) }}%) | {{ "%.1f"|format(row.within_label_quality_pct) }}%
{% endfor %}
Autodiscovered Feed Quality by Source Platform
Quality of successfully parsed feeds found through HTML autodiscovery, grouped by recognized platform hints on the source page. The unknown row covers autodiscovered feeds whose source page had no recognized platform fingerprint. Parenthetical percentages use parsed feeds in that source-platform row as the denominator.
{% if stats.source_fingerprint_quality %}
Source platform | Autodiscovered parsed feeds | Quality > {{ "%.1f"|format(stats.quality_split_threshold) }} | Mean quality
{% for row in stats.source_fingerprint_quality %}
{{ row.fingerprint }} | {{ row.parsed_feeds|comma }} | {{ row.quality_count|comma }}/{{ row.quality_denominator|comma }} ({{ "%.1f"|format(row.quality_pct) }}%) | {{ "%.3f"|format(row.mean_quality) }}
{% endfor %}
{% else %}

No successfully parsed autodiscovered feeds were found in this run.

{% endif %}