Main Article Heading

This is the first paragraph of the article. It contains some emphasized text and a link to another page.

Section One

A paragraph with inline code and more text to ensure this scores well in the readability algorithm. The content needs to be substantial enough to be selected as the main content by the scoring heuristics.
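As a rough illustration of what "substantial enough" could mean, here is a minimal sketch of a block-scoring heuristic in the spirit of Readability-style extractors. The function name `score_block`, the 25-character cutoff, the comma bonus, and the link-density penalty are all assumptions made for this example, not the scoring rules of any specific tool.

```python
def score_block(text: str, link_chars: int = 0) -> float:
    """Hypothetical score: long, comma-rich, link-poor text ranks higher."""
    # Collapse runs of whitespace so layout does not inflate the length.
    text = " ".join(text.split())
    if len(text) < 25:
        # Assumed cutoff: very short blocks are treated as non-content.
        return 0.0

    score = 1.0                           # base point for any qualifying block
    score += text.count(",")              # commas hint at real sentences
    score += min(len(text) // 100, 3)     # up to 3 points for sheer length

    # Penalize blocks whose text is mostly link anchors (navigation, footers).
    link_density = min(link_chars / len(text), 1.0)
    return score * (1.0 - link_density)
```

Under these assumptions, a long paragraph with few links outranks a short navigation fragment, which is why a fixture paragraph has to carry enough plain text to be picked as main content.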

def extract(url: str) -> ContentDocument:
    """Extract content from a URL."""
    response = fetch(url)
    return html_to_document(response.html)
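The snippet above relies on `ContentDocument` and `html_to_document`, which are not defined here. The sketch below shows one way they might look, using only the standard library. The fields of `ContentDocument` and the choice to collect only `<h1>` and `<p>` text are assumptions for illustration. The network-facing `fetch` helper is left out on purpose.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class ContentDocument:
    """Hypothetical container: a title plus a list of paragraph strings."""
    title: str = ""
    paragraphs: list = field(default_factory=list)


class _TextCollector(HTMLParser):
    """Collects the text found inside <h1> and <p> elements."""

    def __init__(self):
        super().__init__()
        self.doc = ContentDocument()
        self._tag = None   # block element currently being collected
        self._buf = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        # Inline children such as <em> or <code> contribute their text too.
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Normalize whitespace before storing the text.
            text = " ".join("".join(self._buf).split())
            if tag == "h1":
                self.doc.title = text
            elif text:
                self.doc.paragraphs.append(text)
            self._tag = None


def html_to_document(html: str) -> ContentDocument:
    """Assumed behavior: parse an HTML string into a ContentDocument."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.doc
```

For example, `html_to_document("<h1>Title</h1><p>Some <em>text</em>.</p>")` yields a document whose `title` is `"Title"` and whose `paragraphs` list holds `"Some text."`.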

Section Two

This section has a list of items that demonstrate list extraction:

And an ordered list:

  1. Step one
  2. Step two
  3. Step three

Section Three: Tables

Name     Value    Description
alpha    1        First item
beta     2        Second item

Section Four: Blockquotes and Media

This is a blockquote with an important statement that should be preserved in the extracted content.

Here is an image: A test photo

Lazy image: Lazy loaded


Section Five: Definition List

Term One
  The definition of term one.
Term Two
  The definition of term two.

Final paragraph with a dangerous link that should be stripped, and a safe link.
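
One way to decide which links count as dangerous is to allow only a short list of URL schemes. The sketch below assumes that approach. The helper name `is_safe_href` and the contents of `SAFE_SCHEMES` are hypothetical choices for this example, and a real extractor may apply stricter or different rules.

```python
from urllib.parse import urlsplit

# Assumed allowlist; anything else (javascript:, data:, vbscript:, ...) is rejected.
SAFE_SCHEMES = {"http", "https", "mailto"}


def is_safe_href(href: str) -> bool:
    """Return True if the link is relative or uses an allowlisted scheme."""
    # Drop spaces and ASCII control characters, which can be used to
    # disguise a scheme such as "java\tscript:".
    cleaned = "".join(ch for ch in href if ch > " ")
    scheme = urlsplit(cleaned).scheme.lower()
    # An empty scheme means a relative URL, which keeps the page's own scheme.
    return scheme == "" or scheme in SAFE_SCHEMES
```

With this check, a link such as `javascript:alert(1)` would be stripped, while `https://example.com/page` and relative paths would be kept.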