Complex Nested HTML Structure

This document contains deeply nested elements designed to test HTML parsing capabilities.

Section 1: Multi-level Nesting

First level of nesting contains emphasized text and bold text.

Second level includes inline code and links.

Third level has styled spans within paragraphs.

Fourth level demonstrates highlighted text and small text.

Fifth level shows how deep the structure can go with superscript and subscript.

Section 2: Lists and Tables

Nested Lists

  • First item
    • Nested item A
      • Deep nested item 1
      • Deep nested item 2
    • Nested item B
  • Second item with inline formatting
  • Third item with embedded link

Data Table

Transformation   Input Type     Output Type      Use Case
html-to-dict     HTML String    Dictionary       Parsing and analysis
dict-to-html     Dictionary     HTML String      Reconstruction
text-nodes       HTML/Dict      Text Map         Text extraction with hashes
html-hashes      HTML + Depth   HTML (hashed)    Privacy/debugging
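
The html-to-dict and dict-to-html rows describe a round trip between markup and a dictionary. The sketch below illustrates that idea with Python's standard html.parser module. The function names html_to_dict and dict_to_html, the node shape (tag, attrs, children), and the choice to drop whitespace-only text are all assumptions made for illustration, not the service's actual format.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class _DictBuilder(HTMLParser):
    """Builds a nested dict tree; the node shape is hypothetical."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = {"tag": "root", "attrs": {}, "children": []}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self._stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray closers are ignored.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Assumption: whitespace-only text nodes are not kept.
        if data.strip():
            self._stack[-1]["children"].append(data)


def html_to_dict(html):
    """Parse an HTML string into a nested dictionary (sketch)."""
    builder = _DictBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def dict_to_html(node):
    """Serialize the dictionary produced by html_to_dict back to HTML."""
    if isinstance(node, str):
        return escape(node, quote=False)
    inner = "".join(dict_to_html(child) for child in node["children"])
    if node["tag"] == "root":
        return inner
    attrs = "".join(f' {k}="{escape(v or "")}"' for k, v in node["attrs"].items())
    if node["tag"] in VOID_TAGS:
        return f"<{node['tag']}{attrs}>"
    return f"<{node['tag']}{attrs}>{inner}</{node['tag']}>"


sample = '<div class="a"><p>Hi <b>there</b></p></div>'
tree = html_to_dict(sample)
print(dict_to_html(tree) == sample)  # True for this simple input
```

The round trip is exact only for simple, well-formed input such as the sample above. Attribute quoting style, comments, and whitespace-only text are not preserved by this sketch.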

Section 3: Mixed Content

Article with Multiple Paragraphs

The HTML transformation service provides multiple endpoints for processing HTML documents. Each endpoint serves a specific purpose in the transformation pipeline.

"Parsing HTML into a structured format enables semantic analysis and programmatic manipulation of web content."

HTML Processing Best Practices

The service handles complex scenarios including:

  1. Deeply nested DOM structures (tested to 256 levels)
  2. Mixed content with inline formatting
  3. Special characters and entities like &, <, >
  4. Embedded structures like:
    • Lists within paragraphs
    • Tables with formatted cells
    • Nested code blocks and preformatted text
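
The first scenario in the list above, deep nesting up to 256 levels, can be checked with a small depth counter. This is a minimal sketch using Python's standard html.parser module. The helper names max_nesting_depth and nested_divs are hypothetical, and the counter assumes every opened element is explicitly closed, so an unclosed void element such as a bare <br> would inflate the count.

```python
from html.parser import HTMLParser


class DepthCounter(HTMLParser):
    """Tracks the deepest level of open elements seen while parsing."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        # Never go below zero on stray closing tags.
        self.depth = max(0, self.depth - 1)


def max_nesting_depth(html):
    """Return the maximum element nesting depth of an HTML string."""
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    return counter.max_depth


def nested_divs(levels, text="x"):
    """Build a test document with `levels` nested <div> elements."""
    return "<div>" * levels + text + "</div>" * levels


print(max_nesting_depth(nested_divs(256)))  # 256
```

Because HTMLParser processes tags as a flat event stream rather than by recursion, the counter itself is not limited by Python's recursion depth at 256 levels.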

Section 4: Edge Cases

This section tests various edge cases:

  • Empty elements:
  • Whitespace handling:
  • Special characters: © ® ™ € £ ¥
  • Entities: © ® ™ € £ ¥
  • Mixed: Text with bold and italic combined
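
The "Special characters" and "Entities" items above show the same symbols, presumably because named entities in the source markup decode to the literal characters once the HTML is rendered as text. The sketch below demonstrates that behaviour with Python's standard html module. The specific entity names used are an assumption about what the original markup contained.

```python
from html import escape, unescape

# Named entities, assumed to match the "Entities" item in the source markup.
named = "&copy; &reg; &trade; &euro; &pound; &yen;"
decoded = unescape(named)
print(decoded)  # © ® ™ € £ ¥

# Markup-significant characters must be escaped before being embedded in HTML.
markup_chars = "&, <, >"
escaped = escape(markup_chars, quote=False)
print(escaped)  # &amp;, &lt;, &gt;

# Escaping and unescaping round-trip for these characters.
print(unescape(escaped) == markup_chars)  # True
```

A parser that decodes entities therefore cannot tell the two list items apart from their text alone. Preserving that distinction would require keeping the raw source alongside the decoded text.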