Bug: extraction fails on empty HTML #42

octocat Mar 10, 2026

When the input HTML is completely empty, the extraction pipeline raises an AttributeError instead of returning an empty result.

Steps to reproduce

  1. Call extract_web_content("", url)
  2. Observe the traceback

Expected behavior

The call should return an ExtractedWebContent with empty markdown.

defunkt Mar 11, 2026

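I can reproduce this. The issue is in pipeline.py line 42, where soup.body is accessed without a None check. With empty input there is presumably no body element, so soup.body is None and the next attribute access on it raises the AttributeError.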
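
A minimal sketch of the kind of guard that would cover this, assuming soup is a BeautifulSoup object built with the html.parser backend. The ExtractedWebContent fields and the function body below are hypothetical stand-ins for illustration, not the project's actual code:

```python
from dataclasses import dataclass

from bs4 import BeautifulSoup


@dataclass
class ExtractedWebContent:
    # Hypothetical shape; the real class may carry more fields.
    url: str
    markdown: str = ""


def extract_web_content(html: str, url: str) -> ExtractedWebContent:
    # Assumption: the pipeline parses with BeautifulSoup and html.parser.
    soup = BeautifulSoup(html, "html.parser")

    # With empty input, html.parser produces no <body>, so soup.body is None.
    body = soup.body
    if body is None:
        # Return an empty result instead of dereferencing None.
        return ExtractedWebContent(url=url)

    # Stand-in for the real markdown conversion: plain text of the body.
    return ExtractedWebContent(url=url, markdown=body.get_text(" ", strip=True))
```

With this guard, extract_web_content("", url) returns an ExtractedWebContent whose markdown is the empty string, which matches the expected behavior described above. Other parser backends may behave differently (html5lib, for example, synthesizes a body element), so the check should be verified against whichever parser the pipeline actually uses.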

octocat Mar 12, 2026

Thanks for the confirmation! PR incoming.