jeevesagent.loader.html

HTML loader → markdown.

Uses beautifulsoup4 (imported lazily) to walk the DOM and emit markdown that preserves heading, paragraph, and list structure. Strips <script> and <style> content. Drops most attributes; the goal is to keep the textual structure, not pixel-perfect rendering.
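The conversion described above can be sketched roughly as follows. This is an illustration only, not the module's implementation: it swaps in the standard library's html.parser in place of beautifulsoup4 so that it runs without extra dependencies, and the names MarkdownSketch and html_to_markdown are hypothetical. It handles only headings, paragraphs, and list items, ignores all attributes, and skips <script> / <style> content.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Hypothetical converter (not jeevesagent's code): headings, paragraphs, list items."""

    SKIPPED = {"script", "style"}
    HEADINGS = {f"h{n}": "#" * n for n in range(1, 7)}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []      # finished markdown blocks
        self.prefix = None    # markdown prefix of the block being collected
        self.text = []        # text fragments of the current block
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # attrs is deliberately ignored: attributes are dropped.
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self._flush()
            self.prefix = self.HEADINGS[tag] + " "
        elif tag == "p":
            self._flush()
            self.prefix = ""
        elif tag == "li":
            self._flush()
            self.prefix = "- "

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.HEADINGS or tag in ("p", "li"):
            self._flush()

    def handle_data(self, data):
        # Keep text only inside a recognised block and outside script/style.
        if self.skip_depth == 0 and self.prefix is not None:
            self.text.append(data)

    def _flush(self):
        # Collapse whitespace and emit the current block, if it has any text.
        content = " ".join("".join(self.text).split())
        if self.prefix is not None and content:
            self.blocks.append(self.prefix + content)
        self.prefix, self.text = None, []


def html_to_markdown(html: str) -> str:
    """Convert an HTML string to markdown blocks separated by blank lines."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n\n".join(parser.blocks)


if __name__ == "__main__":
    sample = "<h1 class='t'>Title</h1><p>Hello <b>world</b>.</p><script>x()</script>"
    print(html_to_markdown(sample))
```

Inline tags such as <b> contribute their text but no markup here, which matches the stated goal of keeping textual structure over faithful rendering.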

Functions

load_html(path) → jeevesagent.loader.base.Document

Load an HTML file → markdown.

Module Contents

jeevesagent.loader.html.load_html(path: str | pathlib.Path) → jeevesagent.loader.base.Document

Load an HTML file → markdown.

Requires beautifulsoup4: pip install 'jeevesagent[loader-html]'.
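A lazy import of an optional dependency, paired with an install hint like the one above, might look like the sketch below. The helper name require_optional is hypothetical and is not part of jeevesagent; it only illustrates deferring the import until the loader is actually called and pointing at the pip extra when the package is missing.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Hypothetical helper: import module_name on first use, or explain how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with the pip extra so the caller sees an actionable message.
        raise ImportError(
            f"{module_name} is required for this loader: "
            f"pip install 'jeevesagent[{extra}]'"
        ) from exc


def parse_html_lazily(markup: str):
    """Assumed usage: bs4 is imported only when this function runs."""
    bs4 = require_optional("bs4", "loader-html")
    return bs4.BeautifulSoup(markup, "html.parser")
```

With this pattern, importing the module never fails when beautifulsoup4 is absent; the error surfaces only when an HTML file is actually loaded.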