jeevesagent.loader.pdf

PDF loader → markdown.

Uses pypdf (lazy import). Each page becomes a section # Page N in the markdown output. Page-level whitespace is normalized; otherwise the text comes through as the PDF’s extractable layer reports it.
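The conversion described above can be sketched roughly as follows. This is an illustrative sketch, not the loader's actual source. The helper names (`normalize_page_text`, `pages_to_markdown`, `pdf_to_markdown`) are hypothetical, and the whitespace rule is an assumption (trim trailing spaces, collapse runs of blank lines). The heading prefix is a parameter because this page mentions both `# Page N` and `## Page N`; the default follows the `load_pdf` docstring. Only `pypdf.PdfReader`, `reader.pages`, and `page.extract_text()` are real pypdf calls.

```python
from __future__ import annotations

import re
from pathlib import Path


def normalize_page_text(text: str) -> str:
    """Trim trailing whitespace per line and collapse runs of blank lines.

    Assumed normalization rule; the real loader may differ.
    """
    lines = [line.rstrip() for line in text.splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


def pages_to_markdown(page_texts: list[str], heading: str = "##") -> str:
    """Build one '<heading> Page N' section per page, numbered from 1."""
    sections = []
    for number, text in enumerate(page_texts, start=1):
        body = normalize_page_text(text)
        # rstrip() drops the blank lines after the heading when a page is empty.
        sections.append(f"{heading} Page {number}\n\n{body}".rstrip())
    return "\n\n".join(sections) + "\n"


def pdf_to_markdown(path: str | Path) -> str:
    """Extract every page's text layer with pypdf and render it as markdown."""
    # Lazy import: pypdf is only needed when a PDF is actually loaded.
    try:
        from pypdf import PdfReader
    except ImportError as exc:
        raise ImportError(
            "pypdf is required: pip install 'jeevesagent[loader-pdf]'"
        ) from exc
    reader = PdfReader(str(path))
    return pages_to_markdown([page.extract_text() or "" for page in reader.pages])
```

Keeping the text-to-markdown step separate from the pypdf call makes the formatting testable without a PDF on disk.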

PDFs vary widely in extractability: scanned image PDFs return empty text, and layout-heavy PDFs lose column structure. For production use cases that need OCR or table extraction, swap this loader for pdfplumber or unstructured (both kept out of the default dependency footprint).
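Since scanned PDFs yield empty text, a caller may want to detect that case before trusting the output. The sketch below shows one way to do it, given the per-page extracted strings. Both function names and the 0.5 threshold are hypothetical and are not part of jeevesagent.

```python
from __future__ import annotations


def pages_without_text(page_texts: list[str]) -> list[int]:
    """Return 1-based numbers of pages whose extracted text is empty or whitespace."""
    return [n for n, text in enumerate(page_texts, start=1) if not text.strip()]


def looks_scanned(page_texts: list[str], min_empty_ratio: float = 0.5) -> bool:
    """Heuristic: treat the PDF as scanned if enough pages have no text layer.

    The threshold is an arbitrary assumption; tune it for your documents.
    """
    if not page_texts:
        return False
    empty = len(pages_without_text(page_texts))
    return empty / len(page_texts) >= min_empty_ratio
```

A document flagged this way is a candidate for an OCR-capable loader rather than the default pypdf path.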

Functions

load_pdf(path) → jeevesagent.loader.base.Document

Load a PDF, convert to markdown.

Module Contents

jeevesagent.loader.pdf.load_pdf(path: str | pathlib.Path) → jeevesagent.loader.base.Document

Load a PDF, convert to markdown.

Each page becomes ## Page N followed by the extracted text. Requires pypdf: pip install 'jeevesagent[loader-pdf]'.
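Because each page is introduced by a `## Page N` heading, downstream code can split the markdown back into per-page chunks. The following is a minimal sketch operating on a plain markdown string. `split_pages` is a hypothetical helper, not part of jeevesagent. It assumes the headings appear on their own line exactly as `## Page N`, and it would mis-split a page whose own text contains such a line.

```python
from __future__ import annotations

import re

# Matches a heading line of the form "## Page 12" (assumed exact format).
PAGE_HEADING = re.compile(r"^## Page (\d+)[ \t]*$", re.MULTILINE)


def split_pages(markdown: str) -> dict[int, str]:
    """Map each page number to the text between its heading and the next one."""
    matches = list(PAGE_HEADING.finditer(markdown))
    pages: dict[int, str] = {}
    for i, match in enumerate(matches):
        # A page's text runs until the next heading, or to the end of the string.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        pages[int(match.group(1))] = markdown[match.end():end].strip()
    return pages
```

For example, `split_pages("## Page 1\n\nHello\n\n## Page 2\n\nWorld\n")` yields one entry per page, keyed by page number.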