jeevesagent.loader.pdf
PDF loader → markdown.
Uses pypdf (lazy import). Each page becomes a # Page N section
in the markdown output. Page-level whitespace is normalized;
otherwise the text comes through as the PDF’s extractable
layer reports it.
PDFs vary wildly in extractability: scanned image PDFs return
empty text, and layout-heavy PDFs lose column structure. For
production use cases needing OCR / table extraction, swap this
loader for pdfplumber or unstructured (kept out of the
default dependency footprint).
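Because scanned PDFs yield empty page text, it can help to check the loader's output before passing it downstream. The following is a minimal sketch, not part of jeevesagent: `page_bodies`, `empty_page_ratio`, `needs_ocr`, and the 0.5 threshold are hypothetical names and values chosen for illustration. It assumes the markdown string uses one `# Page N` or `## Page N` heading per page.

```python
import re

# Matches the per-page headings described above ("# Page N" or "## Page N").
PAGE_HEADING = re.compile(r"#{1,2} Page \d+")


def page_bodies(markdown: str) -> list:
    """Return the text found under each page heading, one entry per page."""
    bodies = []
    current = None  # lines collected for the page being read, if any
    for line in markdown.splitlines():
        if PAGE_HEADING.fullmatch(line.strip()):
            if current is not None:
                bodies.append("\n".join(current).strip())
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        bodies.append("\n".join(current).strip())
    return bodies


def empty_page_ratio(markdown: str) -> float:
    """Fraction of pages with no extracted text (1.0 if no pages were found)."""
    bodies = page_bodies(markdown)
    if not bodies:
        return 1.0
    return sum(1 for body in bodies if not body) / len(bodies)


def needs_ocr(markdown: str, threshold: float = 0.5) -> bool:
    """Heuristic: treat the PDF as scanned when most pages came back empty."""
    return empty_page_ratio(markdown) >= threshold
```

A caller could run `needs_ocr` on the loaded markdown and route the file to an OCR-capable loader when it returns `True`. The threshold is arbitrary and would need tuning for real documents.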
Functions
load_pdf(path) | Load a PDF, convert to markdown.
Module Contents
- jeevesagent.loader.pdf.load_pdf(path: str | pathlib.Path) → jeevesagent.loader.base.Document
Load a PDF, convert to markdown.
Each page becomes ## Page N followed by the extracted text.
Requires pypdf: pip install 'jeevesagent[loader-pdf]'.
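The page-to-section conversion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's actual implementation: `normalize_whitespace`, `pages_to_markdown`, and `pdf_to_markdown` are hypothetical names, the exact normalization rules are a guess, and the sketch returns a plain string rather than a `Document`. Only `PdfReader`, `reader.pages`, and `page.extract_text()` are real pypdf calls.

```python
from __future__ import annotations

import re
from pathlib import Path


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs and extra blank lines (assumed rules)."""
    text = re.sub(r"[ \t]+", " ", text)
    text = "\n".join(line.strip() for line in text.splitlines())
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def pages_to_markdown(page_texts: list[str]) -> str:
    """Emit one '## Page N' section per page, numbered from 1."""
    sections = []
    for number, text in enumerate(page_texts, start=1):
        body = normalize_whitespace(text)
        # rstrip() leaves a bare heading when a page has no extractable text.
        sections.append(f"## Page {number}\n\n{body}".rstrip())
    return "\n\n".join(sections)


def pdf_to_markdown(path: str | Path) -> str:
    """Hypothetical stand-in for the extraction step of load_pdf."""
    # Lazy import: pypdf is only required when a PDF is actually read.
    from pypdf import PdfReader

    reader = PdfReader(str(path))
    # extract_text() can return an empty string, e.g. for scanned pages.
    return pages_to_markdown([page.extract_text() or "" for page in reader.pages])
```

Keeping the markdown assembly in a pure function separate from the pypdf call means the formatting can be exercised without a PDF file or the optional dependency installed.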