jeevesagent.loader.pdf
======================

.. py:module:: jeevesagent.loader.pdf

.. autoapi-nested-parse::

   PDF loader → markdown.

   Uses ``pypdf`` (lazy import). Each page becomes a ``# Page N``
   section in the markdown output. Page-level whitespace is
   normalized; otherwise the text is passed through exactly as the
   PDF's text layer reports it.

   PDFs vary widely in extractability: scanned image PDFs return
   empty text, and layout-heavy PDFs lose column structure. For
   production use cases that need OCR or table extraction, swap this
   loader for one built on ``pdfplumber`` or ``unstructured`` (both
   kept out of the default dependency footprint).
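
   Because scanned pages come back with no text, it can help to check
   the markdown output for empty page sections before deciding whether
   an OCR-capable loader is needed. The following is a minimal sketch
   and not part of this module: ``find_empty_pages`` is a hypothetical
   helper, and the heading pattern (any markdown heading level followed
   by ``Page N``) is an assumption about the output format.

   .. code-block:: python

      import re

      # Assumed heading shape: "# Page 3", "## Page 3", and so on.
      _PAGE_HEADING = re.compile(r"^#{1,6} Page (\d+)[ \t]*$", re.MULTILINE)


      def find_empty_pages(markdown: str) -> list[int]:
          """Return the page numbers whose section holds no extracted text."""
          matches = list(_PAGE_HEADING.finditer(markdown))
          empty = []
          for i, match in enumerate(matches):
              # A section runs from the end of its heading to the next heading.
              end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
              if not markdown[match.end():end].strip():
                  empty.append(int(match.group(1)))
          return empty


      sample = "## Page 1\n\nHello world\n\n## Page 2\n\n## Page 3\n\nMore text\n"
      print(find_empty_pages(sample))  # prints [2]

   If most pages show up as empty, the PDF is probably image-only and
   text extraction alone will not recover its content.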



Functions
---------

.. autoapisummary::

   jeevesagent.loader.pdf.load_pdf


Module Contents
---------------

.. py:function:: load_pdf(path: str | pathlib.Path) -> jeevesagent.loader.base.Document

   Load a PDF and convert it to markdown.

   Each page becomes ``## Page N`` followed by the extracted text.
   Requires ``pypdf``: ``pip install 'jeevesagent[loader-pdf]'``.
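
   The page-to-section conversion described above can be sketched as
   follows. This illustrates the general approach and is not the
   actual implementation: ``pages_to_markdown`` and
   ``pdf_to_markdown`` are hypothetical names, the whitespace rule
   (collapse runs of spaces and tabs, trim each line) is an assumption,
   and the sketch returns a plain string rather than a ``Document``.
   Only ``pypdf.PdfReader``, ``reader.pages``, and
   ``page.extract_text()`` are real ``pypdf`` calls.

   .. code-block:: python

      from __future__ import annotations

      import re
      from pathlib import Path


      def pages_to_markdown(page_texts: list[str]) -> str:
          """Join per-page text into markdown, one ``## Page N`` heading per page."""
          sections = []
          for number, text in enumerate(page_texts, start=1):
              # Assumed normalization: collapse spaces/tabs and trim every line.
              lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
              body = "\n".join(lines).strip()
              # rstrip() keeps an empty page as a bare heading.
              sections.append(f"## Page {number}\n\n{body}".rstrip())
          return "\n\n".join(sections) + "\n"


      def pdf_to_markdown(path: str | Path) -> str:
          """Extract text page by page with pypdf and format it as markdown."""
          from pypdf import PdfReader  # lazy import: only needed when a PDF is read

          reader = PdfReader(str(path))
          # Scanned (image-only) pages typically yield an empty string.
          return pages_to_markdown([page.extract_text() or "" for page in reader.pages])


      print(pages_to_markdown(["Hello   world \n second line", ""]))

   Keeping the formatting step separate from the ``pypdf`` call makes
   it testable without a PDF file on disk.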


