jeevesagent.loader.base
=======================

.. py:module:: jeevesagent.loader.base

.. autoapi-nested-parse::

   Core types for the loader: :class:`Document` and :class:`Chunk`.

   Every loader normalizes its source format to a :class:`Document`
   whose ``content`` is markdown text and whose ``metadata`` carries
   provenance (source path, MIME type, page / sheet count, etc.).
   The chunkers in :mod:`jeevesagent.loader.chunking` consume the
   ``content`` and produce :class:`Chunk` objects with their own
   metadata pointing back at the source.



Classes
-------

.. autoapisummary::

   jeevesagent.loader.base.Chunk
   jeevesagent.loader.base.Document


Module Contents
---------------

.. py:class:: Chunk

   One piece of a chunked document.

   ``content`` is a substring of the source document's content
   (with possible cleanup — trimmed whitespace, etc.).
   ``metadata`` carries:

   * ``source`` — pass-through from the parent :class:`Document`
   * ``index`` — zero-based chunk index in the source
   * ``chunk_size`` — actual length of ``content`` (chars)
   * Strategy-specific keys (e.g. ``headers`` from
     :class:`MarkdownChunker`, ``token_count`` from
     :class:`TokenChunker`).


   .. py:attribute:: content
      :type:  str


   .. py:attribute:: metadata
      :type:  dict[str, Any]


.. py:class:: Document

   A loaded document, normalized to markdown.

   ``content``
       The full markdown text. Loaders produce reasonable
       markdown: PDF / DOCX preserve headings + paragraphs; Excel
       / CSV become markdown tables; HTML preserves heading +
       paragraph + list structure.

   ``metadata``
       Free-form dict with at least:

       * ``source`` — the source file path (str)
       * ``format`` — the source format (``"pdf"``, ``"docx"``,
           ``"xlsx"``, ``"csv"``, ``"tsv"``, ``"md"``, ``"txt"``,
           ``"html"``)

       Format-specific keys may be present (``"page_count"`` for
       PDFs, ``"sheet_names"`` for Excel, ``"row_count"`` for CSV,
       etc.).


   .. py:attribute:: content
      :type:  str


   .. py:attribute:: metadata
      :type:  dict[str, Any]


