jeevesagent.loader.base
Core types for the loader: Document and Chunk.
Every loader normalizes its source format to a Document
whose content is markdown text and whose metadata carries
provenance (source path, MIME type, page / sheet count, etc.).
The chunkers in jeevesagent.loader.chunking consume the
content and produce Chunk objects with their own
metadata pointing back at the source.
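Here is a minimal sketch of that flow. It assumes plain dataclasses as stand-ins for Document and Chunk, because the real constructors are not documented on this page. paragraph_chunk is a hypothetical blank-line splitter, not one of the chunkers in jeevesagent.loader.chunking. It only shows how each Chunk's metadata points back at its parent Document.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-ins exposing only the two documented fields;
# the real jeevesagent.loader.base classes may be built differently.
@dataclass
class Document:
    content: str  # full markdown text
    metadata: dict[str, Any] = field(default_factory=dict)  # provenance


@dataclass
class Chunk:
    content: str  # substring of the parent Document's content
    metadata: dict[str, Any] = field(default_factory=dict)


def paragraph_chunk(doc: Document) -> list[Chunk]:
    """Hypothetical chunker: split on blank lines, trim, drop empty blocks."""
    chunks: list[Chunk] = []
    for block in doc.content.split("\n\n"):
        text = block.strip()  # trimming keeps text a substring of doc.content
        if not text:
            continue
        chunks.append(Chunk(
            content=text,
            metadata={
                "source": doc.metadata["source"],  # provenance pass-through
                "index": len(chunks),              # zero-based, counts kept chunks
                "chunk_size": len(text),           # length after cleanup, in chars
            },
        ))
    return chunks


doc = Document(
    content="# Title\n\nFirst paragraph.\n\nSecond paragraph.",
    metadata={"source": "notes.md", "format": "md"},
)
for chunk in paragraph_chunk(doc):
    print(chunk.metadata)
# {'source': 'notes.md', 'index': 0, 'chunk_size': 7}
# {'source': 'notes.md', 'index': 1, 'chunk_size': 16}
# {'source': 'notes.md', 'index': 2, 'chunk_size': 17}
```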
Module Contents
- class jeevesagent.loader.base.Chunk
One piece of a chunked document.
content
  A substring of the source document's content, possibly cleaned up (e.g. trimmed whitespace).
metadata
  A dict carrying:
  - source: passed through from the parent Document
  - index: zero-based chunk index within the source
  - chunk_size: actual length of content, in characters
  - strategy-specific keys (e.g. headers from MarkdownChunker, token_count from TokenChunker)
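The Chunk metadata contract can be checked mechanically. This sketch again uses hypothetical dataclass stand-ins and makes two assumptions. First, a chunk list holds one document's chunks in order, so index equals list position. Second, cleanup is limited to trimming, so content stays a literal substring of the parent. check_chunk_contract is an illustrative helper, not part of jeevesagent. Strategy-specific keys such as headers or token_count are left unchecked.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-ins with only the documented fields.
@dataclass
class Document:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Chunk:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def check_chunk_contract(chunks: list[Chunk], parent: Document) -> list[str]:
    """Return violations of the documented keys; extra strategy keys are ignored."""
    problems: list[str] = []
    for expected_index, chunk in enumerate(chunks):
        meta = chunk.metadata
        # Assumes cleanup only trims, so content must appear verbatim in the parent.
        if chunk.content not in parent.content:
            problems.append(f"chunk {expected_index}: content not found in parent")
        if meta.get("source") != parent.metadata.get("source"):
            problems.append(f"chunk {expected_index}: source differs from parent")
        if meta.get("index") != expected_index:
            problems.append(f"chunk {expected_index}: index is {meta.get('index')!r}")
        if meta.get("chunk_size") != len(chunk.content):
            problems.append(f"chunk {expected_index}: chunk_size != len(content)")
    return problems


doc = Document("Intro\n\nBody text", {"source": "a.md", "format": "md"})
good = [
    Chunk("Intro", {"source": "a.md", "index": 0, "chunk_size": 5}),
    Chunk("Body text", {"source": "a.md", "index": 1, "chunk_size": 9}),
]
print(check_chunk_contract(good, doc))  # → []
```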
- class jeevesagent.loader.base.Document
A loaded document, normalized to markdown.
content
  The full markdown text. Loaders produce reasonable markdown: PDF and DOCX output preserves headings and paragraphs, Excel and CSV data become markdown tables, and HTML keeps its heading, paragraph, and list structure.
metadata
  A free-form dict with at least:
  - source: the source file path (str)
  - format: the source format ("pdf", "docx", "xlsx", "csv", "tsv", "md", "txt", "html")
  Format-specific keys may also be present ("page_count" for PDFs, "sheet_names" for Excel, "row_count" for CSV, etc.).
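To make the metadata shape concrete, here is a sketch of what a CSV loader could return. csv_text_to_document is hypothetical and is not a jeevesagent function, and Document is a dataclass stand-in. The sketch renders rows as a markdown pipe table and fills in the required source and format keys. This page does not say whether the real row_count includes the header row, so the sketch assumes it counts data rows only.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:  # hypothetical stand-in with the documented fields
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def csv_text_to_document(text: str, source: str) -> Document:
    """Hypothetical CSV loader sketch: render rows as a markdown table."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:  # empty input: no table, zero rows
        return Document("", {"source": source, "format": "csv", "row_count": 0})
    header, body = rows[0], rows[1:]

    def md_row(cells: list[str]) -> str:
        # Escape pipes so cell text cannot break the table layout.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [md_row(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [md_row(r) for r in body]
    return Document(
        content="\n".join(lines),
        metadata={
            "source": source,        # required key
            "format": "csv",         # required key, one of the documented values
            "row_count": len(body),  # format-specific; assumed to exclude the header
        },
    )


doc = csv_text_to_document("name,qty\nbolt,10\nnut,25\n", source="parts.csv")
print(doc.content)
# | name | qty |
# | --- | --- |
# | bolt | 10 |
# | nut | 25 |
print(doc.metadata)  # {'source': 'parts.csv', 'format': 'csv', 'row_count': 2}
```

Escaping the pipe character inside cells is what keeps values like "x|y" from being read as extra columns.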