Open PDF
pypdfium2 renders pages while pdfplumber reads text, captions, tables, and coordinates for comparison.
The tool keeps the PDF local, renders visual material as images, extracts body text into cards, and then removes text that already belongs inside a table, figure, or formula crop.
Each source page becomes an embedded PNG for the source-page modal.
Captions, table boxes, embedded images, vector regions, and display equations become image crops.
Text blocks that fall inside asset regions are removed from body cards so diagram labels are not shown twice.
Nearby body blocks are joined before splitting into digestible cards.
The reader is one standalone file with embedded images, search, navigation, and font-size controls.
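The join-then-split step can be sketched as follows. This is a minimal illustration, not the tool's actual code: the `Block` type, the `max_gap` threshold of 12 points, and the `max_chars` card limit are all assumptions chosen for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    top: float     # distance from the page top, in PDF points (assumed convention)
    bottom: float


def join_nearby(blocks, max_gap=12.0):
    """Merge vertically adjacent blocks whose gap is at most max_gap points."""
    merged = []
    for block in sorted(blocks, key=lambda b: b.top):
        if merged and block.top - merged[-1].bottom <= max_gap:
            last = merged[-1]
            merged[-1] = Block(
                last.text + " " + block.text,
                last.top,
                max(last.bottom, block.bottom),
            )
        else:
            merged.append(block)
    return merged


def split_into_cards(text, max_chars=600):
    """Split text at sentence ends, packing sentences into cards of about max_chars."""
    cards, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if current and len(current) + 1 + len(sentence) > max_chars:
            cards.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        cards.append(current)
    return cards
```

Joining first and splitting second means card boundaries follow sentence ends rather than the arbitrary line or block breaks stored in the PDF. A single sentence longer than `max_chars` still becomes one oversized card in this sketch.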
PDFs do not expose clean article structure. A visual chart may be stored as vector lines plus many tiny text blocks, while a table may not expose a reliable table box.
Keep tables, figures, and formulas visible as images, but use geometry to avoid duplicating their internal labels as separate cards.
One crop with caption and source-page link.
One tight equation crop so math layout stays faithful.
Only prose outside the asset crop region.
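One way to express the geometric rule is an overlap test between each text block and each asset crop. The sketch below is an assumption about how such a filter could look, not the tool's implementation: boxes are `(x0, top, x1, bottom)` tuples, and the 0.5 threshold and function names are hypothetical.

```python
def overlap_fraction(block, region):
    """Return the share of the block's area that lies inside the region."""
    bx0, btop, bx1, bbottom = block
    rx0, rtop, rx1, rbottom = region
    width = min(bx1, rx1) - max(bx0, rx0)
    height = min(bbottom, rbottom) - max(btop, rtop)
    if width <= 0 or height <= 0:
        return 0.0  # the boxes do not intersect
    area = (bx1 - bx0) * (bbottom - btop)
    return (width * height) / area if area > 0 else 0.0


def body_blocks(text_blocks, asset_regions, threshold=0.5):
    """Keep only (text, box) pairs that are mostly outside every asset region."""
    return [
        (text, box)
        for text, box in text_blocks
        if all(overlap_fraction(box, region) < threshold for region in asset_regions)
    ]
```

Measuring overlap as a fraction of the block's own area, rather than requiring full containment, tolerates labels that poke slightly past the crop edge.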
The PDF has a text layer, good captions, and detectable table boxes. Crops are tight and cards read cleanly.
The PDF has vector diagrams, multi-column flow, or table text without borders. The converter has to infer boundaries.
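When no figure or table box is exposed, one plausible way to infer a boundary is to cluster nearby vector pieces into a single region. The sketch below illustrates that idea only; the `gap` value of 8 points and all function names are assumptions, and the converter may use a different heuristic.

```python
def expand(box, pad):
    """Grow a (x0, top, x1, bottom) box by pad on every side."""
    x0, top, x1, bottom = box
    return (x0 - pad, top - pad, x1 + pad, bottom + pad)


def intersects(a, b):
    """True when two boxes touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a, b):
    """Smallest box that covers both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def cluster_regions(boxes, gap=8.0):
    """Merge boxes lying within gap points of each other until nothing changes."""
    regions = list(boxes)
    merged_any = True
    while merged_any:
        merged_any = False
        result = []
        for box in regions:
            for i, existing in enumerate(result):
                if intersects(expand(existing, gap), box):
                    result[i] = union(existing, box)
                    merged_any = True
                    break
            else:
                result.append(box)
        regions = result
    return regions
```

Each merged region can then be treated like a detected table or figure box: cropped as an image and used to filter out the text blocks inside it. A `gap` that is too large will merge separate figures, and one that is too small will split a single diagram, which is why this case is harder than PDFs with clean boxes.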
The generated reader includes a font-size slider, source-page modal, and standalone embedded assets.
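A common way to keep images inside a single HTML file is to inline them as base64 data URIs. The sketch below shows that technique with Python's standard library; it is an assumption about how the standalone reader could embed its PNGs, and the helper names are hypothetical.

```python
import base64
import html


def png_data_uri(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data URI usable in an <img> src attribute."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def img_tag(png_bytes: bytes, alt: str) -> str:
    """Build an <img> element that carries the image inline, with escaped alt text."""
    return f'<img src="{png_data_uri(png_bytes)}" alt="{html.escape(alt, quote=True)}">'
```

Base64 inflates each image by roughly a third, so the trade-off for a file that opens anywhere without a server is a larger download.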