## A Calibration Study for Local OCR  

Ada Researcher, Boris Scientist  

## Abstract  

We present a calibration study for local optical character recognition. Our method converts academic documents into Markdown using a vision model running entirely on device. We report layout grounding accuracy and discuss coordinate-frame conventions used by the decoder.  

## 1. Introduction  

Document understanding has progressed rapidly. Converting PDFs to clean text is a prerequisite for downstream language-model pipelines. We focus on the offline setting, where no cloud service is involved.  

We summarize related work and motivate the need for figure grounding. Existing tools transcribe text well but rarely localize figures, yet localization is essential when figures must be replaced by generated descriptions.  

Figure 1: Calibration curve for the proposed method.  

## 2. Method  

Let \(x\) denote the input page and \(y\) the predicted Markdown. We model \(p(y \mid x)\) with a decoder conditioned on encoder features. The loss is \(L = -\log p(y \mid x)\).
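
As an illustration of the loss, the following minimal sketch evaluates \(-\log p(y \mid x)\) for one output sequence. It assumes the decoder factorizes \(p(y \mid x)\) autoregressively into per-token conditional probabilities, which the text above does not state explicitly. The function name `sequence_nll` and the example probabilities are hypothetical and chosen only for illustration.

```python
import math


def sequence_nll(token_probs):
    """Return -log p(y | x) from per-token conditional probabilities.

    Assumption (not stated in the paper): p(y | x) is the product of the
    per-token probabilities p(y_t | y_<t, x), so the log-probability of
    the sequence is the sum of the per-token log-probabilities.
    """
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("each probability must lie in (0, 1]")
    # Summing logs avoids the underflow that multiplying many small
    # probabilities would cause on long sequences.
    return -sum(math.log(p) for p in token_probs)


# Hypothetical probabilities the decoder assigned to three target tokens.
probs = [0.5, 0.25, 0.5]
loss = sequence_nll(probs)
```

Here the product of the three probabilities is 1/16, so the loss equals \(\log 16\). In practice the per-token probabilities would come from a softmax over the decoder's output vocabulary.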