Metadata-Version: 2.4
Name: cells2table
Version: 0.4.1
Summary: Table image parsing with cell detection models
Keywords: docling,plugin
Author: jspast
Author-email: jspast <140563347+jspast@users.noreply.github.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Dist: numpy>=2.2.0,<3.0.0
Requires-Dist: onnxruntime>=1.23.2,<2.0.0
Requires-Dist: opencv-python>=4.11.0.86,<5.0.0.0
Requires-Dist: docling>=2.66.0,<3.0.0 ; extra == 'docling'
Requires-Dist: torch>=2.11.0,<3.0.0 ; extra == 'docling'
Requires-Dist: torchvision>=0.26.0,<1.0.0 ; extra == 'docling'
Requires-Dist: docling-eval>=1.0.1,<2.0.0 ; extra == 'eval'
Requires-Dist: docling-metrics-table>=0.12.0 ; extra == 'eval'
Requires-Dist: ipykernel>=7.2.0,<8.0.0 ; extra == 'eval'
Requires-Dist: ipympl>=0.10.0 ; extra == 'eval'
Requires-Dist: lxml>=5.4.0 ; extra == 'eval'
Requires-Dist: matplotlib>=3.10.8,<4.0.0 ; extra == 'eval'
Requires-Dist: paddleocr[doc-parser]>=3.5.0 ; extra == 'eval'
Requires-Dist: pillow>=12.2.0,<13.0.0 ; extra == 'eval'
Requires-Dist: pyspark>=4.1.1,<5.0.0 ; extra == 'eval'
Requires-Dist: python-levenshtein>=0.27.3 ; extra == 'eval'
Requires-Dist: scipy>=1.17.1 ; extra == 'eval'
Requires-Dist: torch>=2.11.0,<3.0.0 ; extra == 'eval'
Requires-Dist: torchvision>=0.26.0,<1.0.0 ; extra == 'eval'
Requires-Dist: huggingface-hub>=0.36.0,<2.0.0 ; extra == 'huggingface'
Requires-Python: >=3.12, <4.0
Project-URL: Homepage, https://github.com/jspast/cells2table
Project-URL: Issues, https://github.com/jspast/cells2table/issues
Provides-Extra: docling
Provides-Extra: eval
Provides-Extra: huggingface
Description-Content-Type: text/markdown

# cells2table

Parsing tables in document images with cell detection models

## Implemented pipelines

### PaddlePaddle

- Classification model (wired / wireless)
- Cell detection model with different weights for each class

Uses ONNX weights downloaded automatically from [Hugging Face](https://huggingface.co/jspast/paddlepaddle-table-models-onnx) on first use.

## Installation

With [uv](https://docs.astral.sh/uv/), add to your project with:

```sh
uv add cells2table
```

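Optional extras can be requested in brackets, for example `uv add "cells2table[docling]"`:

| Extra           | Description                         |
| --------------- | ----------------------------------- |
| `docling`       | For docling usage                   |
| `huggingface`   | For downloading models              |
| `eval`          | For evaluation tooling              |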

## Usage

cells2table only extracts structural information from tables. Another library is needed to extract the content of the cells.
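
To illustrate that split between structure and content, here is a minimal sketch of how detected cells could be combined with a separate text reader. It does not use the cells2table API: `Cell`, `crop`, `fill_table`, and the `read_text` callable are hypothetical names introduced only for this example, and the image is modelled as a plain list of pixel rows.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Image = Sequence[Sequence[int]]  # row-major grid of pixel values


@dataclass(frozen=True)
class Cell:
    """Hypothetical structural cell: grid position plus pixel bounding box."""

    row: int
    col: int
    bbox: tuple[int, int, int, int]  # (left, top, right, bottom)


def crop(image: Image, bbox: tuple[int, int, int, int]) -> list[list[int]]:
    """Return the sub-image covered by bbox (right and bottom are exclusive)."""
    left, top, right, bottom = bbox
    return [list(row[left:right]) for row in image[top:bottom]]


def fill_table(
    image: Image,
    cells: Sequence[Cell],
    read_text: Callable[[list[list[int]]], str],
) -> list[list[str]]:
    """Build a row/column grid of strings by reading each cell crop."""
    if not cells:
        return []
    n_rows = max(cell.row for cell in cells) + 1
    n_cols = max(cell.col for cell in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        # The structure model supplies the position; read_text supplies the content.
        grid[cell.row][cell.col] = read_text(crop(image, cell.bbox))
    return grid
```

In practice `read_text` would wrap an OCR engine or a PDF text layer lookup; the docling integration described below handles this step as part of its own pipeline.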

### Docling

A [docling plugin](https://docling-project.github.io/docling/concepts/plugins/) is provided so that cells2table can be integrated into a complete document conversion pipeline.

Usage example:

```python
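from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import (
    DocumentConverter,
    ImageFormatOption,
    PdfFormatOption,
)

from cells2table.docling import CustomDoclingTableStructureOptions

# External plugins must be explicitly allowed for docling to load cells2table.
pipeline_options = PdfPipelineOptions(
    allow_external_plugins=True,
    table_structure_options=CustomDoclingTableStructureOptions(),
)

# Apply the same pipeline options to both PDF and image inputs.
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options),
        InputFormat.IMAGE: ImageFormatOption(pipeline_options=pipeline_options),
    }
)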

result = converter.convert("path/to/document.pdf")
print(result.document.export_to_markdown())
```
