Metadata-Version: 2.4
Name: docflow-sager
Version: 0.1.0
Summary: Document-native DAG runner for preprocessing PDFs, Office files, and email messages into structured evidence artifacts.
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pydantic<3,>=2.7
Requires-Dist: networkx<4,>=3.3
Requires-Dist: numpy<3,>=1.26
Requires-Dist: scikit-learn<2,>=1.5
Requires-Dist: jinja2<4,>=3.1
Requires-Dist: pypdf<6,>=5
Requires-Dist: PyYAML<7,>=6
Requires-Dist: openpyxl<4,>=3.1
Requires-Dist: xlrd<3,>=2
Requires-Dist: olefile<1,>=0.47
Requires-Dist: extract-msg<1,>=0.41
Provides-Extra: dev
Requires-Dist: pytest<9,>=8.2; extra == "dev"

# docflow

`docflow` is a document-native DAG runner that preprocesses files into structured evidence artifacts, plain-text exports, and downstream indexes.

It supports these input types:

- `pdf`
- `docx`
- `doc`
- `xlsx`
- `xls`
- `msg`
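
As a rough illustration of how a source directory could be filtered down to these types before a run, here is a minimal sketch. The helper name `find_supported_files` and the extension-based matching are assumptions made for this example; they are not part of `docflow`'s documented API, and the tool may detect file types differently.

```python
from pathlib import Path

# Extensions taken from the list above. Matching on the file suffix is an
# assumption of this sketch, not a statement about how docflow detects types.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".xlsx", ".xls", ".msg"}


def find_supported_files(source_dir):
    """Return supported files under source_dir, sorted for a stable order.

    Hypothetical helper for illustration only.
    """
    root = Path(source_dir)
    return sorted(
        path
        for path in root.rglob("*")  # walk subdirectories as well
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Lowercasing the suffix means files such as `REPORT.PDF` are picked up too.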

Core capabilities:

- flow execution from `*.flow.dag.yaml`
- document parsing into evidence atoms
- metadata enrichment
- evidence-graph construction
- semantic, structural, and spatial indexes
- plain-text and structured output artifacts
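
To make the "DAG runner" idea concrete, the sketch below orders a set of steps so that each one runs only after the steps it depends on. The step names are hypothetical and only loosely mirror the capabilities above; the real `*.flow.dag.yaml` schema is not shown here. The package lists `networkx` as a dependency, but this sketch uses the standard library's `graphlib` instead so that it stays self-contained.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical flow: each step maps to the set of steps it depends on.
# These names are illustrative and are not docflow's actual step identifiers.
FLOW = {
    "parse": set(),
    "enrich": {"parse"},
    "build_graph": {"enrich"},
    "index": {"build_graph"},
    "export_text": {"parse"},
}


def execution_order(flow):
    """Return the steps in an order that respects every dependency."""
    try:
        return list(TopologicalSorter(flow).static_order())
    except CycleError as exc:
        # graphlib reports the offending cycle as the second element of args
        raise ValueError(f"flow contains a cycle: {exc.args[1]}") from exc


if __name__ == "__main__":
    for step in execution_order(FLOW):
        print(step)
```

Steps with no dependency between them (here `enrich` and `export_text`) may appear in either order, which is also what would allow a runner to execute them in parallel.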

Example install for local development:

```bash
cd packages/docflow
python3 -m pip install -e '.[dev]'
```

Example CLI usage:

```bash
docflow run ../../docflow/examples/document_preprocess.flow.dag.yaml \
  --source-dir /path/to/documents \
  --output-dir /tmp/docflow_run \
  --trace
```
