Metadata-Version: 2.4
Name: marksense
Version: 0.2.0
Summary: Read marked paper forms (bubble sheets, surveys, checklists, exams) into structured data — locally, no cloud.
Project-URL: Homepage, https://github.com/RoyAbra27/marksense
Project-URL: Repository, https://github.com/RoyAbra27/marksense
Project-URL: Issues, https://github.com/RoyAbra27/marksense/issues
Author: Roy Abra
License: MIT
License-File: LICENSE
Keywords: bubble-sheet,computer-vision,forms,omr,onnx,opencv,optical-mark-recognition
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Topic :: Text Processing
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24.0
Requires-Dist: onnxruntime>=1.18.0
Requires-Dist: opencv-python-headless>=4.8.0
Requires-Dist: pymupdf>=1.24.0
Provides-Extra: dev
Requires-Dist: fastapi>=0.111; extra == 'dev'
Requires-Dist: httpx>=0.27; extra == 'dev'
Requires-Dist: onnx>=1.16; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: python-multipart>=0.0.9; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Requires-Dist: uvicorn>=0.30; extra == 'dev'
Provides-Extra: quantize
Requires-Dist: onnx>=1.16; extra == 'quantize'
Provides-Extra: service
Requires-Dist: fastapi>=0.111; extra == 'service'
Requires-Dist: python-multipart>=0.0.9; extra == 'service'
Requires-Dist: uvicorn>=0.30; extra == 'service'
Provides-Extra: train
Requires-Dist: augraphy>=8.2; extra == 'train'
Requires-Dist: roboflow>=1.1; extra == 'train'
Description-Content-Type: text/markdown

# marksense

[![CI](https://github.com/RoyAbra27/marksense/actions/workflows/ci.yml/badge.svg)](https://github.com/RoyAbra27/marksense/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/marksense)](https://pypi.org/project/marksense/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

Read marked paper forms — bubble sheets, surveys, checklists, exams — into structured data
(JSON/CSV). Runs locally: no cloud, no account, no telemetry.

Point marksense at a scan or phone photo of a filled form plus a template you define once, and it
returns every answer with a confidence score:

```bash
marksense read examples/samples/quiz_filled_01.png -t examples/templates/quiz.json
```

```json
{
  "form_type": "quiz",
  "source": "examples/samples/quiz_filled_01.png",
  "answers": {
    "Q1": "A",
    "Q2": "D",
    "Q3": "D"
  }
}
```

_(output truncated — the full result carries all 20 answers plus per-question confidence,
multi-mark flags, and per-page alignment confidence)_

## Why marksense

- **Any layout.** Layout knowledge lives in a template JSON, not in code — checkboxes, bubbles,
  grids, multi-page forms, mixed mark types. Adding a form means writing JSON, never code.
- **Robust to real-world scans.** Every page is aligned onto the blank template (feature matching
  + ECC refinement) before detection, so skewed scans and phone photos read correctly.
- **Model-free by default, learned detection when you want it.** A pixel-density detector works
  out of the box with zero downloads; a small ONNX mark-detection model can be plugged in
  (`--model` / `--download`) for harder real-world scans.
- **Lean runtime.** onnxruntime, OpenCV, NumPy, PyMuPDF. No PyTorch, no GPU needed.

## Install

```bash
pip install marksense
```

## Quickstart

The repository ships a self-contained synthetic demo (generated by
[`examples/generate_samples.py`](examples/generate_samples.py)):

```bash
# Read one form -> JSON on stdout
marksense read examples/samples/quiz_filled_01.png -t examples/templates/quiz.json

# A whole stack -> one CSV row per form
marksense batch examples/samples/ -t examples/templates/survey.json -o results.csv

# Check a template you are authoring
marksense template validate examples/templates/quiz.json
```

Or from Python:

```python
from marksense import read_form

result = read_form("scan.jpg", template="my-form.json")
result.answers                # {"Q1": "3", ...}
result.confidence             # per-question confidence
result.multi_marked           # questions with more than one mark (review these)
result.to_csv()               # question,answer,confidence,flags
```
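
One way to act on those fields is to route uncertain answers to a human reviewer. The sketch below is an illustration only: it assumes `result.confidence` behaves like a dict mapping question id to a float in [0, 1] and that `result.multi_marked` is an iterable of question ids. Neither shape is confirmed here, and `needs_review` and the 0.8 threshold are hypothetical.

```python
def needs_review(answers, confidence, multi_marked, threshold=0.8):
    """Return the sorted question ids a human should double-check.

    Assumed shapes (not confirmed by the marksense docs):
      answers:      dict of question id -> answer string
      confidence:   dict of question id -> float in [0, 1]
      multi_marked: iterable of question ids with more than one mark
    """
    flagged = set(multi_marked)
    for question in answers:
        # A missing confidence is treated as 0.0, so it is always reviewed.
        if confidence.get(question, 0.0) < threshold:
            flagged.add(question)
    return sorted(flagged)


# Stand-in data shaped like the fields above (not real marksense output).
answers = {"Q1": "A", "Q2": "D", "Q3": "D"}
confidence = {"Q1": 0.97, "Q2": 0.55, "Q3": 0.91}
multi_marked = ["Q3"]
print(needs_review(answers, confidence, multi_marked))
```

With a real result you would pass `result.answers`, `result.confidence`, and `result.multi_marked` instead of the stand-in dicts, provided they have the shapes assumed above.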

## Run as a service

```bash
pip install "marksense[service]"
marksense serve                       # http://127.0.0.1:8000, bundled demo templates
marksense serve --templates-dir ./my-templates --port 9000
```

Or with Docker:

```bash
docker build -t marksense .
docker run -p 8000:8000 -v "$(pwd)/my-templates:/templates" marksense
```

Endpoints: `GET /health`, `GET /templates`, `POST /read?form_type=<name>` (multipart `file`).
Interactive docs at `/docs`.
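
A minimal client sketch for `POST /read`, using only the standard library. It builds the multipart body by hand with a single `file` field, as the endpoint list describes. The helper name `build_read_request`, the default content type, and the assumption that the response is JSON are illustrative and not taken from the service itself.

```python
import urllib.parse
import urllib.request
import uuid


def build_read_request(base_url, form_type, filename, data, content_type="image/png"):
    """Build a POST /read request carrying one multipart field named `file`."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    query = urllib.parse.urlencode({"form_type": form_type})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/read?{query}",
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Placeholder bytes stand in for the contents of a real scan.
request = build_read_request("http://127.0.0.1:8000", "quiz", "scan.png", b"\x89PNG")
print(request.get_method(), request.full_url)

# To send it against a running `marksense serve` (response assumed to be JSON):
#   import json
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```

The schema of the response is best checked in the interactive docs at `/docs` rather than assumed.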

## Reading your own forms

1. Get a clean image of the blank form (render the PDF or scan an empty copy).
2. Write a template JSON describing where each option is — see the
   [template authoring guide](docs/template-authoring.md).
3. `marksense template validate my-form.json`, then `marksense read`.

## How it works

```
input (PDF / image)
  └─ render pages ──> align to template ──> detect marks ──> map to answers
                      (ORB → SIFT → ECC)    (ONNX YOLO or     (nearest ROI +
                                             density fallback)  confidence)
```

The detector only knows two things: what a check looks like and what a circle looks like. All
form-specific knowledge — page sizes, question positions, answer values — is declarative template
JSON. That separation is what makes the engine general.
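
To make the last pipeline stage concrete, here is a rough sketch of "nearest ROI" mapping: each detected mark is assigned to the closest option center from the template, and marks too far from any option are ignored. This illustrates the general idea only, not marksense's actual code. The `rois` layout, the `(x, y, score)` detection tuples, and the `max_dist` cutoff are all hypothetical.

```python
import math


def map_to_answers(rois, detections, max_dist=15.0):
    """Assign each detected mark to its nearest option ROI.

    rois:       {question: {option_value: (cx, cy)}} in template pixels
    detections: list of (x, y, score) mark centers, already aligned to the template
    """
    hits = {question: {} for question in rois}
    for x, y, score in detections:
        best = None  # (distance, question, option)
        for question, options in rois.items():
            for option, (cx, cy) in options.items():
                dist = math.hypot(x - cx, y - cy)
                if best is None or dist < best[0]:
                    best = (dist, question, option)
        if best is not None and best[0] <= max_dist:
            _, question, option = best
            # Keep the strongest detection seen for each option.
            hits[question][option] = max(score, hits[question].get(option, 0.0))

    results = {}
    for question, marked in hits.items():
        answer = max(marked, key=marked.get) if marked else None
        results[question] = {
            "answer": answer,
            "confidence": marked[answer] if marked else 0.0,
            "multi_marked": len(marked) > 1,
        }
    return results


rois = {
    "Q1": {"A": (100, 50), "B": (140, 50)},
    "Q2": {"A": (100, 90), "B": (140, 90)},
}
# The last detection is far from every option and should be dropped.
detections = [(102, 49, 0.9), (139, 91, 0.8), (101, 92, 0.6), (400, 400, 0.99)]
results = map_to_answers(rois, detections)
print(results)
```

Because the detector output is matched against declarative coordinates, supporting a new form in this scheme means supplying a new `rois` table, with no change to the matching code.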

The current release uses the density detector by default: no downloads, fully offline. Learned-model weights
trained on public datasets ship in an upcoming release; pass `--download` /
`auto_download=True` to fetch them once published (cached under `~/.marksense/models/`), or
`--model path/to/weights.onnx` to use your own.
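
For intuition, a pixel-density check can be as simple as measuring how much of each option box is dark. The NumPy sketch below shows that general technique on a synthetic page. It is not marksense's detector, and the function names, the `(x, y, w, h)` box format, and both thresholds are assumptions chosen for illustration.

```python
import numpy as np


def fill_ratio(gray, box, dark_below=128):
    """Fraction of pixels darker than `dark_below` inside box = (x, y, w, h)."""
    x, y, w, h = box
    patch = gray[y:y + h, x:x + w]
    return float((patch < dark_below).mean())


def read_question(gray, options, min_fill=0.4):
    """Return (marked option values, fill ratio per option)."""
    ratios = {value: fill_ratio(gray, box) for value, box in options.items()}
    marked = [value for value, ratio in ratios.items() if ratio >= min_fill]
    return marked, ratios


# Synthetic white page with two 20x20 option boxes; only "B" is filled in.
page = np.full((40, 100), 255, dtype=np.uint8)
options = {"A": (10, 10, 20, 20), "B": (60, 10, 20, 20)}
page[12:28, 62:78] = 0  # 16x16 dark blob inside B's box

marked, ratios = read_question(page, options)
print(marked, ratios)
```

A fixed threshold like this is sensitive to lighting and pen pressure, which is one reason a learned detector can help on harder scans.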

## Roadmap

- ~~Self-hosted REST service + Docker image~~ (shipped in v0.2)
- Published accuracy benchmarks
- Clean-provenance model weights (training pipeline and guide: `docs/training.md`)
- Template authoring helpers (auto-detect form regions)

## Development

```bash
git clone https://github.com/RoyAbra27/marksense
cd marksense
uv venv && uv pip install -e ".[dev]"
pytest            # full suite runs with no model file and no network
ruff check .
```

Design docs live in [`docs/design/`](docs/design/); start with
[0001-marksense-v1](docs/design/0001-marksense-v1.md).

## License

MIT — see [LICENSE](LICENSE).
