Metadata-Version: 2.4
Name: trowel
Version: 0.2.2
Summary: Premium excavation report drafting tool for CRM archaeologists
Author-email: Marcus Quinn <marcus@example.com>
License: MIT
Project-URL: homepage, https://github.com/mabo-du/trowel
Project-URL: repository, https://github.com/mabo-du/trowel
Project-URL: documentation, https://github.com/mabo-du/trowel/blob/main/docs/USER_GUIDE.md
Project-URL: changelog, https://github.com/mabo-du/trowel/blob/main/CHANGELOG.md
Project-URL: issues, https://github.com/mabo-du/trowel/issues
Keywords: archaeology,excavation,report,crm,heritage,stratigraphy,harris-matrix,cultural-resource-management,digital-heritage
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Text Processing :: Markup :: Markdown
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: pandas>=2.0
Requires-Dist: openpyxl>=3.1
Requires-Dist: jinja2>=3.1
Requires-Dist: python-docx>=1.0
Requires-Dist: PyQt6>=6.6
Provides-Extra: web
Requires-Dist: streamlit>=1.40; extra == "web"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-benchmark>=5.0; extra == "dev"
Requires-Dist: pytest-cov>=6.0; extra == "dev"
Requires-Dist: ruff>=0.8; extra == "dev"
Requires-Dist: pre-commit>=4.0; extra == "dev"
Provides-Extra: automap
Requires-Dist: matplotlib>=3.8; extra == "automap"
Requires-Dist: geopandas>=1.0; extra == "automap"
Provides-Extra: ai
Requires-Dist: openai>=1.0; extra == "ai"
Requires-Dist: anthropic>=0.30; extra == "ai"
Provides-Extra: hoard
Requires-Dist: hoard-erd; extra == "hoard"

# Trowel ⛏

> The archaeologist's essential tool for careful finishing work.

<div align="center">
  <img src="resources/icon-256.png" alt="Trowel app icon" width="128">
</div>

<div align="center">

[![Tests](https://github.com/mabo-du/trowel/actions/workflows/build.yml/badge.svg)](https://github.com/mabo-du/trowel/actions/workflows/build.yml)
[![PyPI version](https://img.shields.io/pypi/v/trowel?color=blue)](https://pypi.org/project/trowel/)
[![Python versions](https://img.shields.io/pypi/pyversions/trowel)](https://pypi.org/project/trowel/)
[![License](https://img.shields.io/github/license/mabo-du/trowel)](LICENSE)
[![Code style](https://img.shields.io/badge/code%20style-ruff-000000)](https://github.com/astral-sh/ruff)

</div>

A premium desktop application that transforms digital excavation data into compliance-ready archaeological reports — covering the full post-excavation lifecycle from field data to repository submission.

**No GPU required. No cloud dependency. Your data never leaves your machine.**

---

> **📢 Beta testers wanted** — If you're a CRM archaeologist, heritage
> consultant, or field director, we'd love your feedback.
> [Open a Discussion](https://github.com/mabo-du/trowel/discussions) or
> [file an issue](https://github.com/mabo-du/trowel/issues/new?template=bug_report.md).
> Your data never leaves your machine — no sign-up or account needed.

---

## What Trowel Does

Commercial archaeologists spend up to 40% of project budgets on post-excavation report writing. SHPO rejection rates reach 79% on spatial grounds alone. Grey literature backlogs exceed 425,000 reports. Trowel addresses these problems with:

- **Auto-generated prose** from structured field data using deterministic Natural Language Generation, so no LLM hallucinations creep into your stratigraphy
- **Compliance-ready reports** for UK (CIfA/MoRPHE), US (Section 106/NHPA), Australia (NSW Heritage), and generic frameworks
- **Bidirectional Harris Matrix editor** — drag context nodes to reorder the stratigraphic narrative in real time
- **Photo plate builder** — auto-generates plates from geotagged images with EXIF GPS, regulatory captions, and UTM coordinates
- **Programmatic site maps** — trench plans, feature distributions, finds heatmaps, and section profiles at 300 DPI
- **Field database connectors** — import directly from FAIMS Mobile, ARK, and Intrasis exports
- **FAIR-compliant archival export** — Dig Digital v1.2 DMP, Dublin Core XML, structured JSON data alongside PDF/A and DOCX
- **Interactive review workflows** — paragraph-level comments with source-data tracing back to the original database entries
- **AI-assisted NLG (optional)** — enhance prose via Ollama/OpenAI/Anthropic with full CIfA 2025 provenance tracking

## Quick Start

```bash
# Install from PyPI (recommended)
pip install trowel
trowel

# Or from source
git clone https://github.com/mabo-du/trowel.git
cd trowel
pip install -e .
trowel

# Or web UI (requires streamlit)
pip install "trowel[web]"
# Resolve app.py next to the installed package's __init__.py
streamlit run "$(python3 -c 'import pathlib, trowel; print(pathlib.Path(trowel.__file__).with_name("app.py"))')"
```

Load `sample_data/synthetic_contexts.csv` to see it work in under a minute. See the [Quickstart](docs/QUICKSTART.md) for a 60-second walkthrough.

<div align="center">
  <img src="resources/screenshot-import.png" alt="Trowel import page" width="80%">
  <br>
  <em>The Trowel desktop app — import CSV/Excel data, connect to field databases, or load HOARD digitised context sheets</em>
</div>

<br>

<div align="center">
  <img src="resources/screenshot-preview.png" alt="Trowel report preview" width="80%">
  <br>
  <em>Report preview with 6 tabs — report sections, spatial maps, Harris Matrix, photo plates, AI tools, and peer review</em>
</div>

<br>

## Features

### Dual Interface
- **Desktop (PyQt6):** Native file dialogs, wizard-style workflow, dark Fusion theme, 6-tab report preview (Report / Map / Matrix / Photos / AI / Review) with section-level toggles
- **Web (Streamlit):** Browser-based alternative, deployable as a team tool, premium CSS design

### Data Ingestion
- CSV and Excel parsing with 70+ auto-detected column names (UK and US conventions)
- **HOARD JSON import** — open a directory of HOARD Phase 1 digitised context sheets; extracts contexts, finds, and samples automatically
- **Field database connectors** — import directly from FAIMS Mobile, ARK (Archaeological Recording Kit), and Intrasis export directories
- Validates stratigraphic logic: missing references, self-references, cut/fill consistency
- **Empty-project guard** — when a CSV lacks recognisable context records, shows a clear error instead of generating a fake-looking report
- **Quality gate** — detects when >90% of parsed contexts have no archaeological data and warns before generation
- Background-thread parsing keeps the UI responsive
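
The stratigraphic checks listed above can be sketched roughly as follows. This is a minimal illustration, not Trowel's actual `ingest.py`: the `ContextRow` class, its `below` field, and the warning wording are all assumptions made for the example, and cut/fill consistency is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ContextRow:
    """Hypothetical, simplified stand-in for a parsed context record."""
    context_id: str
    below: list[str] = field(default_factory=list)  # contexts this one lies beneath


def validate_relationships(rows: list[ContextRow]) -> list[str]:
    """Return human-readable warnings for self-references and missing references."""
    known = {row.context_id for row in rows}
    warnings = []
    for row in rows:
        for ref in row.below:
            if ref == row.context_id:
                warnings.append(f"Context {row.context_id} references itself")
            elif ref not in known:
                warnings.append(
                    f"Context {row.context_id} references missing context {ref}"
                )
    return warnings


rows = [
    ContextRow("100"),
    ContextRow("101", below=["100", "101"]),  # self-reference
    ContextRow("102", below=["999"]),         # 999 is not in the dataset
]
warnings = validate_relationships(rows)
for message in warnings:
    print(message)
```

Returning a list of warnings, rather than raising on the first problem, lets a UI show every issue in one pass.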

### Report Generation
- **Deterministic NLG** — no LLM API calls, no GPU required
- **ROMFA inclusion scale** — frequent charcoal, occasional CBM, rare flecks, all correctly expanded
- **Soil texture vocabulary** — silty clay ≠ clayey silt (geologically precise, never treated as synonyms)
- **Controlled period labels** — Iron Age, Romano-British, post-medieval, etc.
- **Section-by-section preview** — toggle sections on/off, see live updates
- **12 report sections**, including frontmatter, introduction, methodology, stratigraphy, finds catalogue, discussion, specialist assessments, archive, site photographs, site maps, and AI disclosure
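
To make "deterministic NLG" concrete, here is a small sketch of how abundance codes and soil descriptions might be expanded into a sentence. It assumes ROMFA stands for rare, occasional, moderate, frequent, abundant; the function names and sentence pattern are hypothetical and not taken from `nlg.py` or `vocabulary.py`.

```python
# Assumed expansion of single-letter ROMFA codes; wording is illustrative only.
ROMFA = {
    "R": "rare",
    "O": "occasional",
    "M": "moderate",
    "F": "frequent",
    "A": "abundant",
}


def describe_inclusions(inclusions: list[tuple[str, str]]) -> str:
    """Turn (code, material) pairs into one phrase, always in input order."""
    phrases = [f"{ROMFA[code.upper()]} {material}" for code, material in inclusions]
    if not phrases:
        return "no visible inclusions"
    if len(phrases) == 1:
        return phrases[0]
    return ", ".join(phrases[:-1]) + " and " + phrases[-1]


def describe_fill(context_id: str, colour: str, texture: str,
                  inclusions: list[tuple[str, str]]) -> str:
    """Build one sentence; the texture string is passed through untouched."""
    return (
        f"Context {context_id} was a {colour} {texture} "
        f"with {describe_inclusions(inclusions)}."
    )


sentence = describe_fill(
    "101", "mid brown", "silty clay", [("F", "charcoal"), ("O", "CBM")]
)
print(sentence)
```

Because the output depends only on the input fields and fixed lookup tables, the same record always yields the same sentence, and `silty clay` is never rewritten as `clayey silt`.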

### Jurisdiction Templates

| Jurisdiction | Standard | Key Sections |
|---|---|---|
| **UK** | CIfA Standard & Guidance / MoRPHE | Non-technical summary (NGR/OASIS), MoLAS recording methodology, Type 2 Appraisal with UPD, AAF-compliant archive deposition |
| **US** | Section 106 (NHPA) / SHPO | SHPO cover page with legal description, NRHP eligibility evaluation (criteria A-D + integrity), shovel test methodology, 36 CFR 79 curation |
| **Australia** | NSW Heritage Guidelines | Burra Charter-aligned significance assessment, graded zones, five prescribed management outcomes, Aboriginal cultural heritage acknowledgement |

### Spatial-Text Engine
- Load GeoJSON, shapefiles, or GeoPackage on the import page
- Template writers use `{{ spatial.acreage("Trench 1") }}`, `{{ spatial.distance("F1", "F2") }}`, `{{ spatial.centroid_utm("Feature A") }}`
- Interactive map preview tab with zoom, pan, and feature hover
- Auto-updates when shapefiles change (e.g., client APE revision)
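
As a rough sketch of what might sit behind template calls such as `spatial.acreage(...)` and `spatial.distance(...)`, the example below computes both from plain polygon vertices. The `SpatialIndex` class is hypothetical, coordinates are assumed to be in projected metres (for example UTM), and the centroid is simplified to the mean of the vertices; the real engine presumably delegates this to geopandas.

```python
import math

SQUARE_METRES_PER_ACRE = 4046.8564224


class SpatialIndex:
    """Hypothetical stand-in for the object exposed to templates as `spatial`."""

    def __init__(self, polygons: dict[str, list[tuple[float, float]]]):
        # Feature name -> polygon vertices in projected metres.
        self.polygons = polygons

    def _area_m2(self, name: str) -> float:
        pts = self.polygons[name]
        # Shoelace formula over consecutive vertex pairs, wrapping to the start.
        twice_area = sum(
            x1 * y2 - x2 * y1
            for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])
        )
        return abs(twice_area) / 2

    def _centroid(self, name: str) -> tuple[float, float]:
        pts = self.polygons[name]
        # Simplification: vertex mean, exact only for regular shapes.
        return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

    def acreage(self, name: str) -> float:
        return round(self._area_m2(name) / SQUARE_METRES_PER_ACRE, 3)

    def distance(self, a: str, b: str) -> float:
        (ax, ay), (bx, by) = self._centroid(a), self._centroid(b)
        return round(math.hypot(bx - ax, by - ay), 2)


spatial = SpatialIndex({
    "Trench 1": [(0, 0), (20, 0), (20, 10), (0, 10)],
    "F1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "F2": [(3, 4), (5, 4), (5, 6), (3, 6)],
})
print(spatial.acreage("Trench 1"), spatial.distance("F1", "F2"))
```

Because the numbers are computed from geometry at render time, a revised boundary file changes the report text without anyone editing the prose.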

### Interactive Harris Matrix
- Visual DAG editor built into the report preview, colour-coded by context type
- Drag nodes to reorder; the stratigraphic narrative text re-aligns automatically
- Kahn's algorithm topological ordering — detects cycles in stratigraphic relationships
- EEDP export for StratiGraph ecosystem interop
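
For readers unfamiliar with Kahn's algorithm, the sketch below orders contexts from latest to earliest and reports a cycle when one exists. The function name and the `above` mapping are assumptions for illustration; `harris_editor.py` may structure its data differently.

```python
from collections import deque


def stratigraphic_order(above: dict[str, set[str]]) -> list[str]:
    """Order contexts latest-first using Kahn's algorithm.

    `above` maps each context to the contexts directly beneath it.
    Raises ValueError if the relationships contain a cycle.
    """
    nodes = set(above) | {c for lower in above.values() for c in lower}
    indegree = {n: 0 for n in nodes}
    for lower in above.values():
        for c in lower:
            indegree[c] += 1

    # Start with contexts that nothing overlies; sort for repeatable output.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for c in sorted(above.get(node, ())):
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)

    # Any node never released still has an unresolved predecessor: a cycle.
    if len(order) != len(nodes):
        stuck = sorted(nodes - set(order))
        raise ValueError(f"Cycle in stratigraphic relationships: {stuck}")
    return order


sequence = stratigraphic_order({
    "100": {"101"},
    "101": {"102", "103"},
    "102": {"104"},
    "103": {"104"},
})
print(sequence)
```

A cycle such as "101 is above 102" together with "102 is above 101" leaves both contexts with a non-zero in-degree, so they never enter the queue and the length check catches them.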

### Photo Plate Builder
- Ingests geotagged images from a site photo directory
- Extracts EXIF GPS coordinates, timestamps, and camera orientation via Pillow
- Auto-generates regulatory captions: UTM coordinates, orientation, context cross-references
- Thumbnail grid preview with editable captions
- A4-friendly plate layout (up to 6 images per plate)
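
The caption step can be illustrated without any imaging library. The sketch below assumes the EXIF GPS values have already been read as degrees, minutes, and seconds; it converts them to decimal degrees and derives the UTM zone number. Full easting/northing conversion needs a projection library and is omitted, and the caption wording is hypothetical.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert an EXIF-style degrees/minutes/seconds triple to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


def utm_zone(longitude: float) -> int:
    """Standard 6-degree UTM zone number (ignores the Norway/Svalbard exceptions)."""
    return int((longitude + 180) // 6) % 60 + 1


def caption(plate: int, lat: float, lon: float, facing: str, context_id: str) -> str:
    """Assemble an illustrative caption; the format is an assumption."""
    hemisphere = "N" if lat >= 0 else "S"
    return (
        f"Plate {plate}: Context {context_id}, facing {facing} "
        f"(UTM zone {utm_zone(lon)}{hemisphere}, {lat:.5f}, {lon:.5f})"
    )


lat = dms_to_decimal(51, 30, 0, "N")
lon = dms_to_decimal(0, 7, 30, "W")
text = caption(1, lat, lon, "NE", "101")
print(text)
```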

### Programmatic Site Maps (QGIS-free)
- **Trench / Feature Plan** — site layout colour-coded by context type
- **Feature Distribution** — scatter plot by context category
- **Artefact Density Heatmap** — hexbin finds density with colour bar
- **Section Profile** — depth transect with labelled stratigraphic units
- All maps at 300 DPI PNG, dark theme, auto-embedded in reports
- Requires matplotlib + geopandas (optional, graceful fallback)

### Export Formats
- **Editable DOCX** — your company template, ready for PI review
- **Archival PDF/A-2b** — ready for HER deposition
- **Plain Markdown** — version-control friendly, universal
- **Harris Matrix SVG** — auto-generated from context relationships (no HOARD dependency)
- **FAIR-Compliant Archive** — structured package with:
  - Report formats (PDF/A, DOCX, Markdown)
  - Structured JSON data (contexts, finds, samples, spatial)
  - Dublin Core XML metadata (OAI-PMH)
  - Data Management Plan (Dig Digital v1.2 structure)
  - CIfA 2025 AI disclosure appendix
  - Harris Matrix SVG + auto-generated site maps
  - Archive manifest with file listing
- **FAIR archive** export is available from the preview toolbar, with buttons for individual formats alongside the archive bundle
- **AI disclosure appendix** auto-generated and included when AI NLG output has been approved
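
As an indication of what the Dublin Core metadata in the archive might look like, the sketch below builds an `oai_dc` record with the standard library. The two namespace URIs are the published OAI-PMH and Dublin Core ones; the function name and the sample field values are invented for the example and do not come from `compliance.py`.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialise {element name: [values]} as an oai_dc:dc XML string."""
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, values in fields.items():
        for value in values:  # Dublin Core elements are repeatable
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dublin_core_record({
    "title": ["Example Farm: Archaeological Evaluation Report"],  # sample value
    "subject": ["archaeology", "excavation"],
    "format": ["application/pdf"],
})
print(xml_text)
```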

### AI-Assisted NLG (Optional)
- Enhance generated prose via local Ollama, OpenAI, or Anthropic APIs
- Prompt-chain architecture: deterministic draft → structured context → LLM enhancement
- **Per-sentence provenance tracking** — every AI-generated sentence stores which source fields prompted it, the model used, and a timestamp
- CIfA 2025-compliant: AI disclosure appendix declares systems, sections, and validation status
- Human validation gate: AI output is visually distinct until approved
- Fully opt-in — disabled by default, no data ever leaves your machine unless you configure a remote API
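
Per-sentence provenance could be modelled along these lines. This is a sketch under assumptions: the `SentenceProvenance` class, its field names, and the model identifier are hypothetical and are not the schema used by `nlg_ai.py`.

```python
from dataclasses import asdict, dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class SentenceProvenance:
    """Hypothetical provenance record for one AI-enhanced sentence."""
    sentence: str
    source_fields: tuple[str, ...]  # e.g. ("context.101.inclusions",)
    model: str                      # backend/model identifier as configured
    generated_at: str               # ISO 8601 timestamp in UTC
    approved: bool = False          # set only after human validation


def record_sentence(sentence, source_fields, model, now=None):
    """Create an unapproved record, stamping the current UTC time by default."""
    now = now or datetime.now(timezone.utc)
    return SentenceProvenance(sentence, tuple(source_fields), model, now.isoformat())


def disclosure_rows(records):
    """Only approved sentences are listed in the disclosure appendix."""
    return [asdict(r) for r in records if r.approved]


draft = record_sentence(
    "The ditch fill contained frequent charcoal.",
    ["context.101.inclusions"],
    "example-local-model",  # placeholder name
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
approved = replace(draft, approved=True)
print(disclosure_rows([draft, approved]))
```

Making the record immutable means approval produces a new record instead of silently editing the old one, which keeps the audit trail simple.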

### Interactive Review Workflows
- **Review mode** toggle in the preview pane
- Click any paragraph to add a comment with source-data tracing
- Source-data panel shows which contexts, finds, and samples generated the selected text
- Comments are persisted in `.trowel` project files alongside the data
- Export review summary as Markdown for distribution

### Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| `Ctrl+O` | Open .trowel project file |
| `Ctrl+S` | Save project |
| `Ctrl+Shift+S` | Save project as... |
| `Ctrl+Shift+O` | Import CSV/Excel data |
| `Ctrl+H` | Open HOARD project directory |
| `Ctrl+N` | New project (clear session) |
| `Ctrl+Q` | Quit |

---

## The StratiGraph Ecosystem

Trowel is part of a suite of open-source tools for digital heritage and archaeology:

| Tool | What it is | Role |
|---|---|---|
| **Trowel** | *(this repo)* | Report drafting from digital field data |
| **[HOARD](https://github.com/mabo-du/HOARD)** | Heritage Observation And Report Drafter | Paper + photo digitisation pipeline (OCR, VLM captioning, spatial reconstruction). Use when starting from raw scans and handwritten sheets. |
| **[StratiGraph](https://github.com/mabo-du/stratigraph)** | Harris Matrix generator | Interactive DAG editor for stratigraphic sequences. Exports EEDP paths for hallucination-free AI report generation. |
| **[Libby](https://github.com/mabo-du/libby)** | Radiocarbon calibration | Bayesian age-depth modelling, calibration curve rendering, marine reservoir correction. |
| **[Paleo](https://github.com/mabo-du/paleo)** | Palaeontology AI platform | Fossil identification, paleoclimate reconstruction, palaeogeographic mapping. |
| **[dibble](https://github.com/mabo-du/dibble)** | Lithic analysis | Automated 3D stone tool measurement, photogrammetry pipeline, AI classification. |
| **[Fritts](https://github.com/mabo-du/fritts)** | Dendrochronology | Tree-ring cross-dating, master chronology building, image ring measurement. |

### How Trowel Integrates

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   HOARD      │────▶│    Trowel    │────▶│ StratiGraph  │
│ paper→digital│     │ digital→draft│     │ strat→DAG    │
└──────────────┘     └──────▲───────┘     └──────┬───────┘
                            │                    │
                     ┌──────┴───────┐     ┌──────▼───────┐
                     │    Libby     │     │    Trowel    │
                     │  C14 dates   │     │  EEDP paths  │
                     └──────────────┘     └──────────────┘
```

- **HOARD → Trowel:** Both tools share the `context-sheet-v1.json` schema. HOARD digitises paper; Trowel picks up the structured JSON and generates the report.
- **Trowel → StratiGraph:** Both tools share the same context data model. StratiGraph visualises the matrix; Trowel consumes EEDP paths for deterministic stratigraphic narratives.
- **Libby → Trowel:** Radiocarbon dates from Libby flow into Trowel's dating sections and specialist appendices.
- **Trowel's HOARD integration:** When `hoard-erd` is installed, Trowel uses HOARD's premium `docx_writer` (cover pages, styled headings, appendix tables), `pdf_writer` (PDF/A-2b archival format), and `harris.py` (matrix SVG generation). Falls back gracefully when HOARD is absent.

---

## Architecture

```
src/
├── __init__.py         # Package marker
├── main.py             # PyQt6 desktop entry point
├── app.py              # Streamlit web entry point
├── models.py           # Context, Find, Sample, ProjectData dataclasses
├── ingest.py           # CSV/Excel parsing, 70+ column aliases, validation
├── vocabulary.py       # ROMFA scale, soil textures, controlled terminology
├── nlg.py              # Deterministic NLG engine + Jinja2 section templates
├── nlg_ai.py           # AI NLG (3 backends: Ollama, OpenAI, Anthropic) + provenance
├── export.py           # Markdown, DOCX (HOARD or fallback), PDF/A, archive manifest
├── eedp.py             # StratiGraph EEDP integration for strat narratives
├── harris_editor.py    # Harris Matrix DAG model (Kahn's, cycle detection, EEDP)
├── spatial_text.py     # Spatial-Text query engine (acreage, distance, UTM, GeoJSON)
├── automap.py          # QGIS-free site maps (trench, feature, hexbin, section)
├── images.py           # Photo plate builder (EXIF GPS, auto-captions, A4 plates)
├── compliance.py       # DMP (Dig Digital v1.2), Dublin Core XML, AI disclosure
├── review.py           # ReviewComment/ReviewSession models, source-data tracing
├── hoard_import.py     # HOARD JSON context-sheet importer
├── trowel_io.py        # .trowel project file serialisation (JSON)
├── connectors/         # Field database connector registry
│   ├── __init__.py
│   ├── base.py         # Abstract base FieldConnector class
│   ├── faims.py        # FAIMS Mobile export directory connector
│   ├── ark.py          # ARK export directory connector
│   └── intrasis.py     # Intrasis export directory connector
├── ui/                 # PyQt6 desktop UI package
│   ├── __init__.py
│   ├── theme.py        # Dark Fusion theme (QPalette + stylesheet)
│   ├── session.py      # Reactive QObject-based data store
│   ├── main_window.py  # QMainWindow with QStackedWidget, 6-tab preview
│   ├── import_page.py  # File selection, project metadata, DB connector dialog
│   ├── preview_page.py # 6-tab report preview (Report/Map/Matrix/Photos/AI/Review)
│   ├── map_pane.py     # QPainter spatial map preview tab
│   ├── matrix_widget.py # QGraphicsView colour-coded DAG editor tab
│   ├── plate_view.py   # Thumbnail grid photo plate editor tab
│   ├── ai_panel.py     # AI NLG controls, provenance viewer, backend config
│   ├── review_panel.py # Review comment sidebar, source-data tracing
│   └── connector_dialog.py # Unified DB connector connection dialog
├── templates/          # Jinja2 report section templates (6 sections × 4 jurisdictions)
│   ├── generic/        # frontmatter, intro, methodology, discussion, specialist, archive
│   ├── uk/             # CIfA/MoRPHE overrides
│   ├── us/             # Section 106 overrides
│   └── au/             # NSW Heritage overrides
sample_data/            # Synthetic (37 ctx + 28 finds + 10 samples), original (12 ctx), GeoJSON
tests/                  # 167 unit tests + 30 integration tests (197 total)
```
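
The `connectors/` package is described as a registry built on an abstract `FieldConnector` base class. One way such a registry could work is sketched below; only the class name `FieldConnector` comes from the tree above, while the method names, the `REGISTRY` dict, and the toy `CsvFolderConnector` are assumptions for illustration.

```python
import csv
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

REGISTRY: dict = {}  # connector name -> connector class


class FieldConnector(ABC):
    """Hypothetical base class; subclasses self-register via their `name`."""
    name: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name:
            REGISTRY[cls.name] = cls

    @abstractmethod
    def can_read(self, export_dir: Path) -> bool:
        """Return True if the directory looks like this system's export."""

    @abstractmethod
    def read_contexts(self, export_dir: Path) -> list[dict]:
        """Return context records as plain dictionaries."""


class CsvFolderConnector(FieldConnector):
    """Toy connector: any folder holding contexts.csv counts as readable."""
    name = "csv-folder"

    def can_read(self, export_dir: Path) -> bool:
        return (export_dir / "contexts.csv").is_file()

    def read_contexts(self, export_dir: Path) -> list[dict]:
        with (export_dir / "contexts.csv").open(newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))


def detect(export_dir: Path):
    """Return the name of the first registered connector that accepts the folder."""
    for name, cls in REGISTRY.items():
        if cls().can_read(export_dir):
            return name
    return None


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "contexts.csv").write_text("context_id,type\n101,fill\n", encoding="utf-8")
    detected = detect(folder)
    rows = CsvFolderConnector().read_contexts(folder)
print(detected, rows)
```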

## Sample Data

Two example datasets are provided in `sample_data/`:

**Synthetic dataset** (recommended for first use): `synthetic_contexts.csv` — 37 contexts across 5 phases, designed to exercise all jurisdiction templates and edge cases:

| Phase | Features |
|---|---|
| Phase 1 — Natural | River terrace gravels |
| Phase 2 — Iron Age (800 BC–AD 43) | Enclosure ditch with 3 fills, roundhouse ring-groove, central posthole with in-situ burning, occupation layer |
| Phase 3 — Roman (AD 43–410) | Stone building with opus signinum floor, limestone walls, clay floor, hearth, demolition layer, quarry pit with 3 fills, inhumation burial |
| Phase 4 — Medieval (1066–1550) | Cultivation horizon, drainage ditch, rubbish pit with dense artefact assemblage |
| Phase 5 — Post-Med/Modern | Ploughsoil, modern topsoil with 20th-century inclusions |

Plus edge cases: context with interpretation only (998), completely empty context (999), finds referencing non-existent contexts. Also includes `synthetic_finds.csv` (28 finds) and `synthetic_samples.csv` (10 samples).

**Quick demo:** Load `sample_data/synthetic_contexts.csv`, add the finds and samples files, select UK jurisdiction, and preview all sections.

**Original demo:** `contexts.csv` — 12-context Iron Age / Roman site with 12 finds and 5 samples.

---

## Requirements

- Python 3.11+
- PyQt6 (desktop UI)
- pandas, openpyxl, jinja2, python-docx, Pillow (core engine)
- matplotlib, geopandas (programmatic site maps; `pip install "trowel[automap]"`)
- openai, anthropic (AI NLG backends; `pip install "trowel[ai]"`)
- Streamlit (web UI; `pip install "trowel[web]"`)
- hoard-erd (premium DOCX/PDF/A export; `pip install hoard-erd`)

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for the development workflow, code style guide, testing, and PR checklist. All contributions are welcome.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.

## Development

### Setup

```bash
pip install -e ".[dev]"

# Install pre-commit hooks (ruff check + format on every commit)
pre-commit install
```

### Lint & Format

```bash
ruff check src/ tests/
ruff format src/ tests/ --check
```

### Run Tests

```bash
pytest -v
```

All three — lint, format check, and tests — run in CI on every push and pull request to `main` across Linux, Windows, and macOS (Python 3.11–3.13).

## Packaging (Standalone Executable)

Trowel can be packaged as a standalone executable so users don't need Python installed.

```bash
pip install pyinstaller
make build          # on Linux
make build-windows  # on Windows
make build-macos    # on macOS
```

The output is in `dist/Trowel/` — a single folder you can zip and distribute. Double-click `Trowel` (or `Trowel.exe` on Windows) to launch.

GitHub Actions runs lint, tests, and builds standalone executables for Linux, Windows, and macOS on every push to `main`. Download the artifacts from the Actions tab.

## Project File Format

Trowel saves and loads projects in `.trowel` format — a JSON file containing all excavation data and UI state. Use **File → Save** (Ctrl+S) and **File → Open** (Ctrl+O) to persist your work.
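
Because a `.trowel` file is plain JSON, it can be inspected or generated with the standard library. The round trip below is only a sketch: the keys in `project` are invented for the example, and the real schema written by `trowel_io.py` is not documented here.

```python
import json
import tempfile
from pathlib import Path


def save_project(path, data: dict) -> None:
    """Write project data as indented UTF-8 JSON."""
    Path(path).write_text(
        json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def load_project(path) -> dict:
    """Read a project file back into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical structure; the actual keys may differ.
project = {
    "metadata": {"site_name": "Example Farm", "jurisdiction": "uk"},
    "contexts": [{"context_id": "101", "type": "fill"}],
    "review_comments": [],
}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo.trowel"
    save_project(target, project)
    restored = load_project(target)
print(restored["metadata"]["site_name"])
```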

## License

MIT — use it, modify it, ship it. Archaeology deserves better tools, and they should be free.
