Metadata-Version: 2.4
Name: trowel
Version: 0.3.0
Summary: Premium excavation report drafting tool for CRM archaeologists
Author-email: Marcus Quinn <marcus@example.com>
License: MIT
Project-URL: homepage, https://github.com/mabo-du/trowel
Project-URL: repository, https://github.com/mabo-du/trowel
Project-URL: documentation, https://github.com/mabo-du/trowel/blob/main/docs/USER_GUIDE.md
Project-URL: changelog, https://github.com/mabo-du/trowel/blob/main/CHANGELOG.md
Project-URL: issues, https://github.com/mabo-du/trowel/issues
Keywords: archaeology,excavation,report,crm,heritage,stratigraphy,harris-matrix,cultural-resource-management,digital-heritage
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Text Processing :: Markup :: Markdown
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: pandas>=2.0
Requires-Dist: openpyxl>=3.1
Requires-Dist: jinja2>=3.1
Requires-Dist: python-docx>=1.0
Requires-Dist: PyQt6>=6.6
Provides-Extra: web
Requires-Dist: streamlit>=1.40; extra == "web"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-benchmark>=5.0; extra == "dev"
Requires-Dist: pytest-cov>=6.0; extra == "dev"
Requires-Dist: ruff>=0.8; extra == "dev"
Requires-Dist: pre-commit>=4.0; extra == "dev"
Provides-Extra: automap
Requires-Dist: matplotlib>=3.8; extra == "automap"
Requires-Dist: geopandas>=1.0; extra == "automap"
Provides-Extra: ai
Requires-Dist: openai>=1.0; extra == "ai"
Requires-Dist: anthropic>=0.30; extra == "ai"
Provides-Extra: hoard
Requires-Dist: hoard-erd; extra == "hoard"
Provides-Extra: c14
Requires-Dist: libby; extra == "c14"
Requires-Dist: iosacal>=0.6.0; extra == "c14"
Requires-Dist: numpy>=2.0; extra == "c14"

# Trowel ⛏

> The archaeologist's essential tool for careful finishing work.

<div align="center">
  <img src="resources/icon-256.png" alt="Trowel app icon" width="128">
</div>

<div align="center">

[![Tests](https://github.com/mabo-du/trowel/actions/workflows/build.yml/badge.svg)](https://github.com/mabo-du/trowel/actions/workflows/build.yml)
[![PyPI version](https://img.shields.io/pypi/v/trowel?color=blue)](https://pypi.org/project/trowel/)
[![Python versions](https://img.shields.io/pypi/pyversions/trowel)](https://pypi.org/project/trowel/)
[![License](https://img.shields.io/github/license/mabo-du/trowel)](LICENSE)
[![Code style](https://img.shields.io/badge/code%20style-ruff-000000)](https://github.com/astral-sh/ruff)

</div>

A premium desktop application that transforms digital excavation data into compliance-ready archaeological reports — covering the full post-excavation lifecycle from field data to repository submission.

**No GPU required. No cloud dependency. Your data never leaves your machine.**

---

> **📢 Beta testers wanted** — If you're a CRM archaeologist, heritage
> consultant, or field director, we'd love your feedback.
> [Open a Discussion](https://github.com/mabo-du/trowel/discussions) or
> [file an issue](https://github.com/mabo-du/trowel/issues/new?template=bug_report.md).
> Your data never leaves your machine — no sign-up or account needed.

---

## What Trowel Does

Commercial archaeologists spend up to 40% of project budgets on post-excavation report writing. SHPO rejection rates reach 79% on spatial grounds alone. Grey literature backlogs exceed 425,000 reports. Trowel addresses these problems by:

- **Auto-generating prose** from structured field data using deterministic Natural Language Generation — no LLM hallucinations in your stratigraphy
- **Building compliance-ready reports** for 19 jurisdictions: UK (CIfA/MoRPHE), US (Section 106/NHPA + 5 SHPO state configs), Australia (NSW Heritage + Burra Charter), Scotland, Wales, Ireland, Netherlands, France, Germany, Canada-Ontario, New Zealand, South Africa
- **Structured report model** with provenance tracking — every paragraph links to its source data record (GIS feature, context sheet, photo, find). Enables bidirectional tracing and audit trails per CIfA 2025 AI guidelines
- **`\label{}`/`\ref{}` cross-referencing** — labels auto-resolve to "Section 1", "Figure 2" etc.; stripped in Markdown, live OOXML bookmarks in DOCX that survive manual Word edits
- **Bidirectional Harris Matrix editor** — drag context nodes to reorder the stratigraphic narrative in real time; move paragraphs to update the matrix DAG (cycle-safe with automatic rollback)
- **Queue-and-batch change management** — spatial and matrix changes produce a diff queue the author reviews and applies/dismisses as a batch
- **Photo plate builder** — auto-generates plates from geotagged images with EXIF GPS, regulatory captions, and UTM coordinates
- **Programmatic site maps** — trench plans, feature distributions, finds heatmaps, and section profiles at 300 DPI
- **Field database connectors** — import directly from FAIMS Mobile, ARK, and Intrasis exports
- **Radiocarbon calibration** — calibrate C14 dates via Libby integration (IntCal20). Generates BC/AD labels, HPD ranges, bimodal detection, Hallstatt plateau warnings
- **FAIR-compliant archival export** — Dig Digital v1.2 DMP, Dublin Core XML, structured JSON data alongside PDF/A and DOCX
- **Offline Getty vocabulary lookup** — standardise material and period names via Cache & Carry's AAT/ULAN/TGN index
- **Lithic analysis appendix** — import Dibble CSV/JSON, generate measurement tables and statistical summaries
- **Interactive review workflows** — paragraph-level comments with source-data tracing back to the original database entries
- **AI-assisted NLG (optional)** — enhance prose via Ollama/OpenAI/Anthropic with full CIfA 2025 provenance tracking
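
The `\label{}`/`\ref{}` behaviour listed above can be pictured with a short sketch. This is not Trowel's implementation: the `sec:`/`fig:` prefix convention and the `resolve_refs` name are assumptions made for illustration, and only the plain-text (Markdown-style) case is shown.

```python
import re

# Hypothetical convention: the label prefix decides the kind of target.
KIND_NAMES = {"sec": "Section", "fig": "Figure"}

LABEL_RE = re.compile(r"\\label\{(\w+):([\w-]+)\}")
REF_RE = re.compile(r"\\ref\{(\w+):([\w-]+)\}")


def resolve_refs(text: str) -> str:
    """Number labels in order of appearance, replace refs, then strip labels."""
    counters: dict[str, int] = {}
    targets: dict[tuple[str, str], str] = {}
    for kind, name in LABEL_RE.findall(text):
        counters[kind] = counters.get(kind, 0) + 1
        targets[(kind, name)] = f"{KIND_NAMES.get(kind, kind)} {counters[kind]}"

    def replace_ref(match: re.Match) -> str:
        # Unknown references are left untouched so they are easy to spot.
        return targets.get((match.group(1), match.group(2)), match.group(0))

    return LABEL_RE.sub("", REF_RE.sub(replace_ref, text))
```

Leaving unresolved references in place, rather than deleting them, is a deliberate choice in this sketch: a stray `\ref{...}` in the output is easier to notice than a silent gap.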

## Quick Start

```bash
# Install from PyPI (recommended)
pip install trowel
trowel

# Or from source
git clone https://github.com/mabo-du/trowel.git
cd trowel
pip install -e .
trowel

# Or web UI (requires streamlit)
pip install "trowel[web]"
streamlit run "$(python3 -c "import os, trowel; print(os.path.join(os.path.dirname(trowel.__file__), 'app.py'))")"
```

Load `sample_data/synthetic_contexts.csv` to see it work in under a minute. See the [Quickstart](docs/QUICKSTART.md) for a 60-second walkthrough.

<div align="center">
  <img src="resources/screenshot-import.png" alt="Trowel import page" width="80%">
  <br>
  <em>The Trowel desktop app — import CSV/Excel data, connect to field databases, or load HOARD digitised context sheets</em>
</div>

<br>

<div align="center">
  <img src="resources/screenshot-preview.png" alt="Trowel report preview" width="80%">
  <br>
  <em>Report preview with 6 tabs — report sections, spatial maps, Harris Matrix, photo plates, AI tools, and peer review</em>
</div>

<br>

## Features

### Dual Interface
- **Desktop (PyQt6):** Native file dialogs, wizard-style workflow, dark Fusion theme, 6-tab report preview (Report / Map / Matrix / Photos / AI / Review) with section-level toggles
- **Web (Streamlit):** Browser-based alternative, deployable as a team tool, premium CSS design

### Data Ingestion
- CSV and Excel parsing with 70+ auto-detected column names (UK and US conventions)
- **HOARD JSON import** — open a directory of HOARD Phase 1 digitised context sheets; extracts contexts, finds, and samples automatically
- **Field database connectors** — import directly from FAIMS Mobile, ARK (Archaeological Recording Kit), and Intrasis export directories
- Validates stratigraphic logic: missing references, self-references, cut/fill consistency
- **Empty-project guard** — when a CSV lacks recognisable context records, shows a clear error instead of generating a fake-looking report
- **Quality gate** — detects when >90% of parsed contexts have no archaeological data and warns before generation
- Background-thread parsing keeps the UI responsive
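
As a rough illustration of the stratigraphic checks above, the sketch below flags self-references and references to contexts that were never recorded. The `ContextRecord` shape and the `validate_relationships` name are hypothetical, and cut/fill consistency is not covered here.

```python
from dataclasses import dataclass, field


@dataclass
class ContextRecord:
    """Hypothetical minimal context record; a real model would carry more fields."""
    number: str
    above: list[str] = field(default_factory=list)  # contexts this one overlies


def validate_relationships(contexts: list[ContextRecord]) -> list[str]:
    """Collect every problem as a readable message instead of stopping at the first."""
    known = {c.number for c in contexts}
    problems = []
    for ctx in contexts:
        for ref in ctx.above:
            if ref == ctx.number:
                problems.append(f"Context {ctx.number} references itself")
            elif ref not in known:
                problems.append(
                    f"Context {ctx.number} references missing context {ref}"
                )
    return problems
```

Returning a list of messages, rather than raising on the first error, suits an import page where the user wants to see every issue in the spreadsheet at once.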

### Report Generation
- **Deterministic NLG** — no LLM API calls, no GPU required
- **ROMFA inclusion scale** — frequent charcoal, occasional CBM, rare flecks, all correctly expanded
- **Soil texture vocabulary** — silty clay ≠ clayey silt (geologically precise, never treated as synonyms)
- **Controlled period labels** — Iron Age, Romano-British, post-medieval, etc.
- **Section-by-section preview** — toggle sections on/off, see live updates
- **12 report sections**, including frontmatter, introduction, methodology, stratigraphy, finds catalogue, discussion, specialist assessments, archive, site photographs, site maps, and AI disclosure
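
To show what "deterministic" means in practice, here is a minimal sketch that turns recorded inclusions into a sentence with fixed rules, so the same input always yields the same prose. The abbreviation table, the sentence wording, and the `describe_inclusions` name are assumptions, not Trowel's actual vocabulary or templates.

```python
# Hypothetical abbreviation table; a real controlled vocabulary would be far larger.
ABBREVIATIONS = {"CBM": "ceramic building material"}


def describe_inclusions(inclusions: list[tuple[str, str]]) -> str:
    """Turn (frequency, material) pairs into one sentence using fixed rules."""
    if not inclusions:
        return "No inclusions were recorded."
    phrases = [f"{freq} {ABBREVIATIONS.get(mat, mat)}" for freq, mat in inclusions]
    if len(phrases) == 1:
        joined = phrases[0]
    else:
        joined = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return f"Inclusions comprised {joined}."


print(describe_inclusions([("frequent", "charcoal"), ("occasional", "CBM")]))
# Inclusions comprised frequent charcoal and occasional ceramic building material.
```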

### Jurisdiction Templates

| Jurisdiction | Standard | Config |
|---|---|---|
| **UK** | CIfA Standard & Guidance / MoRPHE | `uk.json` |
| **US** | Section 106 (NHPA) | `us.json` |
| **US — California** | OHP DPR 523 forms | `us_ca.json` |
| **US — Texas** | THC/TARL TexSite | `us_tx.json` |
| **US — Arizona** | ASM/SHPO | `us_az.json` |
| **US — Colorado** | OAHP | `us_co.json` |
| **US — North Carolina** | OSA | `us_nc.json` |
| **Scotland** | HES Data Structure Report | `scotland.json` |
| **Wales** | RCAHMW/Cadw | `wales.json` |
| **Ireland** | NMS Section 26 | `ireland.json` |
| **Australia (NSW)** | NSW Heritage Act 1977 | `au.json` |
| **Australia (Burra)** | ICOMOS Burra Charter | `australia_burra.json` |
| **Netherlands** | KNA 5.0 | `netherlands.json` |
| **France** | INRAP / Code du Patrimoine | `france.json` |
| **Germany** | Landesdenkmalpflege | `germany.json` |
| **Canada — Ontario** | MCM Standards & Guidelines | `canada_ontario.json` |
| **New Zealand** | Heritage NZ Pouhere Taonga | `new_zealand.json` |
| **South Africa** | SAHRA | `south_africa.json` |
| **Generic** | International fallback | `generic.json` |

### Spatial-Text Engine
- Load GeoJSON, shapefiles, or GeoPackage on the import page
- Template writers use `{{ spatial.acreage("Trench 1") }}`, `{{ spatial.distance("F1", "F2") }}`, `{{ spatial.centroid_utm("Feature A") }}`
- Interactive map preview tab with zoom, pan, and feature hover
- Auto-updates when shapefiles change (e.g., client APE revision)
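
The template helpers above return numbers derived from geometry. The sketch below shows the kind of arithmetic that could sit behind an acreage or distance query, assuming coordinates are already in a projected, metre-based system such as UTM. The function names are illustrative and are not Trowel's API.

```python
import math

SQUARE_METRES_PER_ACRE = 4046.8564224  # international acre


def polygon_area_m2(ring: list[tuple[float, float]]) -> float:
    """Shoelace formula for a simple polygon given as (easting, northing) pairs."""
    total = 0.0
    # Pair each vertex with the next one, wrapping round to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def acreage(ring: list[tuple[float, float]]) -> float:
    return polygon_area_m2(ring) / SQUARE_METRES_PER_ACRE


def distance_m(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Straight-line distance between two projected points, in metres."""
    return math.hypot(b[0] - a[0], b[1] - a[1])
```

A 100 m × 100 m trench (one hectare) comes out at roughly 2.47 acres. The area function also accepts rings whose last vertex repeats the first, as GeoJSON rings do, because the duplicated vertex contributes zero to the sum.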

### Interactive Harris Matrix
- Visual DAG editor built into the report preview, colour-coded by context type
- Drag nodes to reorder; the stratigraphic narrative text re-aligns automatically
- Kahn's algorithm topological ordering — detects cycles in stratigraphic relationships
- EEDP export for StratiGraph ecosystem interop
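
For readers unfamiliar with Kahn's algorithm, the following is a generic sketch of how it orders a stratigraphic DAG and reports a cycle. It is a textbook version under assumed inputs (pairs of `(later, earlier)` context numbers), not the code in `harris_editor.py`.

```python
from collections import deque


def stratigraphic_order(edges: list[tuple[str, str]]) -> list[str]:
    """Order contexts latest-first from (later, earlier) pairs; raise on a cycle."""
    nodes = {n for edge in edges for n in edge}
    below = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for later, earlier in edges:
        below[later].append(earlier)
        indegree[earlier] += 1

    # Sorting keeps the output reproducible when several contexts are ready.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in sorted(below[node]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any context never emitted must sit on a cycle.
    if len(order) != len(nodes):
        raise ValueError("Cycle detected in stratigraphic relationships")
    return order
```

An editor that wants "automatic rollback" could apply a proposed edge to a copy of the edge list, call a function like this, and keep the change only if no `ValueError` is raised.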

### Photo Plate Builder
- Ingests geotagged images from a site photo directory
- Extracts EXIF GPS coordinates, timestamps, and camera orientation via Pillow
- Auto-generates regulatory captions: UTM coordinates, orientation, context cross-references
- Thumbnail grid preview with editable captions
- A4-friendly plate layout (up to 6 images per plate)
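
EXIF stores GPS positions as degrees, minutes, and seconds with a hemisphere letter, so a caption builder first has to convert them to signed decimal degrees. The sketch below shows that step plus the standard UTM zone-number formula. It is stdlib-only and illustrative: it does not read image files, it does not compute full UTM eastings and northings, and the function names are assumptions.

```python
def dms_to_decimal(dms: tuple[float, float, float], ref: str) -> float:
    """Convert (degrees, minutes, seconds) plus N/S/E/W to signed decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


def utm_zone(lat: float, lon: float) -> str:
    """Zone number from 6-degree longitude bands, plus hemisphere (N or S).

    The letter here is the hemisphere, not the MGRS latitude band, and the
    special zones around Norway and Svalbard are ignored.
    """
    number = int((lon + 180) // 6) + 1
    number = min(number, 60)  # lon == 180 would otherwise give zone 61
    return f"{number}{'N' if lat >= 0 else 'S'}"
```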

### Programmatic Site Maps (QGIS-free)
- **Trench / Feature Plan** — site layout colour-coded by context type
- **Feature Distribution** — scatter plot by context category
- **Artefact Density Heatmap** — hexbin finds density with colour bar
- **Section Profile** — depth transect with labelled stratigraphic units
- All maps at 300 DPI PNG, dark theme, auto-embedded in reports
- Requires matplotlib + geopandas (optional, graceful fallback)

### Export Formats
- **Editable DOCX** — your company template, ready for PI review
- **Archival PDF/A-2b** — ready for HER deposition
- **Plain Markdown** — version-control friendly, universal
- **Harris Matrix SVG** — auto-generated from context relationships (no HOARD dependency)
- **FAIR-Compliant Archive** — structured package with:
  - Report formats (PDF/A, DOCX, Markdown)
  - Structured JSON data (contexts, finds, samples, spatial)
  - Dublin Core XML metadata (OAI-PMH)
  - Data Management Plan (Dig Digital v1.2 structure)
  - CIfA 2025 AI disclosure appendix
  - Harris Matrix SVG + auto-generated site maps
  - Archive manifest with file listing
- **FAIR archive** export is accessible from the preview toolbar, with buttons for individual formats alongside the archive bundle
- **AI disclosure appendix** auto-generated and included when AI NLG output has been approved
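
For orientation, a Dublin Core record is a flat set of elements in the `http://purl.org/dc/elements/1.1/` namespace. The sketch below writes one with the standard library. The plain `<metadata>` wrapper and the `dublin_core_xml` name are assumptions, and the record Trowel actually writes may be structured differently (an OAI-PMH record, for example, normally uses an `oai_dc:dc` wrapper).

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise with the conventional dc: prefix


def dublin_core_xml(fields: dict[str, str]) -> str:
    """Serialise simple Dublin Core elements under a plain <metadata> wrapper."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_xml({"title": "Example Farm evaluation", "language": "en"}))
```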

### AI-Assisted NLG (Optional)
- Enhance generated prose via local Ollama, OpenAI, or Anthropic APIs
- Prompt-chain architecture: deterministic draft → structured context → LLM enhancement
- **Per-sentence provenance tracking** — every AI-generated sentence stores which source fields prompted it, the model used, and a timestamp
- CIfA 2025-compliant: AI disclosure appendix declares systems, sections, and validation status
- Human validation gate: AI output is visually distinct until approved
- Fully opt-in — disabled by default, no data ever leaves your machine unless you configure a remote API
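
One way to picture per-sentence provenance is as a small immutable record attached to each AI-generated sentence. The field names below are hypothetical and chosen only to mirror the items listed above (source fields, model, timestamp, validation state); the real structure in `nlg_ai.py` may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SentenceProvenance:
    """Hypothetical record shape; field names are illustrative only."""
    sentence: str
    source_fields: tuple[str, ...]
    model: str
    timestamp: str
    approved: bool = False  # stays False until a human validates the sentence


def record_sentence(
    sentence: str, source_fields: list[str], model: str
) -> SentenceProvenance:
    """Stamp a generated sentence with its inputs, model name, and UTC time."""
    return SentenceProvenance(
        sentence=sentence,
        source_fields=tuple(source_fields),
        model=model,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```

Making the record frozen means approval has to create a new record rather than mutate the old one, which keeps an audit trail simple to reason about.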

### Interactive Review Workflows
- **Review mode** toggle in the preview pane
- Click any paragraph to add a comment with source-data tracing
- Source-data panel shows which contexts, finds, and samples generated the selected text
- Comments are persisted in `.trowel` project files alongside the data
- Export review summary as Markdown for distribution

### Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| `Ctrl+O` | Open .trowel project file |
| `Ctrl+S` | Save project |
| `Ctrl+Shift+S` | Save project as... |
| `Ctrl+Shift+O` | Import CSV/Excel data |
| `Ctrl+H` | Open HOARD project directory |
| `Ctrl+N` | New project (clear session) |
| `Ctrl+Q` | Quit |

---

## The StratiGraph Ecosystem

Trowel is part of a suite of open-source tools for digital heritage and archaeology:

| Tool | Repository | Role | Trowel Integration |
|---|---|---|---|
| **Trowel** | *(this repo)* | Report drafting from digital field data | — |
| **[HOARD](https://github.com/mabo-du/HOARD)** | Heritage Observation And Report Drafter | Paper + photo digitisation pipeline | `hoard_import.py` / `hoard_export.py` — bidirectional JSON format; HOARD's docx_writer for premium DOCX |
| **[StratiGraph](https://github.com/mabo-du/stratigraph)** | Harris Matrix generator | Interactive DAG editor | `harris_editor.py` → EEDP export; `stratigraph_import.py` ← HMDP import |
| **[Libby](https://github.com/mabo-du/libby)** | Radiocarbon calibration | IntCal20 calibration engine | `radiocarbon.py` — wraps Libby's iosacal-based calibration |
| **[dibble](https://github.com/mabo-du/dibble)** | Lithic analysis | 3D stone tool measurement | `lithics.py` — imports Dibble CSV/JSON, generates specialist appendix |
| **Cache & Carry** | [github.com/mabo-du/cache-and-carry](https://github.com/mabo-du/cache-and-carry) | Offline CMS with Getty vocabularies | `vocab_terms.py` — reads SQLite for AAT/ULAN/TGN offline term lookup |
| **[Paleo](https://github.com/mabo-du/paleo)** | Palaeontology AI platform | Fossil identification, paleoclimate reconstruction | Adjacent domain |
| **[Fritts](https://github.com/mabo-du/fritts)** | Dendrochronology | Tree-ring cross-dating | Adjacent domain |

### How Trowel Integrates

```
                    ┌──────────────┐
                    │   Libby      │ ← C14 calibration
                    │  (Python)    │
                    └──────┬───────┘
                           │ calibrated dates, SPD curves
                           ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   HOARD      │◀──▶│   Trowel     │◀──▶│ StratiGraph  │
│ paper→digital│    │ digital→draft│    │   HMDP DAG   │
└──────────────┘    └──────┬───────┘    └──────────────┘
                           │
                    ┌──────┴───────┐
                    │   Dibble     │ ← lithic measurements
                    │  (Python)    │
                    └──────────────┘
                    ┌─────────────────┐
                    │ Cache & Carry   │ ← Getty AAT/ULAN/TGN
                    │  (Tauri/Rust)   │
                    └─────────────────┘
```

- **HOARD ↔ Trowel:** Bidirectional — `hoard_import.py` reads digitised context sheets; `hoard_export.py` writes HOARD-compatible JSON for Phase 5 processing.
- **Trowel ↔ StratiGraph:** Bidirectional — `harris_editor.py:to_eedp_dict()` exports EEDP paths; `stratigraph_import.py:hmdp_to_model()` imports HMDP matrices.
- **Libby → Trowel:** `radiocarbon.py` wraps Libby's iosacal calibration for C14 dates in specialist appendices.
- **Dibble → Trowel:** `lithics.py` imports CSV/JSON measurement output and classification results.
- **Cache & Carry → Trowel:** `vocab_terms.py` reads the offline SQLite vocabulary database for material/period standardisation.
- **Trowel → QField:** `qfield_schema.py` generates CSV templates matching Trowel's ingest aliases for mobile field collection.

---

## Architecture

```
src/
├── __init__.py         # Package marker
├── main.py             # PyQt6 desktop entry point
├── app.py              # Streamlit web entry point
├── models.py           # Context, Find, Sample, C14Date, ProjectData dataclasses
├── ingest.py           # CSV/Excel parsing, 70+ column aliases, validation
├── vocabulary.py       # ROMFA scale, soil textures, controlled terminology
├── vocab_terms.py      # Getty AAT/ULAN/TGN offline lookup via Cache & Carry
├── nlg.py              # Deterministic NLG engine + Jinja2 section templates
├── nlg_ai.py           # AI NLG (3 backends: Ollama, OpenAI, Anthropic) + provenance
├── report_model.py     # Structured report model with provenance tracking
├── export.py           # Markdown, DOCX (plus OOXML cross-refs), PDF/A, archive
├── eedp.py             # StratiGraph EEDP integration for strat narratives
├── harris_editor.py    # Bidirectional Harris Matrix DAG + semantic diffing
├── spatial_text.py     # Spatial-Text query engine (acreage, distance, UTM)
├── radiocarbon.py      # Libby C14 calibration integration
├── lithics.py          # Dibble lithic analysis specialist appendix
├── hoard_import.py     # HOARD JSON context-sheet import
├── hoard_export.py     # Trowel → HOARD JSON export (bidirectional)
├── stratigraph_import.py # StratiGraph HMDP import for matrix sync
├── qfield_schema.py    # QField mobile field data collection templates
├── automap.py          # QGIS-free site maps (trench, feature, hexbin, section)
├── images.py           # Photo plate builder (EXIF GPS, auto-captions)
├── compliance.py       # DMP (Dig Digital v1.2), Dublin Core XML, AI disclosure
├── review.py           # ReviewComment/ReviewSession models, source-data tracing
├── trowel_io.py        # .trowel project file serialisation (JSON)
├── jurisdictions/      # 19 config-driven jurisdiction definitions
│   ├── __init__.py     # JurisdictionConfig loader, list_jurisdictions()
│   ├── uk.json, us.json, au.json, generic.json
│   ├── us_ca.json, us_tx.json, us_az.json, us_co.json, us_nc.json
│   ├── scotland.json, wales.json, ireland.json
│   ├── netherlands.json, france.json, germany.json
│   ├── canada_ontario.json, australia_burra.json
│   ├── new_zealand.json, south_africa.json
├── connectors/         # Field database connector registry
│   ├── __init__.py, base.py, faims.py, ark.py, intrasis.py
├── ui/                 # PyQt6 desktop UI package
│   ├── __init__.py, theme.py, session.py, main_window.py
│   ├── import_page.py, preview_page.py, map_pane.py
│   ├── matrix_widget.py, plate_view.py, ai_panel.py
│   ├── review_panel.py, connector_dialog.py
├── templates/          # Jinja2 report section templates (6 sections × 19 jurisdictions)
│   ├── generic/, uk/, us/, au/
sample_data/            # Synthetic (37 ctx + 28 finds + 10 samples), GeoJSON
schemas/                # report.schema.json for structured report validation
tests/                  # 373 unit + integration tests
```

## Sample Data

Two example datasets are provided in `sample_data/`:

**Synthetic dataset** (recommended for first use): `synthetic_contexts.csv` — 37 contexts across 5 phases, designed to exercise all jurisdiction templates and edge cases:

| Phase | Features |
|---|---|
| Phase 1 — Natural | River terrace gravels |
| Phase 2 — Iron Age (800 BC–AD 43) | Enclosure ditch with 3 fills, roundhouse ring-groove, central posthole with in-situ burning, occupation layer |
| Phase 3 — Roman (AD 43–410) | Stone building with opus signinum floor, limestone walls, clay floor, hearth, demolition layer, quarry pit with 3 fills, inhumation burial |
| Phase 4 — Medieval (1066–1550) | Cultivation horizon, drainage ditch, rubbish pit with dense artefact assemblage |
| Phase 5 — Post-Med/Modern | Ploughsoil, modern topsoil with 20th-century inclusions |

Plus edge cases: context with interpretation only (998), completely empty context (999), finds referencing non-existent contexts. Also includes `synthetic_finds.csv` (28 finds) and `synthetic_samples.csv` (10 samples).

**Quick demo:** Load `sample_data/synthetic_contexts.csv`, add the finds and samples files, select UK jurisdiction, and preview all sections.

**Original demo:** `contexts.csv` — 12-context Iron Age / Roman site with 12 finds and 5 samples.

---

## Requirements

- Python 3.11+
- PyQt6 (desktop UI)
- pandas, openpyxl, jinja2, python-docx, Pillow (core engine)
- matplotlib, geopandas (programmatic site maps; `pip install trowel[automap]`)
- openai, anthropic (AI NLG backends; `pip install trowel[ai]`)
- Streamlit (web UI; `pip install trowel[web]`)
- hoard-erd (premium DOCX/PDF/A export; `pip install hoard-erd`)
- libby, iosacal, numpy (optional C14 calibration; `pip install trowel[c14]`)

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for the development workflow, code style guide, testing, and PR checklist. All contributions are welcome.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.

## Development

### Setup

```bash
pip install -e ".[dev]"

# Install pre-commit hooks (ruff check + format on every commit)
pre-commit install
```

### Lint & Format

```bash
ruff check src/ tests/
ruff format src/ tests/ --check
```

### Run Tests

```bash
pytest -v
```

All three — lint, format check, and tests — run in CI on every push and pull request to `main` across Linux, Windows, and macOS (Python 3.11–3.13).

## Packaging (Standalone Executable)

Trowel can be packaged as a standalone executable so users don't need Python installed.

```bash
pip install pyinstaller
# Run the one target that matches your platform
make build          # on Linux
make build-windows  # on Windows
make build-macos    # on macOS
```

The output is in `dist/Trowel/` — a single folder you can zip and distribute. Double-click `Trowel` (or `Trowel.exe` on Windows) to launch.

GitHub Actions runs lint, tests, and builds standalone executables for Linux, Windows, and macOS on every push to `main`. Download the artifacts from the Actions tab.

## Project File Format

Trowel saves and loads projects in `.trowel` format — a JSON file containing all excavation data and UI state. Use **File → Save** (Ctrl+S) and **File → Open** (Ctrl+O) to persist your work.
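
Because the format is plain JSON, a project file can be inspected or generated with a few lines of Python. The sketch below shows a save/load round trip. The `contexts` and `ui` keys are placeholders rather than the documented schema, and the helper names are not from `trowel_io.py`.

```python
import json
from pathlib import Path


def save_project(path: Path, data: dict) -> None:
    """Write via a temporary file so a crash cannot leave a half-written project."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    tmp.replace(path)


def load_project(path: Path) -> dict:
    """Read a project file back into a plain dictionary."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Writing to a sibling temporary file and then renaming it is a common way to avoid corrupting the only copy of a project if the application is interrupted mid-save.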

## License

MIT — use it, modify it, ship it. Archaeology deserves better tools, and they should be free.
