Metadata-Version: 2.4
Name: openetruscan
Version: 0.1.0
Summary: Open-source tools for ancient epigraphy — built for Etruscan, designed to be copied.
Project-URL: Homepage, https://github.com/open-etruscan/openetruscan
Project-URL: Documentation, https://open-etruscan.github.io/openetruscan
Project-URL: Repository, https://github.com/open-etruscan/openetruscan
Project-URL: Issues, https://github.com/open-etruscan/openetruscan/issues
Author: OpenEtruscan Contributors
License: MIT
License-File: LICENSE
Keywords: ancient-languages,digital-humanities,epidoc,epigraphy,etruscan,normalization,old-italic,tei
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.10
Requires-Dist: click>=8.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: all
Requires-Dist: lxml>=4.9; extra == 'all'
Requires-Dist: networkx>=3.0; extra == 'all'
Provides-Extra: corpus
Requires-Dist: lxml>=4.9; extra == 'corpus'
Provides-Extra: dev
Requires-Dist: lxml>=4.9; extra == 'dev'
Requires-Dist: networkx>=3.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: epidoc
Requires-Dist: lxml>=4.9; extra == 'epidoc'
Provides-Extra: prosopography
Requires-Dist: networkx>=3.0; extra == 'prosopography'
Description-Content-Type: text/markdown

<div align="center">

# 𐌏𐌐𐌄𐌍 𐌄𐌕𐌓𐌖𐌔𐌂𐌀𐌍

# OpenEtruscan

**Open-source tools for ancient epigraphy — built for Etruscan, designed to be copied.**

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![Data: CC0](https://img.shields.io/badge/data-CC0-green.svg)](https://creativecommons.org/publicdomain/zero/1.0/)

*Normalize · Search · Export · Contribute*

</div>

---

## What Is This?

OpenEtruscan is a Python toolkit that tackles the **transcription chaos** in Etruscan studies: the same word can appear in five or more incompatible forms across publications, which makes cross-corpus search impossible.

We fix that. One `pip install`, zero servers, works offline forever.

```python
from openetruscan import normalize

# Input in ANY transcription system
result = normalize("LARTHAL")       # CIE standard
result = normalize("Larθal")        # Philological
result = normalize("𐌋𐌀𐌓𐌈𐌀𐌋")  # Unicode Old Italic

# Always get the same canonical output
print(result.canonical)   # → "larθal"
print(result.phonetic)    # → "/lar.tʰal/"
print(result.old_italic)  # → "𐌋𐌀𐌓𐌈𐌀𐌋"
```
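One way to fold several transcription systems into a single canonical form is a longest-match-first substitution table. The sketch below is a hypothetical illustration, not OpenEtruscan's actual engine: `FOLD_TABLE` and `fold` are made-up names, and the table covers only the letters needed for this one name.

```python
# Hypothetical folding table: variant spelling -> canonical letter.
# Only the letters needed for "larθal" are listed; a real adapter covers the full alphabet.
FOLD_TABLE = {
    "th": "θ",          # CIE-style digraph
    "\U00010300": "a",  # OLD ITALIC LETTER A
    "\U0001030B": "l",  # OLD ITALIC LETTER EL
    "\U00010313": "r",  # OLD ITALIC LETTER ER
    "\U00010308": "θ",  # OLD ITALIC LETTER THE
}

# Longer keys first, so a digraph is matched before any single letter.
_KEYS = sorted(FOLD_TABLE, key=len, reverse=True)


def fold(text: str) -> str:
    """Fold a supported transcription into a lowercase canonical string."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(FOLD_TABLE[key])
                i += len(key)
                break
        else:
            # Characters without a table entry pass through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


OLD_ITALIC = "\U0001030B\U00010300\U00010313\U00010308\U00010300\U0001030B"  # 𐌋𐌀𐌓𐌈𐌀𐌋

for variant in ["LARTHAL", "Larθal", OLD_ITALIC]:
    print(fold(variant))  # → larθal
```

Matching longer keys first matters: if the table also mapped a bare `t`, a shortest-first scan would split the CIE digraph `th` before it could become `θ`.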

## Quick Start

```bash
pip install openetruscan
```

### Normalize a text

```bash
openetruscan normalize "LARTHAL LECNES"
```

### Batch process a file

```bash
openetruscan batch corpus.txt --format csv --output clean.csv
```
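A batch run amounts to normalization applied line by line, with the results written out as a table. This sketch mimics that workflow with the standard `csv` module; `to_canonical` is a hypothetical stand-in for the real normalizer, and the column names are assumptions, not the CLI's actual output format.

```python
import csv
import io


def to_canonical(text: str) -> str:
    # Hypothetical stand-in for the real normalizer (e.g. normalize(text).canonical):
    # lowercase, then fold the CIE digraph "th" to "θ".
    return text.lower().replace("th", "θ")


def batch_to_csv(lines, out) -> int:
    """Write one CSV row per non-blank input line; return the number of rows written."""
    writer = csv.writer(out)
    writer.writerow(["line", "original", "canonical"])  # hypothetical column names
    rows = 0
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # skip blank lines but keep original line numbers
        writer.writerow([number, text, to_canonical(text)])
        rows += 1
    return rows


buffer = io.StringIO()
count = batch_to_csv(["LARTHAL LECNES", "", "Larθal"], buffer)
print(count)  # → 2
print(buffer.getvalue())
```

Against real files, `with open("corpus.txt", encoding="utf-8") as src, open("clean.csv", "w", newline="", encoding="utf-8") as dst: batch_to_csv(src, dst)` would apply the same function; `newline=""` is what the `csv` documentation recommends for output files.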

### Validate encoding

```bash
openetruscan validate my_transcription.txt
```
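Encoding validation can start with checking every character against an allowed inventory and reporting offenders with their positions and Unicode names. In this sketch `ALLOWED` is a hypothetical, illustrative letter set, not the Etruscan adapter's actual alphabet, and `find_invalid` is a made-up helper.

```python
import unicodedata

# Hypothetical, illustrative letter inventory plus space; not the adapter's real alphabet.
ALLOWED = set("acefhiklmnpqrstuvzθφχ\u015b ")


def find_invalid(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for every character outside ALLOWED."""
    # NFC first, so "s" + combining acute compares equal to the precomposed "ś".
    # Indices therefore refer to the NFC-normalized string.
    text = unicodedata.normalize("NFC", text)
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(text)
        if char not in ALLOWED
    ]


print(find_invalid("larθal lecnes"))  # → []
print(find_invalid("lar0al"))         # → [(3, '0', 'DIGIT ZERO')]
```

The `DIGIT ZERO` hit shows the kind of look-alike substitution (a zero typed for `θ`) that is easy to miss by eye.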

## Why?

| Problem | Today | With OpenEtruscan |
|---|---|---|
| "Where else does this word appear?" | Flip through 300 pages of print volumes | `corpus.search(text="*al lecn*")` |
| "Is this spelling a dialect variant?" | An entire journal article to pose the question | One query, 30 seconds |
| "I need to publish my thesis data" | Word doc, usable only by the author | `openetruscan batch thesis.txt --format epidoc` → PR → global corpus |
| "How widespread was this clan?" | Months of manual index-reading | `corpus.names.search(gens="lecne")` → map |

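Because the corpus is SQLite-backed, a wildcard query like `*al lecn*` can map onto SQL `LIKE` matching. The sketch below uses only the standard `sqlite3` module; the `inscriptions` table, its columns, the toy rows, and the `*`-to-`%` translation are assumptions for illustration, not the package's real schema or `corpus.search` implementation.

```python
import sqlite3


def wildcard_to_like(pattern: str) -> str:
    """Translate a shell-style '*' wildcard into a SQL LIKE pattern."""
    # Escape LIKE's own metacharacters first so they match literally.
    escaped = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped.replace("*", "%")


def search(conn: sqlite3.Connection, pattern: str) -> list[str]:
    rows = conn.execute(
        "SELECT id FROM inscriptions WHERE text LIKE ? ESCAPE '\\' ORDER BY id",
        (wildcard_to_like(pattern),),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inscriptions (id TEXT PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO inscriptions VALUES (?, ?)",
    # Toy rows, not real corpus entries.
    [("ex-1", "larθal lecnes"), ("ex-2", "arnθal lecnes"), ("ex-3", "velia spurinas")],
)
print(search(conn, "*al lecn*"))  # → ['ex-1', 'ex-2']
```

SQLite's `LIKE` is case-insensitive only for ASCII letters, which is one more reason to search normalized text.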
## Architecture

The core engine is **language-agnostic**. Each language is a YAML config file:

```
openetruscan/
├── engine/          # Universal normalizer, parser, exporter
├── adapters/
│   ├── etruscan.yaml    # Etruscan alphabet, phonotactics, names
│   ├── oscan.yaml       # Planned (v0.2): same engine, different YAML
│   ├── rhaetic.yaml     # Planned (v0.3): add any ancient script
│   └── YOUR_LANG.yaml   # Fork this pattern
├── corpus/          # Structured dataset (SQLite, Git-native)
├── prosopography/   # Name parser + kinship graph
└── exporters/       # EpiDoc XML, CSV, JSON-LD, GeoJSON
```

**Want to support another language?** Write 50 lines of YAML. The engine does the rest.

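The split between one engine and per-language config can be sketched like this. The adapter schema (`name`, `variants`) is hypothetical and far smaller than a real adapter; the plain dicts stand in for what `yaml.safe_load` would return from a YAML file (PyYAML is already a dependency), so the example runs without extra packages. The `toy` adapter is invented purely to show a second language reusing the same code path.

```python
from dataclasses import dataclass, field


@dataclass
class Adapter:
    """Per-language settings; the engine itself holds no language data."""

    name: str
    variants: dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_config(cls, config: dict) -> "Adapter":
        # `config` stands in for yaml.safe_load output (hypothetical schema).
        return cls(name=config["name"], variants=dict(config.get("variants", {})))


def apply_adapter(text: str, adapter: Adapter) -> str:
    """One code path for every language: only the adapter's table changes."""
    text = text.lower()
    # Replace longer variant spellings first so digraphs are not split.
    for variant in sorted(adapter.variants, key=len, reverse=True):
        text = text.replace(variant, adapter.variants[variant])
    return text


ADAPTERS = {
    cfg["name"]: Adapter.from_config(cfg)
    for cfg in [
        {"name": "etruscan", "variants": {"th": "θ", "ph": "φ"}},
        {"name": "toy", "variants": {"x": "ks"}},  # made-up second language
    ]
}

print(apply_adapter("LARTHAL LECNES", ADAPTERS["etruscan"]))  # → larθal lecnes
print(apply_adapter("AXE", ADAPTERS["toy"]))                   # → akse
```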
## Zero Infrastructure

- No servers. No hosted databases. No grants.
- Data ships inside the `pip` package (SQLite).
- Web tools run as static HTML (GitHub Pages).
- Updates via `pip install --upgrade`.
- **Total hosting cost: $0/month. Forever.**

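Shipping the corpus as a SQLite file inside the package means it can be queried with the standard library and nothing else running. The sketch below assumes a hypothetical file name (`corpus.sqlite`) and builds a throwaway database in a temporary directory so it runs anywhere; in an installed package the path could instead come from `importlib.resources.files(...)`. Opening through a SQLite URI with `mode=ro` keeps the bundled data read-only.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open a shipped SQLite file read-only via a SQLite URI filename."""
    # Path.as_uri() yields file:///...; mode=ro makes every write fail.
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


def demo() -> tuple[int, bool]:
    """Build a throwaway database, reopen it read-only, and try to write."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "corpus.sqlite"  # hypothetical file name
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE inscriptions (id TEXT, text TEXT)")
        setup.execute("INSERT INTO inscriptions VALUES ('ex-1', 'larθal lecnes')")
        setup.commit()
        setup.close()

        conn = open_read_only(path)
        try:
            count = conn.execute("SELECT COUNT(*) FROM inscriptions").fetchone()[0]
            try:
                conn.execute("INSERT INTO inscriptions VALUES ('ex-2', 'x')")
                rejected = False
            except sqlite3.OperationalError:
                rejected = True  # SQLite refuses writes on a mode=ro connection
        finally:
            conn.close()
    return count, rejected


print(demo())  # → (1, True)
```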
## Contributing

### Add Data

Found a new inscription? Have a dissertation corpus?

1. Fork this repo
2. Add entries to `data/contributions/your_name.csv`
3. Run `openetruscan validate data/contributions/your_name.csv`
4. Open a Pull Request
5. CI validates encoding + duplicates
6. We merge → your data is in the next release

Your name stays in the Git history. Your discovery becomes searchable worldwide.

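The CI duplicate check mentioned in the steps above might work roughly like this: normalize each row's text and flag rows that collide with an earlier one. This is a hypothetical sketch; the `id` and `text` column names are assumptions about the contribution CSV, and the normalization (NFC, lowercase, collapsed whitespace) is a simplified stand-in for the real normalizer.

```python
import csv
import io
import unicodedata


def duplicate_rows(csv_text: str) -> list[tuple[str, str]]:
    """Return (duplicate_id, first_id) pairs for rows with the same normalized text."""
    seen: dict[str, str] = {}
    duplicates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Simplified normalization: NFC, lowercase, collapse runs of whitespace.
        key = " ".join(unicodedata.normalize("NFC", row["text"]).lower().split())
        if key in seen:
            duplicates.append((row["id"], seen[key]))
        else:
            seen[key] = row["id"]
    return duplicates


sample = """id,text
new-1,larθal lecnes
new-2,LARθAL  lecnes
new-3,arnθ velimnas
"""
print(duplicate_rows(sample))  # → [('new-2', 'new-1')]
```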
### Add a Language

1. Copy `src/openetruscan/adapters/etruscan.yaml`
2. Fill in your language's alphabet, variants, phonotactics
3. Run `pytest` to verify
4. Open a Pull Request

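As an illustration of what `pytest` could catch in a new adapter, here are two pytest-style checks: every variant must fold into letters the adapter declares, and the alphabet must not list a letter twice. The adapter layout (`alphabet`, `variants`) and both test names are hypothetical, not the project's actual test suite.

```python
# Hypothetical adapter contents, as yaml.safe_load might return them.
ADAPTER = {
    "alphabet": ["a", "c", "e", "h", "i", "l", "n", "p", "r", "s", "t", "u", "θ", "φ"],
    "variants": {"th": "θ", "ph": "φ"},
}


def undeclared_targets(adapter: dict) -> list[str]:
    """Return variant targets that use letters missing from the alphabet."""
    alphabet = set(adapter["alphabet"])
    return sorted(
        target
        for target in adapter["variants"].values()
        if not set(target) <= alphabet
    )


def test_variants_fold_into_alphabet():
    # pytest collects test_* functions; on failure the assert shows the offending targets.
    assert undeclared_targets(ADAPTER) == []


def test_alphabet_has_no_duplicates():
    assert len(ADAPTER["alphabet"]) == len(set(ADAPTER["alphabet"]))
```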
### Improve the Code

```bash
git clone https://github.com/open-etruscan/openetruscan.git
cd openetruscan
pip install -e ".[dev]"
pytest
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for details.

## Packages

| Package | Description | Status |
|---|---|---|
| `openetruscan` (core) | Normalizer + CLI + adapters | ✅ Released |
| `openetruscan[corpus]` | Structured dataset + query API | ✅ Released |
| `openetruscan[prosopography]` | Name parser + kinship graph | ✅ Released |
| `openetruscan[all]` | Everything | ✅ Released |

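The `[prosopography]` extra pairs a name parser with a kinship graph. A much-simplified, hypothetical sketch of the parsing half: take the first token as a praenomen when it appears in a known list, the next token as the gentilicium, then count families. `KNOWN_PRAENOMINA`, the two-slot rule, and the toy inputs are assumptions; real Etruscan name formulae (patronymics, metronymics, case endings) need far more than this.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical, tiny praenomen list; a real adapter would ship a much longer one.
KNOWN_PRAENOMINA = {"larθ", "arnθ", "vel", "aule", "θana", "ramθa"}


@dataclass
class Name:
    praenomen: Optional[str]
    gentilicium: Optional[str]


def parse_name(text: str) -> Name:
    """Two-slot parse: optional known praenomen, then gentilicium."""
    tokens = text.lower().split()
    if tokens and tokens[0] in KNOWN_PRAENOMINA:
        return Name(tokens[0], tokens[1] if len(tokens) > 1 else None)
    return Name(None, tokens[0] if tokens else None)


# Toy inputs, not corpus data.
names = [parse_name(t) for t in ["larθ lecne", "arnθ lecne", "vel spurina", "lecne"]]
gens_counts = Counter(n.gentilicium for n in names if n.gentilicium)
print(gens_counts.most_common(1))  # → [('lecne', 3)]
```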
## Roadmap

### ✅ Done

- [x] **Normalizer engine** — auto-detect 5 transcription systems, fold to canonical, phonotactic validation
- [x] **CLI** — `normalize`, `batch`, `convert`, `validate`, `adapters` commands
- [x] **Etruscan adapter** — 23 letters, 35+ known names, equivalence classes
- [x] **Corpus database** — SQLite-backed, 4,700+ inscriptions from the Larth dataset
- [x] **Prosopography** — name parser, 633 clans, kinship graph, GraphML/JSON export
- [x] **Web converter** — static HTML/CSS/JS, runs in any browser, zero backend
- [x] **GitHub Actions** — CI (Python 3.10-3.13 + Ruff), Pages deploy, PyPI publish
- [x] **64 tests** passing across all modules

### 🔜 Next (v0.2)

- [ ] **Faliscan + Oscan adapters** — demonstrate the multi-language architecture (one YAML file each)
- [ ] **Web language selector** — switch between languages in the web converter
- [ ] **GeoJSON map viewer** — static HTML page showing inscription findspots on an interactive map
- [ ] **EpiDoc XML exporter** — interoperability with the digital classics ecosystem
- [ ] **PyPI release** — first public `pip install openetruscan`
- [ ] **Corpus CLI** — `openetruscan search`, `openetruscan import`, `openetruscan export` commands

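The planned map viewer needs findspots in GeoJSON. A minimal sketch using only the standard `json` module; the record fields (`id`, `text`, `lat`, `lon`) are assumptions and the coordinates are placeholders, not real findspots. GeoJSON (RFC 7946) orders positions as longitude, then latitude, which is easy to get backwards.

```python
import json


def to_geojson(records: list[dict]) -> dict:
    """Build a GeoJSON FeatureCollection of Point features, skipping unlocated records."""
    features = []
    for rec in records:
        if rec.get("lat") is None or rec.get("lon") is None:
            continue  # no findspot recorded
        features.append({
            "type": "Feature",
            # RFC 7946 position order: [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": {"id": rec["id"], "text": rec["text"]},
        })
    return {"type": "FeatureCollection", "features": features}


records = [
    {"id": "ex-1", "text": "larθal lecnes", "lat": 43.0, "lon": 11.5},  # placeholder coordinates
    {"id": "ex-2", "text": "arnθ lecne", "lat": None, "lon": None},
]
print(json.dumps(to_geojson(records), ensure_ascii=False, indent=2))
```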
### 🗓️ Planned (v0.3)

- [ ] **CLTK Etruscan module** — contribute to the [Classical Language Toolkit](https://cltk.org)
- [ ] **Linked Open Data** — publish to [Pelagios](https://pelagios.org)/[Pleiades](https://pleiades.stoa.org) gazetteers
- [ ] **Statistical tools** — letter frequency analysis, dialect clustering, dating heuristics
- [ ] **Web search interface** — search the corpus from the browser (static, no backend)
- [ ] **Rhaetic + Lemnian adapters** — expand to the Tyrsenian language family

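Letter-frequency analysis, one of the planned statistical tools, is simple once texts are in canonical form. A sketch with `collections.Counter`, assuming already-normalized lowercase input and counting only alphabetic characters (spaces and punctuation are ignored); the sample strings are toy inputs.

```python
from collections import Counter


def letter_frequencies(texts: list[str]) -> dict[str, float]:
    """Relative frequency of each letter across normalized texts, most common first."""
    counts = Counter(ch for text in texts for ch in text if ch.isalpha())
    total = sum(counts.values())
    if not total:
        return {}
    return {letter: n / total for letter, n in counts.most_common()}


freqs = letter_frequencies(["larθal lecnes", "arnθ lecne"])
for letter, share in list(freqs.items())[:3]:
    print(f"{letter}: {share:.2%}")
```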
### 🌍 Vision

- [ ] **Community data contributions** — scholars submit inscriptions via PR, CI validates
- [ ] **Cross-language scholar search** — query across Etruscan, Oscan, Faliscan from one interface
- [ ] **Academic citation** — dataset cited in peer-reviewed publications
- [ ] **Template repo** — one-click fork to set up tools for any underdocumented ancient script

## License

- **Code:** [MIT](LICENSE) — use, modify, and redistribute freely, as long as the copyright and license notice is kept
- **Data:** [CC0](https://creativecommons.org/publicdomain/zero/1.0/) — public domain, no restrictions

## Acknowledgments

OpenEtruscan builds on decades of work by epigraphers and Etruscologists. We are especially grateful to:

- The compilers of the [Corpus Inscriptionum Etruscarum](https://www.studietruschi.org)
- The [Etruscan Texts Project](https://etp.classics.umass.edu) (UMass Amherst)
- The [Larth Dataset](https://github.com/gianlucavico/Larth) (Vico & Spanakis, 2023)
- The [EpiDoc](https://epidoc.stoa.org) community
- The [Classical Language Toolkit](https://cltk.org)

---

<div align="center">

*Built for Etruscan. Designed to be copied.*

𐌀 𐌁 𐌂 𐌃 𐌄 𐌅 𐌆 𐌇 𐌈 𐌉 𐌊 𐌋 𐌌 𐌍 𐌎 𐌏 𐌐 𐌑 𐌓 𐌔 𐌕 𐌖 𐌗 𐌘 𐌙 𐌚

</div>
