Metadata-Version: 2.4
Name: cuneiscribe
Version: 0.2.0
Summary: Turn any text into cuneiform clay tablets - translate, transliterate, and render ancient Mesopotamian writing
Author-email: Geoffrey Wang <geoffreywang1117@gmail.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/geoffreywang1117/cuneiscribe
Project-URL: Documentation, https://github.com/geoffreywang1117/cuneiscribe#readme
Keywords: cuneiform,akkadian,translation,ancient-languages,NLP,clay-tablet
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0
Requires-Dist: transformers>=4.35
Requires-Dist: sacrebleu>=2.3
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: Pillow>=9.0
Requires-Dist: svgwrite>=1.4
Provides-Extra: serve
Requires-Dist: gradio>=4.0; extra == "serve"
Requires-Dist: fastapi; extra == "serve"
Requires-Dist: uvicorn; extra == "serve"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"

# CuneiScribe

**Bridge a 4,000-year cultural gap — read and write in cuneiform.**

CuneiScribe lets you interact with humanity's oldest writing system. Translate between English and Akkadian, convert to cuneiform Unicode signs, and render clay tablet images — with built-in confidence gating that tells you when results are unreliable.

```
$ cuneiscribe cuneiform "LUGAL dan-nu LUGAL KUR aš-šur"
𒈗 𒆗𒉡 𒈗 𒆳 𒀸𒋩

$ cuneiscribe classify "The computer sends an email"
Type:       modern
Confidence: 0.70
Warnings:
  - Contains 2 modern concept(s) with no direct Akkadian equivalent
```

> All outputs are machine-generated approximations. Consult an Assyriologist before using them in research or public-facing work.

## Showcase

### Reading the past: Epic of Gilgamesh → English

```
Input (Akkadian):  ša naq-ba i-mu-ru i-šid ma-a-ti ša kul-la-ta i-du-u₂
Cuneiform:         𒊭 𒅘𒁀 𒄿𒈬𒊒 𒄿𒋃 𒈠𒀀𒋾 𒊭 𒆰𒆷𒋫 𒄿𒁺𒌑
English:           "What Naqba saw, the fortress of all the lands, which knows all of them."
```

<p align="center">
  <img src="assets/showcase_gilgamesh.svg" width="500" alt="Gilgamesh tablet"/>
</p>

### Writing the past: Shakespeare → Cuneiform

```
Input (English):   "To be, or not to be, that is the question"
Akkadian:          šá-a-šú u la šá-a-šú šá-a-lu
```

```
Input (English):   "The king sent a letter to his brother"
Akkadian:          LUGAL a-na ŠEŠ-šu i-sap-ra
Cuneiform:         𒈗 𒀀𒈾 𒋀𒋗 𒄿𒉺𒅁𒊏
```

<p align="center">
  <img src="assets/showcase_royal.svg" width="500" alt="Royal inscription tablet"/>
</p>

### Confidence gating in action

```
$ cuneiscribe classify "The king rules the land"
Type:       short
Confidence: 0.85
Mode:       experience          ← Safe to render

$ cuneiscribe classify "Send me an email about the algorithm"
Type:       modern
Confidence: 0.70
Mode:       educational
Warnings:
  - Contains 2 modern concept(s) with no direct Akkadian equivalent
                                ← Warning: these concepts have no Akkadian equivalent

$ cuneiscribe classify "<script>alert(1)</script>"
Type:       anomalous
Confidence: 0.90
Warnings:
  - Input rejected              ← Blocked: anomalous input
```
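
The behaviour shown above can be approximated with a small rule-based classifier. The sketch below is illustrative only: the term list, the markup pattern, and the names `Classification` and `classify_sketch` are assumptions, not CuneiScribe's actual implementation. The types and confidence values simply mirror the sample output above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical term list; the real classifier's vocabulary is not documented here.
MODERN_TERMS = {"computer", "email", "algorithm", "internet", "phone"}
# Hypothetical pattern for markup-like input such as HTML tags.
MARKUP = re.compile(r"<[^>]+>")


@dataclass
class Classification:
    input_type: str
    confidence: float
    warnings: list = field(default_factory=list)


def classify_sketch(text: str) -> Classification:
    """Toy classifier: reject markup, flag modern vocabulary, otherwise accept."""
    if MARKUP.search(text):
        return Classification("anomalous", 0.90, ["Input rejected"])
    words = re.findall(r"[a-z]+", text.lower())
    modern = sorted({w for w in words if w in MODERN_TERMS})
    if modern:
        msg = f"Contains {len(modern)} modern concept(s) with no direct Akkadian equivalent"
        return Classification("modern", 0.70, [msg])
    return Classification("short", 0.85)


print(classify_sketch("Send me an email about the algorithm").input_type)
```

A real classifier would likely need a much larger vocabulary and more input types than this three-way split.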

## Install

```bash
pip install cuneiscribe
```

## Quick Start

### CLI

```bash
# Convert transliteration to cuneiform
cuneiscribe cuneiform "LUGAL dan-nu"        # → 𒈗 𒆗𒉡

# Classify input before processing
cuneiscribe classify "The king rules"       # → short, 0.85, experience mode

# Render as clay tablet
cuneiscribe render "šar kiš-ša-ti" -o tablet.svg

# Full pipeline with confidence gating (requires model)
cuneiscribe craft "The king rules" --model models/byt5-base-akkadian --json

# Look up a sign
cuneiscribe info LUGAL                      # → 𒈗, U+12217
```

### Python API

```python
from cuneiscribe import CuneiScribe, classify

# Check input first
result = classify("I love pizza")
print(result.input_type)   # "short"
print(result.warnings)     # []

# Full pipeline with gating
tc = CuneiScribe(model_path="models/byt5-base-akkadian")
result = tc.craft("The mighty king")

print(result.akkadian)     # Transliteration
print(result.cuneiform)    # Unicode cuneiform
print(result.confidence)   # 0.0-1.0
print(result.suggestion)   # "render" / "render_with_caveat" / "fallback"
print(result.warnings)     # List of caveats

# Transliteration → cuneiform rendering does not use the translation model
result = tc.transliterate_and_render("LUGAL dan-nu", output_path="tablet.svg")
```
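
One way to act on `result.suggestion` is to branch on its three documented values. The sketch below relies only on the attributes shown above (`akkadian`, `cuneiform`, `suggestion`, `warnings`). The helper name `present` and its fallback behaviour are illustrative assumptions, and a `SimpleNamespace` stands in for a real result so the example runs without a model.

```python
from types import SimpleNamespace


def present(result) -> str:
    """Format a craft-style result according to its gating suggestion."""
    if result.suggestion == "render":
        return result.cuneiform
    if result.suggestion == "render_with_caveat":
        caveats = "; ".join(result.warnings) or "low confidence"
        return f"{result.cuneiform}  (caveat: {caveats})"
    # "fallback" or any unrecognised value: show the transliteration only.
    return result.akkadian


# Stand-in object. The strings come from the showcase example;
# the warning text is hypothetical.
demo = SimpleNamespace(
    akkadian="LUGAL a-na ŠEŠ-šu i-sap-ra",
    cuneiform="𒈗 𒀀𒈾 𒋀𒋗 𒄿𒉺𒅁𒊏",
    suggestion="fallback",
    warnings=["Output could not be validated"],
)
print(present(demo))
```

Treating unknown suggestion values as a fallback keeps the caller on the conservative side if new values are added later.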

### Web Demo

```bash
pip install "cuneiscribe[serve]"
python -m cuneiscribe.interfaces.demo --model models/byt5-base-akkadian --share
```

## How It Works

```
User Input
    │
    ▼
Input Classifier ──→ anomalous? → REJECT
    │
    ▼
ByT5 Translation (En→Ak)
    │
    ▼
Output Validator ──→ unreliable? → FALLBACK (transliteration only)
    │
    ▼
Cuneiform Converter (14,240 mappings)
    │
    ▼
Tablet Renderer (SVG/PNG)
    │
    ▼
Result + Confidence + Warnings
```

The confidence gating pipeline is designed to **avoid confidently rendering wrong cuneiform**. When output quality is uncertain, it degrades gracefully to transliteration-only output with a warning.
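
The control flow in the diagram can be sketched as a single function that receives each stage as a callable. Everything here is an assumption for illustration: the `Outcome` type, the `gated_pipeline` name, the 0.5 threshold, and the stub stages are hypothetical and do not reflect CuneiScribe's internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Outcome:
    status: str                      # "rejected", "fallback", or "rendered"
    akkadian: Optional[str] = None
    cuneiform: Optional[str] = None
    warnings: list = field(default_factory=list)


def gated_pipeline(
    text: str,
    is_anomalous: Callable[[str], bool],
    translate: Callable[[str], str],
    score_output: Callable[[str], float],
    to_cuneiform: Callable[[str], str],
    threshold: float = 0.5,          # hypothetical cut-off
) -> Outcome:
    # Stage 1: reject anomalous input before any model call.
    if is_anomalous(text):
        return Outcome("rejected", warnings=["Input rejected"])
    # Stage 2: translate English into an Akkadian transliteration.
    akkadian = translate(text)
    # Stage 3: validate; on low confidence, stop at the transliteration.
    if score_output(akkadian) < threshold:
        return Outcome("fallback", akkadian=akkadian,
                       warnings=["Low confidence: transliteration only"])
    # Stage 4: convert to cuneiform signs (tablet rendering would follow).
    return Outcome("rendered", akkadian=akkadian,
                   cuneiform=to_cuneiform(akkadian))


# Usage with stub stages standing in for the real components.
out = gated_pipeline(
    "The mighty king",
    is_anomalous=lambda t: "<" in t,
    translate=lambda t: "LUGAL dan-nu",
    score_output=lambda a: 0.3,
    to_cuneiform=lambda a: "𒈗 𒆗𒉡",
)
print(out.status)  # fallback
```

Passing the stages in as callables keeps the gating logic testable without loading a model.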

## Features

| Feature | Description |
|---------|-------------|
| Confidence Gating | Input classification + output validation before rendering |
| Cuneiform Converter | 14,240 transliteration→Unicode mappings, 95.3% coverage |
| Clay Tablet Renderer | SVG/PNG with authentic Mesopotamian styling, <10ms |
| Bidirectional NMT | English→Akkadian and Akkadian→English (ByT5-base, 49.1 BLEU) |
| CLI | `cuneiform`, `render`, `craft`, `classify`, `info` commands |
| Web Demo | Gradio interface with 4-panel display |
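
The converter's job can be illustrated with a tiny lookup table. The six entries below are read off the `LUGAL dan-nu LUGAL KUR aš-šur` example at the top of this README. The function name, the `?` placeholder for unmapped values, and the token-level coverage measure are assumptions for illustration, not the library's table format or its definition of coverage.

```python
# Six sign values taken from the README example; the real table has 14,240 entries.
SIGNS = {
    "LUGAL": "𒈗", "dan": "𒆗", "nu": "𒉡",
    "KUR": "𒆳", "aš": "𒀸", "šur": "𒋩",
}


def to_cuneiform(translit: str, table=SIGNS, unknown="?"):
    """Map hyphen-separated sign values to Unicode; return (text, coverage)."""
    words_out, total, hits = [], 0, 0
    for word in translit.split():
        signs = []
        for value in word.split("-"):
            total += 1
            if value in table:
                hits += 1
                signs.append(table[value])
            else:
                signs.append(unknown)
        words_out.append("".join(signs))
    # Coverage here is the fraction of sign values found in the table.
    coverage = hits / total if total else 1.0
    return " ".join(words_out), coverage


text, coverage = to_cuneiform("LUGAL dan-nu LUGAL KUR aš-šur")
print(text)      # 𒈗 𒆗𒉡 𒈗 𒆳 𒀸𒋩
print(coverage)  # 1.0
```

Returning coverage alongside the text lets a caller decide whether a partially mapped line is worth rendering.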

## Architecture

```
cuneiscribe/
├── pipeline/      ← Confidence gating (classifier + validator)
├── models/        ← ByT5 bidirectional translator
├── knowledge/     ← Sign tables + cuneiform converter
└── interfaces/    ← CLI, web demo, SVG renderer
```

Four decoupled layers. Swap the model, update sign tables, or add a dialect without breaking interfaces.

## Limitations

- English→Akkadian produces **approximate modern transliterations**, not authentic ancient text
- Trained on Neo-Assyrian/Old Babylonian data; output for other Akkadian dialects (e.g., Old Assyrian commercial texts) may be of substantially lower quality
- **Domain-specific data matters more than model size**: 14,240 sign mappings and a 17K-lemma dictionary do not compensate for missing in-domain training text
- Currently English-only: no other modern language is supported

See [ROADMAP.md](ROADMAP.md) for the engineering roadmap.

## Citation

```bibtex
@inproceedings{cuneiscribe2026,
  title={CuneiScribe: Bridging a 4,000-Year Cultural Gap with Bidirectional Akkadian NMT and Cuneiform Rendering},
  author={Wang, Geoffrey},
  booktitle={Proceedings of the 4th Workshop on Cross-Cultural Considerations in NLP (C3NLP)},
  year={2026}
}
```

## License

Apache 2.0
