pyaegean
Copyright 2026 Ryan Pavlicek

This product includes software developed by Ryan Pavlicek, licensed under the
Apache License, Version 2.0.

================================================================================
Third-party data and attributions
================================================================================

Linear A corpus (bundled text JSON: src/aegean/data/bundled/lineara/)
  Derived from the GORILA edition — Godart, L. & Olivier, J.-P. (1976–1985),
  "Recueil des inscriptions en linéaire A" (Études Crétoises) — via the
  open dataset at https://github.com/mwenge/lineara.xyz.
  Used as a scholarly reference corpus. Transliterations and structured
  metadata only; no facsimile imagery is bundled or redistributed.

Linear A facsimile / photographic imagery (fetched on demand, NOT redistributed)
  © École française d'Athènes and respective rights holders. Downloaded by the
  user on demand from the linearaworkbench repository for academic reference;
  never re-hosted by this package.

Ancient Greek Dependency Treebank — AGDT v2.1 (fetched on demand, NOT redistributed)
  Perseus Ancient Greek Dependency Treebank, v2.1
  (https://github.com/PerseusDL/treebank_data), licensed CC BY-SA 3.0. Downloaded
  by the user on demand via aegean.greek.use_treebank() / use_parser() / use_tagger() /
  use_lemmatizer(); pyaegean builds derived artifacts (a lemma/morphology lexicon, a
  dependency-parser model, a POS-tagger model, and a lemmatizer model) in the local cache
  and neither bundles nor re-hosts the treebank itself. Attribute the AGDT in academic
  work that relies on it.

Liddell-Scott-Jones lexicon — Perseus LSJ (fetched on demand, NOT redistributed)
  A Greek-English Lexicon (Liddell, Scott, Jones), digitized by the Perseus Digital
  Library (https://github.com/PerseusDL/lexica), licensed CC BY-SA 4.0. Text provided
  under a CC BY-SA license by Perseus Digital Library, http://www.perseus.tufts.edu,
  with funding from The National Endowment for the Humanities. Data accessed from
  https://github.com/PerseusDL/lexica/. Downloaded by the user on demand via
  aegean.greek.use_lsj(); pyaegean builds a derived index in the local cache and
  neither bundles nor re-hosts the lexicon.

Neural Greek lemmatizer model (fetched on demand, NOT bundled)
  An ONNX seq2seq lemmatizer fine-tuned from bowphs/GreTa (Riemenschneider & Frank,
  "Exploring Large Language Models for Ancient Greek and Latin", Apache-2.0;
  arXiv:2410.12055) on the AGDT (CC BY-SA 3.0), the Pedalion treebanks (CC BY-SA 4.0),
  and the Gorman Ancient Greek treebanks (CC0). The resulting model — ONNX weights plus a
  derived gold lemma lookup — is licensed CC BY-SA 4.0. Downloaded by the user on demand
  via aegean.greek.use_neural_lemmatizer() (the [neural] extra); fetched to the local cache,
  never bundled in the Apache-2.0 wheel. Attribute GreTa and the treebanks in academic work.

Aegean syllabic sign data — Unicode Character Database (bundled)
  The Linear B (src/aegean/data/bundled/linearb/) and Cypriot (.../cypriot/) sign inventories
  and phonetic maps are derived from the Unicode Character Database (UnicodeData.txt — the
  "Linear B Syllabary", "Linear B Ideograms", and "Cypriot Syllabary" blocks), Copyright
  © 1991–present Unicode, Inc., distributed under the Unicode License v3
  (https://www.unicode.org/license.txt); the only obligation is to retain this notice. The
  bundled transliteration samples are illustrative and record scholarly fact (after Ventris & Chadwick,
  Documents in Mycenaean Greek; and Masson, Les inscriptions chypriotes syllabiques); they are included as
  excerpts, not corpora.

Linear B corpora — DAMOS, LiBER (NOT bundled, NOT fetched by default)
  No openly licensed Linear B tablet corpus exists. DAMOS (Database of Mycenaean at Oslo) is
  CC BY-NC-SA 4.0; LiBER (CNR) is all rights reserved. pyaegean bundles neither and fetches
  neither by default; set PYAEGEAN_LINEARB_CORPUS_URL to load your own licensed copy, which
  pyaegean parses locally and never re-hosts.

Planned (future versions; included here for forward attribution):
  - PROIEL treebanks (Ancient Greek) — CC BY-NC-SA.
  - First1KGreek / Open Greek and Latin — CC BY.
  - Morpheus morphological data (per applicable license).
  Each will be attributed and license-gated as it is integrated.

The structured-data layer of this package is licensed Apache-2.0; the underlying
scholarly editions and imagery remain subject to their respective rights holders'
terms. Cite the original editions in academic work (see Corpus.provenance.cite()).
