pyaegean
Copyright 2026 Ryan Pavlicek

This product includes software developed by Ryan Pavlicek, licensed under the
Apache License, Version 2.0.

================================================================================
Third-party data and attributions
================================================================================

Linear A corpus (bundled text JSON: src/aegean/data/bundled/lineara/)
  Derived from the GORILA edition — Godart, L. & Olivier, J.-P. (1976–1985),
  "Recueil des inscriptions en linéaire A" (Études Crétoises) — via the
  open dataset at https://github.com/mwenge/lineara.xyz.
  Used as a scholarly reference corpus. Transliterations and structured
  metadata only; no facsimile imagery is bundled or redistributed.

Linear A facsimile / photographic imagery (fetched on demand, NOT redistributed)
  © École Française d'Athènes and respective rights holders. Downloaded by the
  user on demand from the linearaworkbench repository for academic reference;
  never re-hosted by this package.

Ancient Greek Dependency Treebank — AGDT v2.1 (fetched on demand, NOT redistributed)
  Perseus Ancient Greek Dependency Treebank, v2.1
  (https://github.com/PerseusDL/treebank_data), licensed CC BY-SA 3.0. Downloaded
  by the user on demand via aegean.greek.use_treebank() / use_parser() / use_tagger() /
  use_lemmatizer(); pyaegean builds derived artifacts (a lemma/morphology lexicon, a
  dependency-parser model, a POS-tagger model, and a lemmatizer model) in the local cache
  and neither bundles nor re-hosts the treebank itself. Attribute the AGDT in academic
  work that relies on it.

Liddell-Scott-Jones lexicon — Perseus LSJ (fetched on demand, NOT redistributed)
  A Greek-English Lexicon (Liddell, Scott, Jones), digitized by the Perseus Digital
  Library (https://github.com/PerseusDL/lexica), licensed CC BY-SA 4.0. Text provided
  under a CC BY-SA license by Perseus Digital Library, http://www.perseus.tufts.edu,
  with funding from The National Endowment for the Humanities. Data accessed from
  https://github.com/PerseusDL/lexica/. Downloaded by the user on demand via
  aegean.greek.use_lsj(); pyaegean builds a derived index in the local cache and
  neither bundles nor re-hosts the lexicon.

Neural Greek lemmatizer model (fetched on demand, NOT bundled)
  An ONNX seq2seq lemmatizer fine-tuned from bowphs/GreTa (Apache-2.0; Riemenschneider &
  Frank, "Exploring Large Language Models for Classical Philology", arXiv:2305.13698)
  on the AGDT (CC BY-SA 3.0), the Pedalion treebanks (CC BY-SA 4.0),
  and the Gorman Ancient Greek treebanks (CC0). The resulting model — ONNX weights plus a
  derived gold lemma lookup — is licensed CC BY-SA 4.0. Downloaded by the user on demand
  via aegean.greek.use_neural_lemmatizer() (the [neural] extra); fetched to the local cache,
  never bundled in the Apache-2.0 wheel. Attribute GreTa and the treebanks in academic work.

Planned (future versions; included here for forward attribution):
  - DAMOS — Database of Mycenaean at Oslo (Linear B).
  - LiBER — Linear B Electronic Resources.
  - PROIEL treebanks (Ancient Greek) — CC BY-NC-SA.
  - First1KGreek / Open Greek and Latin — CC BY.
  - Morpheus morphological data (per applicable license).
  Each will be attributed and license-gated as it is integrated.

The structured-data layer of this package is licensed Apache-2.0; underlying
scholarly editions and imagery remain under their respective rights. Cite the
original editions in academic work (see Corpus.provenance.cite()).
