Metadata-Version: 2.4
Name: cltoolkit
Version: 0.3.0
Summary: A Python Library for the Processing of Cross-Linguistic Data
Home-page: https://github.com/cldf/cltoolkit
Author: Johann-Mattis List, Robert Forkel and Frederic Blum
Author-email: robert_forkel@eva.mpg.de
License: MIT
Project-URL: Bug Tracker, https://github.com/cldf/cltoolkit/issues
Keywords: linguistics
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pycldf<2
Requires-Dist: lingpy>=2.6.5
Requires-Dist: pyclts<4,>=3.1
Provides-Extra: dev
Requires-Dist: tox; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: wheel>=0.36; extra == "dev"
Requires-Dist: twine; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=6; extra == "test"
Requires-Dist: pytest-mock; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Requires-Dist: coverage; extra == "test"
Requires-Dist: pyconcepticon<4; extra == "test"
Provides-Extra: doc
Requires-Dist: sphinx; extra == "doc"
Requires-Dist: sphinx-autodoc-typehints; extra == "doc"
Requires-Dist: sphinx_rtd_theme; extra == "doc"
Dynamic: license-file

# CL ToolKit

[![Build Status](https://github.com/cldf/cltoolkit/workflows/tests/badge.svg)](https://github.com/cldf/cltoolkit/actions?query=workflow%3Atests)
[![Documentation Status](https://readthedocs.org/projects/cltoolkit/badge/?version=latest)](https://cltoolkit.readthedocs.io/en/latest/?badge=latest)
[![PyPI](https://img.shields.io/pypi/v/cltoolkit.svg)](https://pypi.org/project/cltoolkit)

A Python Library for the Processing of Cross-Linguistic Data.

By Johann-Mattis List and Robert Forkel.

## Overview

While [pycldf](https://github.com/cldf/pycldf) provides a basic Python API to access cross-linguistic data
encoded in [CLDF](https://cldf.clld.org) datasets,
`cltoolkit` goes one step further, turning the data into full-fledged Python objects rather than
shallow proxies for rows in a CSV file. As with `pycldf`'s ORM package, there is a trade-off:
convenient access and a more Pythonic API come at the expense of performance (memory footprint in particular,
but also data load time) and of write access. Still, most of today's CLDF datasets (or aggregations
of these) can be processed with `cltoolkit` on reasonable hardware in minutes rather than hours.

The main idea behind `cltoolkit` is to make (aggregated) CLDF data easily amenable to the computation
of *linguistic features* in a general sense (e.g. typological features). This is done by
- providing the data for processing code [as Python objects](https://cltoolkit.readthedocs.io/en/latest/models.html),
- providing [a framework](https://cltoolkit.readthedocs.io/en/latest/features.html) that makes feature computation 
  as simple as writing a Python function acting on a `cltoolkit.models.Language` object.
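
To illustrate the idea of "a feature is a function acting on a language object", here is a minimal,
self-contained sketch. It does not use the `cltoolkit` API: the `Language` dataclass, its `forms` attribute
and the `compute_features` helper are hypothetical stand-ins, and the real `cltoolkit.models.Language`
and feature framework offer a richer interface (see the linked documentation).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Language:
    """Hypothetical stand-in, NOT the real cltoolkit.models.Language."""
    id: str
    forms: List[str] = field(default_factory=list)  # assumed attribute


def number_of_forms(language: Language) -> int:
    """Feature: how many word forms are attested for the language."""
    return len(language.forms)


def has_long_words(language: Language, threshold: int = 6) -> bool:
    """Feature: does any form have at least `threshold` characters?"""
    return any(len(form) >= threshold for form in language.forms)


def compute_features(
    languages: Iterable[Language],
    features: Dict[str, Callable[[Language], object]],
) -> Dict[str, Dict[str, object]]:
    """Apply every feature function to every language."""
    return {
        lang.id: {name: func(lang) for name, func in features.items()}
        for lang in languages
    }


languages = [
    Language("lang_a", ["hant", "wasser"]),
    Language("lang_b", ["main", "eau"]),
]
features = {"FormCount": number_of_forms, "LongWords": has_long_words}
table = compute_features(languages, features)
```

Because each feature is an ordinary function, adding one only requires writing a new function and
registering it under a name.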

In general, aggregated CLDF Wordlists provide limited (automated) comparability across datasets (e.g. one could
compare the number of words per language in each dataset). A lot more can be done when datasets use CLDF reference
properties to link to reference catalogs, i.e.
- [link language varieties](https://cldf.clld.org/v1.0/terms.rdf#glottocode) to [Glottolog](https://glottolog.org) languoids,
- [link senses](https://cldf.clld.org/v1.0/terms.rdf#concepticonReference) to [Concepticon concept sets](https://concepticon.clld.org/parameters),
- [link sound segments](https://cldf.clld.org/v1.0/terms.rdf#cltsReference) to [CLTS sounds](https://clts.clld.org/parameters).

`cltoolkit` objects exploit this extended comparability by distinguishing "senses" from "concepts" and "graphemes"
from "sounds", and by providing convenient access to comparable subsets of objects in an aggregation
(see [models.py](src/cltoolkit/models.py)).
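
The following sketch shows why links to a reference catalog make an aggregation comparable. It assumes that
a "sense" is a dataset-specific gloss and that a "concept" is what several senses share once they are linked
to the same Concepticon concept set. The `Sense` class, its fields and `group_by_concept` are hypothetical and
not part of `cltoolkit`; the data is made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Sense:
    """Hypothetical stand-in for a dataset-specific sense."""
    dataset: str
    gloss: str                         # gloss as written in the dataset
    concepticon_gloss: Optional[str]   # None if the sense is not linked


def group_by_concept(senses: Iterable[Sense]) -> Dict[str, List[Sense]]:
    """Group linked senses by their shared reference-catalog gloss."""
    concepts = defaultdict(list)
    for sense in senses:
        if sense.concepticon_gloss is not None:  # unlinked senses are not comparable
            concepts[sense.concepticon_gloss].append(sense)
    return dict(concepts)


senses = [
    Sense("dataset_a", "the hand", "HAND"),
    Sense("dataset_b", "hand (of person)", "HAND"),
    Sense("dataset_a", "water (fresh)", "WATER"),
    Sense("dataset_b", "local dish", None),
]
concepts = group_by_concept(senses)
```

Differently worded glosses from two datasets end up under one concept, while the unlinked sense stays
available in its own dataset but is left out of the comparable subset.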

See [example.md](example.md) for a walk-through of the typical workflow with `cltoolkit`.
