Metadata-Version: 2.4
Name: ipanema
Version: 202602.14.2
Summary: Packaged language data from Wiktionary
Author-email: Jan Berkel <jan@berkel.fr>
License-Expression: MIT
Project-URL: Repository, https://gitlab.com/jberkel/ipanema
Project-URL: Documentation, https://jberkel.gitlab.io/ipanema/
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: peewee>=3.17.0
Dynamic: license-file

# /ipaˈnẽmɐ/
[![PyPI - Version][PythonBadge]][PythonLink]
[![Swift5 compatible][Swift5Badge]][Swift5Link]
[![Gitlab Pipeline Status](https://img.shields.io/gitlab/pipeline-status/jberkel%2Fipanema)](https://gitlab.com/jberkel/ipanema/-/pipelines/)

ipanema provides APIs for several programming languages to access the [Wiktionary language database][]
and other language-related data. The same data is also available as JSON and SQLite files.

### Python

```shell
$ python
>>> from ipanema import query_language
>>> query_language('ca')
{'code': 'ca', 'canonical_name': 'Catalan', 'family': {'code': 'roa-ocr', 'canonical_name': 'Occitano-Romance', 'wikidata_item': 'Q599958', 'parent_family': 'Gallo-Romance', 'proto_language_code': None}, 'ancestor': 'Old Catalan', 'parent': 'None', 'wikidata_item': 'Q7026'}
>>> query_language('Deutsch')
{'code': 'de', 'canonical_name': 'German', 'family': {'code': 'gmw-hgm', 'canonical_name': 'High German', 'wikidata_item': 'Q52040', 'parent_family': 'West Germanic', 'proto_language_code': 'goh'}, 'ancestor': 'Early New High German', 'parent': 'None', 'wikidata_item': 'Q188'}
>>> from ipanema import query_family
>>> query_family('Indo-European')
{'code': 'ine', 'canonical_name': 'Indo-European', 'wikidata_item': 'Q19860', 'parent_family': 'None', 'proto_language_code': 'ine-pro'}
```

[API docs](https://jberkel.gitlab.io/ipanema/)

### Java

```java
import ipanema.language.model.Language;
import ipanema.language.model.LanguageData;

Optional<Language> ca = LanguageData.load().getLanguage("ca");
```

### Swift

```swift
import Ipanema

let ca = try! Polyglot.sharedInstance.languageData("ca")
```

### JSON

```shell
$ jq '.ca' data/lang_data.json
{
  "ancestors": "roa-oca",
  "canonicalName": "Catalan",
  "family": "roa-ocr",
  "scripts": "Latn",
  "sort_key": {
    "remove_diacritics": "̧̀́̈·"
  },
  "standard_chars": "AaÀàBbCcÇçDdEeÉéÈèFfGgHhIiÍíÏïJjLlMmNnOoÓóÒòPpQqRrSsTtUuÚúÜüVvXxYyZz· ',-‐‑‒–—…∅",
  "type": "regular",
  "wikidata_item": "Q7026"
}
```
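
The same lookup can be done from Python with only the standard library. This is a minimal sketch, assuming `data/lang_data.json` is a single JSON object keyed by language code, as the `jq` output above suggests. `load_language` is a hypothetical helper and not part of the ipanema API:

```python
import json
from pathlib import Path


def load_language(path, code):
    """Return the entry for `code` from a lang_data.json-style file, or None."""
    # Assumption: the file is one JSON object mapping language codes to entries.
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    return data.get(code)
```

With the data submodule checked out, `load_language("data/lang_data.json", "ca")` would be expected to return the Catalan entry shown above.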

### SQLite

```shell
$ sqlite3 data/languages.sqlite
sqlite> select * from languages where code = 'ca';
ca|Catalan||roa-oca|roa-ocr|regular|Q7026
```
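
The database can also be read with Python's built-in `sqlite3` module. The sketch below assumes only what the query above shows: a `languages` table with a `code` column. The remaining column names are not documented here, so the row is returned as a dict keyed by whatever names the table defines. `lookup_language` is a hypothetical helper, not part of the ipanema API:

```python
import sqlite3


def lookup_language(conn, code):
    """Return the `languages` row for `code` as a dict, or None if absent."""
    # sqlite3.Row exposes column names, so the schema need not be known here.
    conn.row_factory = sqlite3.Row
    # Parameterized query: the code is bound, not interpolated into the SQL.
    row = conn.execute(
        "SELECT * FROM languages WHERE code = ?", (code,)
    ).fetchone()
    return dict(row) if row is not None else None
```

A connection for the packaged file can be opened with `sqlite3.connect("data/languages.sqlite")` and passed to `lookup_language(conn, "ca")`.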

## Language data

Data sources:
* [Module:languages/data2][], [Module:languages/data3][], [Module:families/data][]

### Extraction

The language data itself is stored in a submodule ([ipanema-data][]). To update or regenerate the data
manually:

```shell
$ apt-get install jq lua5.1 liblua5.1-dev luarocks # Debian/Ubuntu
$ brew install jq lua@5.1 luarocks # macOS
$ make clean # delete stored data
$ make
```

### Language codes

The Wiktionary [language code][] is defined as follows:

1. If the language has a two-letter code in the ISO 639-1 standard, then that code is used.
2. If the language has a three-letter code in the ISO 639-3 standard, then that code is used.
3. If the language has a three-letter code in the ISO 639-2 standard, then that code is used. (rare)
4. Any language which does not have an ISO code, but which is to be included in Wiktionary, has a new Wiktionary-specific "exceptional" code devised for it.
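
The precedence in the list above can be sketched as a small function. This only illustrates the ordering of the rules; `wiktionary_code` and its parameters are hypothetical and not part of the ipanema API, and the caller is assumed to already know which ISO codes (if any) a language has:

```python
def wiktionary_code(iso_639_1=None, iso_639_3=None, iso_639_2=None, exceptional=None):
    """Pick a code by precedence: ISO 639-1, then 639-3, then 639-2, then exceptional."""
    for candidate in (iso_639_1, iso_639_3, iso_639_2, exceptional):
        if candidate:
            return candidate
    raise ValueError("no ISO code and no exceptional code given")


# Catalan has both a two-letter and a three-letter ISO code; the two-letter one wins.
print(wiktionary_code(iso_639_1="ca", iso_639_3="cat"))  # prints: ca
# Old Catalan has no ISO code here, so the Wiktionary-specific code is used.
print(wiktionary_code(exceptional="roa-oca"))  # prints: roa-oca
```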

### License

The language data extracted from Wiktionary is licensed under the Creative Commons
[CC BY-SA 4.0][] license. The data has been transformed into a machine-readable format, but not
otherwise modified. The project itself is licensed under the MIT License; see [LICENSE](LICENSE).

[ipanema-data]: https://gitlab.com/jberkel/ipanema-data
[Wiktionary language database]: https://en.wiktionary.org/wiki/Wiktionary:List_of_languages
[language code]: https://en.wiktionary.org/wiki/Wiktionary:Languages#Language_codes
[Module:languages/data2]: https://en.wiktionary.org/wiki/Module:languages/data2
[Module:languages/data3]: https://en.wiktionary.org/wiki/Module:languages/data3
[Module:families/data]: https://en.wiktionary.org/wiki/Module:families/data
[CC BY-SA 4.0]: https://creativecommons.org/licenses/by-sa/4.0/
[Swift5Badge]: https://img.shields.io/badge/swift-5-orange.svg?style=flat
[Swift5Link]: https://developer.apple.com/swift/
[PythonBadge]: https://img.shields.io/pypi/v/ipanema
[PythonLink]: https://pypi.org/project/ipanema/
