Metadata-Version: 2.4
Name: kotodex
Version: 0.3.0
Summary: Build and query a local Japanese dictionary SQLite database.
Author: Power Digital
License-Expression: MIT
Project-URL: Homepage, https://github.com/jgbeta/kotodex
Project-URL: Repository, https://github.com/jgbeta/kotodex
Project-URL: Issues, https://github.com/jgbeta/kotodex/issues
Project-URL: Changelog, https://github.com/jgbeta/kotodex/blob/main/CHANGELOG.md
Keywords: japanese,dictionary,sqlite,jmdict,kanji,jlpt
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Education
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: SudachiPy>=0.6
Requires-Dist: SudachiDict-core
Provides-Extra: small
Requires-Dist: SudachiDict-small; extra == "small"
Provides-Extra: core
Requires-Dist: SudachiDict-core; extra == "core"
Provides-Extra: full
Requires-Dist: SudachiDict-full; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Dynamic: license-file

# Kotodex Package Guide

Kotodex builds a local SQLite database from Japanese dictionary sources and exposes a compact Python API for vocabulary, kanji, example sentence, and combined dictionary lookups.

Use it when you want Japanese dictionary data available locally without parsing XML or CSV source files at runtime.

## Contents

- [Install](#install)
- [Build A Database](#build-a-database)
- [Python Quick Start](#python-quick-start)
- [Vocabulary Lookups](#vocabulary-lookups)
- [Kanji Lookups](#kanji-lookups)
- [Example Sentences](#example-sentences)
- [Combined Queries](#combined-queries)
- [Result Objects](#result-objects)
- [JSON Export](#json-export)
- [CLI Usage](#cli-usage)
- [Lemma Helper](#lemma-helper)
- [Provenance And Licensing](#provenance-and-licensing)
- [Typical Workflows](#typical-workflows)

## Install

Default install, using the Sudachi core dictionary:

```bash
pip install kotodex
```

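Optional extras select a Sudachi dictionary size. `SudachiDict-core` is a base dependency, so the core dictionary is installed in every case: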

```bash
pip install "kotodex[small]"
pip install "kotodex[core]"
pip install "kotodex[full]"
```

Development install from a checkout:

```bash
pip install -e ".[dev]"
```

## Build A Database

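Kotodex queries a local SQLite database, so build one before using the API. `update` downloads or refreshes the source files, `rebuild` builds the database from them, and `status` reports on the database and sources.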

```bash
kotodex update
kotodex rebuild --db /home/user/jisho.db
kotodex status --db /home/user/jisho.db
```

Default paths:

- Source cache: `~/.cache/kotodex/sources`
- Database: `~/.local/share/kotodex/jisho.db`

Use `--db` whenever you want to control the database location explicitly.

## Python Quick Start

```python
from kotodex import Jisho

with Jisho("/home/user/jisho.db") as j:
    print(j.imi("食べる"))
    print(j.kanji("食"))
    print(j.examples("食べる"))
    print(j.query("食べる"))
```

You can also manage a `Jisho` instance manually and close it yourself:

```python
from kotodex import Jisho

j = Jisho("/home/user/jisho.db")
try:
    result = j.imi("猫")
    print(result.meaning)
finally:
    j.close()
```

## Vocabulary Lookups

Use `imi()` for meaning-focused vocabulary lookup. `lookup()` is an alias for `imi()`. Snippets that use `j` without opening it assume an open `Jisho` instance, as in the quick start.

```python
with Jisho("/home/user/jisho.db") as j:
    result = j.imi("待つ")

print(result.found)
print(result.word)
print(result.reading)
print(result.meaning)
print(result.jlpt)
print(result.pos)
```

Return more ranked matches with `zenbu=True`:

```python
with Jisho("/home/user/jisho.db") as j:
    result = j.imi("食べ%る", zenbu=True)

for entry in result.entries:
    print(entry.word, entry.reading, entry.meaning)
```

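Vocabulary lookups accept SQL wildcards, where `%` matches any run of characters, including none:

```python
j.imi("猫%", zenbu=True)   # words that start with 猫
j.imi("%する", zenbu=True)  # words that end in する
```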
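Assuming the pattern is handed to SQLite's `LIKE` operator, the whole word must match the pattern and `%` can stand for zero or more characters. The sketch below runs plain SQLite against a throwaway in-memory table to show which words each pattern would select. The `vocab` table, `like_matches`, and the word list are illustrative assumptions; they do not reflect Kotodex's schema or its ranking.

```python
import sqlite3


def like_matches(words: list[str], pattern: str) -> list[str]:
    """Return the words that match a SQL LIKE pattern, in their original order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE vocab (word TEXT)")
        conn.executemany("INSERT INTO vocab (word) VALUES (?)", [(w,) for w in words])
        rows = conn.execute(
            "SELECT word FROM vocab WHERE word LIKE ? ORDER BY rowid", (pattern,)
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


words = ["猫", "猫舌", "子猫", "勉強する", "食べる", "食べ過ぎる", "食べ物"]
print(like_matches(words, "猫%"))      # ['猫', '猫舌']
print(like_matches(words, "%する"))    # ['勉強する']
print(like_matches(words, "食べ%る"))  # ['食べる', '食べ過ぎる']
```

Note that `子猫` is not selected by `猫%` because the pattern is anchored at the start of the word, and `食べ%る` selects `食べる` because `%` may match nothing.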

Use romaji input:

```python
result = j.imi("taberu", romaji=True)
print(result.word)
print(result.reading)
print(result.reading_romaji)
```

Include linked example sentences on vocabulary entries:

```python
result = j.imi("食べる", examples=2)

for sentence in result.examples:
    print(sentence.japanese)
    print(sentence.english)
```

Derived verb forms are recovered when direct lookup fails:

```python
result = j.imi("待てる")

print(result.word)             # 待つ
print(result.reading)          # まつ
print(result.surface_reading)  # まてる
print(result.origin)           # 待つ
print(result.derivation)       # potential
```

Direct matches always take precedence. The derivation fields (`origin`, `derivation`, `surface_reading`) are populated only when Kotodex had to recover a dictionary base form.

## Kanji Lookups

Look up kanji directly:

```python
result = j.kanji("猫")

print(result.literal)
print(result.meaning)
print(result.readings)
print(result.kun_readings)
print(result.on_readings)
print(result.stroke_count)
print(result.jlpt)
```

Look up every kanji in a string:

```python
result = j.kanji("日本語")

for kanji in result.results:
    print(kanji.literal, kanji.meaning)
```

Search by radicals:

```python
j.kanji(radicals=["氵", "木"])
j.kanji(radicals=["氵", "木"], radical_match="any")
```
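`radical_match` chooses between two set tests. Assuming the default behaves as "all", meaning every listed radical must be present (the explicit `"any"` option suggests this, but this guide does not state it), the difference is a subset check versus an intersection check. `match_radicals` is a hypothetical helper, and the radical table is a tiny hand-written sample for illustration, not Kotodex data.

```python
def match_radicals(
    table: dict[str, set[str]],
    wanted: list[str],
    radical_match: str = "all",
) -> list[str]:
    """Return kanji whose radicals satisfy the match mode (hypothetical helper)."""
    want = set(wanted)
    if radical_match == "all":
        # every requested radical must appear in the kanji
        return [k for k, rads in table.items() if want <= rads]
    if radical_match == "any":
        # at least one requested radical must appear
        return [k for k, rads in table.items() if want & rads]
    raise ValueError("radical_match must be 'all' or 'any'")


sample = {
    "沐": {"氵", "木"},
    "林": {"木"},
    "池": {"氵", "也"},
    "口": {"口"},
}
print(match_radicals(sample, ["氵", "木"]))                       # ['沐']
print(match_radicals(sample, ["氵", "木"], radical_match="any"))  # ['沐', '林', '池']
```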

Search by stroke count:

```python
j.kanji(strokes=9)             # exactly 9 strokes
j.kanji(strokes=(8, 10))       # 8 to 10 strokes
j.kanji(strokes=range(8, 11))  # the same span; range(8, 11) covers 8, 9, 10
```

Search by JLPT level:

```python
j.kanji(jlpt="N5", zenbu=True)
```

Include examples and similar kanji:

```python
result = j.kanji("食", examples=2)

print(result.first.examples)
print(result.similar)
```

Pass `include_similar=False` to skip similar kanji when you only need the base record:

```python
j.kanji("食", include_similar=False)
```

## Example Sentences

Use `examples()` to retrieve Tatoeba-linked example sentences.

```python
result = j.examples("食べる", limit=5)

for sentence in result.sentences:
    print(sentence.japanese)
    print(sentence.english)
    print(sentence.attribution)
```

Filter by difficulty or JLPT level:

```python
j.examples("食べる", difficulty="N5")
j.examples("食べる", jlpt="N5")
```

Use romaji input:

```python
j.examples("taberu", romaji=True)
```

## Combined Queries

Use `query()` when you want vocabulary, kanji, names, example sentences, and provenance in one result.

```python
result = j.query("食べる")

print(result.found)
print(result.vocabulary)
print(result.kanji)
print(result.names)
print(result.examples)
print(result.provenance)
```

Combined queries also expose the vocabulary derivation metadata on the result itself:

```python
result = j.query("待てる")

print(result.vocabulary[0].word)  # 待つ
print(result.origin)              # 待つ
print(result.derivation)          # potential
print(result.surface_reading)     # まてる
```

## Result Objects

All result objects support:

```python
result.to_dict()
result.to_json(indent=2)
```

### `ImiLookupResult`

Common fields and shortcuts:

- `query`: original query text
- `lemma`: Sudachi dictionary form when available
- `found`: `True` when at least one entry was returned
- `count`: number of returned entries
- `entries`: list of `ImiEntry`
- `first`: first entry or `None`
- `word`, `reading`, `meaning`, `meanings`, `jlpt`, `pos`: shortcuts for the first entry
- `origin`, `derivation`, `surface_reading`: populated for derived-form recovery

### `ImiEntry`

Vocabulary entry fields:

- `word`
- `reading`
- `definitions`
- `meaning`
- `jlpt`
- `pos`
- `reading_romaji`
- `common`
- `priority`
- `example`
- `examples`
- `entry_id`
- `ent_seq`
- `source`
- `extra`

### `KanjiLookupResult`

Common fields and shortcuts:

- `query`
- `found`
- `count`
- `results`
- `first`
- `literal`, `meaning`, `meanings`, `readings`
- `kun_readings`, `on_readings`
- `stroke_count`, `strokes`
- `jlpt`
- `radicals`
- `similar`

### `KanjiEntry`

Kanji entry fields:

- `literal`
- `meanings`
- `on_readings`
- `kun_readings`
- `radicals`
- `stroke_count`
- `grade`
- `jlpt`
- `freq`
- `radical_classical`
- `on_romaji`
- `kun_romaji`
- `similar`
- `examples`
- `extra`

### `ExampleLookupResult`

Example lookup fields:

- `query`
- `found`
- `count`
- `sentences`
- `difficulty`
- `lemma`
- `first`

### `ExampleSentence`

Sentence fields:

- `tatoeba_id`
- `japanese`
- `english`
- `attribution`
- `difficulty`
- `japanese_romaji`
- `source`

### `QueryResult`

Combined query fields:

- `query`
- `lemma`
- `found`
- `vocabulary`
- `kanji`
- `names`
- `examples`
- `provenance`
- `raw`
- `origin`
- `derivation`
- `surface_reading`

## JSON Export

Use JSON export for API responses, notebooks, scripts, and debugging.

```python
print(j.imi("食べる").to_json(indent=2))
print(j.kanji("食").to_json(indent=2))
print(j.examples("食べる").to_json(indent=2))
print(j.query("食べる").to_json(indent=2))
```

`to_json()` already leaves Japanese text unescaped, because `ensure_ascii=False` is the default. If you pass your own JSON settings, you can still state it explicitly:

```python
j.query("食べる").to_json(indent=2, ensure_ascii=False)
```
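The `ensure_ascii` flag comes from the standard library's `json.dumps`, whose keyword arguments `to_json()` appears to accept. This stdlib-only sketch shows what the flag changes: both strings decode to the same data, and only the way non-ASCII characters are written differs.

```python
import json

payload = {"word": "食べる", "reading": "たべる"}

escaped = json.dumps(payload, ensure_ascii=True)    # non-ASCII written as \uXXXX escapes
readable = json.dumps(payload, ensure_ascii=False)  # Japanese text kept as-is

print(escaped)   # {"word": "\u98df\u3079\u308b", "reading": "\u305f\u3079\u308b"}
print(readable)  # {"word": "食べる", "reading": "たべる"}
assert json.loads(escaped) == json.loads(readable)  # same data either way
```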

## CLI Usage

Download or refresh source files:

```bash
kotodex update
kotodex update --force
kotodex update --cache-dir /tmp/kotodex-sources
```

Build a database:

```bash
kotodex rebuild --db /home/user/jisho.db
kotodex rebuild --force --db /home/user/jisho.db
kotodex rebuild --db /home/user/jisho.db --cache-dir /tmp/kotodex-sources
```

Check database and source status:

```bash
kotodex status --db /home/user/jisho.db
```

Vocabulary lookup:

```bash
kotodex imi 食べる --db /home/user/jisho.db
kotodex imi taberu --romaji --db /home/user/jisho.db
kotodex imi '食べ%る' --zenbu --db /home/user/jisho.db
kotodex imi 食べる --examples 2 --json --db /home/user/jisho.db
```

Kanji lookup and search:

```bash
kotodex kanji 食 --db /home/user/jisho.db
kotodex kanji 日本語 --db /home/user/jisho.db
kotodex kanji --radical 氵 --radical 木 --db /home/user/jisho.db
kotodex kanji --radical 氵 --radical 木 --radical-match any --db /home/user/jisho.db
kotodex kanji --strokes 9 --db /home/user/jisho.db
kotodex kanji --strokes 8-10 --db /home/user/jisho.db
kotodex kanji --jlpt N5 --zenbu --json --db /home/user/jisho.db
```

Example sentences:

```bash
kotodex examples 食べる --limit 5 --db /home/user/jisho.db
kotodex examples 食べる --difficulty N5 --db /home/user/jisho.db
kotodex examples taberu --romaji --json --db /home/user/jisho.db
```

Combined query:

```bash
kotodex query 食べる --db /home/user/jisho.db
kotodex query 食べる --examples 10 --json --db /home/user/jisho.db
```

## Lemma Helper

Use Sudachi-based normalization directly when you only need the dictionary form.

```python
from kotodex.lemma import get_lemma

print(get_lemma("待てる"))
print(get_lemma("食べました"))
```

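Choose a Sudachi dictionary size. The `small` and `full` dictionaries need the matching extra installed (`kotodex[small]` or `kotodex[full]`):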

```python
get_lemma("食べました", dict_type="small")
get_lemma("食べました", dict_type="core")
get_lemma("食べました", dict_type="full")
```

## Provenance And Licensing

Kotodex stores source provenance in the generated database.

```python
with Jisho("/home/user/jisho.db") as j:
    print(j.notice())
    print(j.provenance())
```

Important licensing notes:

- Source-derived content comes from EDRDG and Tatoeba and has attribution obligations.
- Generated databases may be subject to CC BY-SA 4.0 due to EDRDG-derived content.
- Per-sentence Tatoeba attribution is exposed as `ExampleSentence.attribution`.
- Use `Jisho.notice()` and `Jisho.provenance()` to inspect the generated database metadata.

## Typical Workflows

Build once, query many times:

```bash
kotodex update
kotodex rebuild --db ./jisho.db
```

```python
from kotodex import Jisho

with Jisho("./jisho.db") as j:
    print(j.imi("猫").meaning)
```

Wrap a combined query in a function that returns JSON, for example to back a local lookup endpoint:

```python
from kotodex import Jisho

def lookup_json(text: str) -> str:
    with Jisho("./jisho.db") as j:
        return j.query(text).to_json(indent=2)
```
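To serve those strings over HTTP, the standard library's `http.server` is enough for local use. In this sketch, `make_handler` and `serve` are hypothetical names, the `/query?q=` route is an arbitrary choice, and the lookup function is injected, so any `str -> str` callable, such as `lookup_json` above, can be plugged in. `ThreadingHTTPServer` is intended for local development, not production.

```python
import json
from collections.abc import Callable
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def make_handler(lookup: Callable[[str], str]) -> type[BaseHTTPRequestHandler]:
    """Build a handler that answers GET /query?q=<text> with lookup(text)."""

    class LookupHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            url = urlparse(self.path)
            text = parse_qs(url.query).get("q", [""])[0]  # percent-decoded UTF-8
            if url.path != "/query" or not text:
                self.send_error(400, "expected /query?q=<text>")
                return
            body = lookup(text).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args) -> None:
            pass  # silence the default per-request logging

    return LookupHandler


def serve(lookup: Callable[[str], str], host: str = "127.0.0.1", port: int = 8000) -> None:
    """Serve lookups until interrupted with Ctrl+C."""
    with ThreadingHTTPServer((host, port), make_handler(lookup)) as server:
        server.serve_forever()
```

Calling `serve(lookup_json)` would then answer requests such as `http://127.0.0.1:8000/query?q=食べる`. Because `lookup_json` opens the database on every call, each request gets its own connection, which matters with a threaded server since a SQLite connection is normally tied to the thread that created it.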

Collect study data from a wildcard search:

```python
from kotodex import Jisho

with Jisho("./jisho.db") as j:
    result = j.imi("食べ%る", zenbu=True)
    rows = [(entry.word, entry.reading, entry.meaning, entry.jlpt) for entry in result.entries]
```
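To write those rows to disk, the standard library `csv` module is enough. `write_study_csv` is a hypothetical helper, not part of Kotodex. The `csv` writer stores `None` values, for example entries without a JLPT level, as empty cells.

```python
import csv
from collections.abc import Iterable


def write_study_csv(rows: Iterable[tuple], path: str) -> None:
    """Write (word, reading, meaning, jlpt) rows to a UTF-8 CSV file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "reading", "meaning", "jlpt"])
        writer.writerows(rows)  # None becomes an empty cell
```

Usage, after the block above: `write_study_csv(rows, "study.csv")`.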
