Metadata-Version: 2.4
Name: klca
Version: 0.1.3
Summary: Korean lexical complexity analyzer.
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: huggingface_hub
Requires-Dist: stanza

# klca

`klca` is a Korean lexical complexity analyzer.

## Usage

Show help:

```bash
python3 -m klca --help
```

Analyze one file:

```bash
python3 -m klca file --input-file path/to/text.txt --output output.json
```

Analyze a folder:

```bash
python3 -m klca folder --input-dir path/to/texts --output results.csv
```

- Use `--recursive` to include text files in subfolders. Without it, only files directly inside `--input-dir` are processed.
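
The difference between the two modes can be illustrated with a short Python sketch. This is not `klca`'s actual implementation: the helper name `collect_text_files` and the `.txt` extension filter are assumptions made for illustration only.

```python
from pathlib import Path


def collect_text_files(input_dir, recursive=False):
    """Return sorted .txt paths under input_dir.

    Hypothetical helper: mirrors the documented behaviour of --recursive,
    assuming that "text files" means files ending in .txt.
    """
    root = Path(input_dir)
    # "*.txt" matches only direct children; "**/*.txt" also descends into subfolders.
    pattern = "**/*.txt" if recursive else "*.txt"
    return sorted(p for p in root.glob(pattern) if p.is_file())
```

With a layout such as `texts/a.txt` and `texts/sub/b.txt`, the non-recursive call would return only `a.txt`, while `recursive=True` would return both files.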

## Included Resources
This package bundles the following resources used by the analyzer:

- Reference databases for frequency, range, and association, released as an open-source dataset (korean-fineweb-edu)
- Vocabulary grade database (National Institute of Korean Language resources), released under Korea Open Government License Type 1

## Morpheme Parsing and Tagging
- By default, `klca` uses the Korean `stanza` GSD model for tokenization, POS tagging, and lemmatization.
- The model is downloaded by `stanza` at runtime and is not bundled in this package.
- To use a different Korean `stanza` model or a custom local model, modify the Stanza pipeline settings in both `core.py` and `batch.py`.
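
As a rough guide to what such a change might look like, the sketch below builds the keyword arguments for a Korean `stanza.Pipeline`. The helper names `build_pipeline_kwargs` and `make_pipeline` are hypothetical, and the exact settings in `core.py` and `batch.py` may differ. `lang`, `package`, `processors`, and `pos_model_path` are standard `stanza.Pipeline` options.

```python
def build_pipeline_kwargs(package="gsd", pos_model_path=None):
    """Build keyword arguments for stanza.Pipeline (hypothetical helper).

    package: name of a Korean Stanza model package, e.g. "gsd" or "kaist".
    pos_model_path: optional path to a custom local POS model file.
    """
    kwargs = {
        "lang": "ko",
        "package": package,
        # The three steps named above: tokenization, POS tagging, lemmatization.
        "processors": "tokenize,pos,lemma",
    }
    if pos_model_path is not None:
        kwargs["pos_model_path"] = pos_model_path
    return kwargs


def make_pipeline(**overrides):
    """Create the pipeline; stanza is imported lazily so the helper above stays importable."""
    import stanza

    return stanza.Pipeline(**build_pipeline_kwargs(**overrides))
```

For example, `make_pipeline(package="kaist")` would switch to the Kaist-trained Korean model, which `stanza` would then download on first use.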

## License
- This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
