Metadata-Version: 2.4
Name: klca
Version: 0.1.2
Summary: Korean lexical complexity analyzer.
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: huggingface_hub
Requires-Dist: stanza

# klca

`klca` is a Korean lexical complexity analyzer.

## Dependency Data

`klca` downloads required dependency files from the public Hugging Face dataset repository `hksung/klca_deps` at runtime.
To download from a different repository, set `KLCA_HF_REPO_ID` to that repository's ID.
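
As a minimal sketch of the override, the snippet below assumes `KLCA_HF_REPO_ID` is read from the process environment (the README does not say so explicitly). The repository ID `your-org/your-klca-deps` is a hypothetical placeholder, not a real dataset.

```bash
# Assumption: klca reads KLCA_HF_REPO_ID from the environment.
# "your-org/your-klca-deps" is a hypothetical repository ID; replace it with your own.
export KLCA_HF_REPO_ID="your-org/your-klca-deps"

# Print the value that later commands in this shell will inherit.
printenv KLCA_HF_REPO_ID
```

Under the same assumption, the variable can also be set for a single invocation by prefixing the command, for example `KLCA_HF_REPO_ID=your-org/your-klca-deps python3 -m klca --help`.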

## Usage

Show help:

```bash
python3 -m klca --help
```

Analyze one file:

```bash
python3 -m klca file --input-file path/to/text.txt --output output.json
```

Analyze a folder:

```bash
python3 -m klca folder --input-dir path/to/texts --output results.csv
```

- Use `--recursive` to include text files in subfolders. Without it, only files directly inside `--input-dir` are processed.
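
Before a long run, it can help to preview which files each mode would cover. The sketch below does not call `klca`; it uses `find` on a throwaway directory and assumes that "text files" means files ending in `.txt`, which this README does not confirm. The names `demo_dir`, `top_level`, and `all_levels` are illustrative only.

```bash
# Build a throwaway directory tree with one top-level file and one nested file.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/sub"
touch "$demo_dir/a.txt" "$demo_dir/sub/b.txt"

# Files directly inside the directory (what a run without --recursive would cover,
# assuming .txt files are the ones selected).
top_level="$(find "$demo_dir" -maxdepth 1 -type f -name '*.txt' | sort)"

# Files at any depth (what a run with --recursive would cover, same assumption).
all_levels="$(find "$demo_dir" -type f -name '*.txt' | sort)"

printf 'top level:\n%s\n' "$top_level"
printf 'recursive:\n%s\n' "$all_levels"

# Remove the throwaway directory.
rm -rf "$demo_dir"
```

The corresponding analyzer call with subfolders included would be `python3 -m klca folder --input-dir path/to/texts --output results.csv --recursive`.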

## Included Resources

This package bundles the following resources used by the analyzer:

- Reference databases for frequency, range, and association, released as an open-source dataset (korean-fineweb-edu)
- Vocabulary grade database (National Institute of Korean Language resources), released under Korea Open Government License Type 1

The default Korean `stanza` GSD model is downloaded by `stanza` at runtime and is not bundled in this package.

## License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
