Metadata-Version: 2.3
Name: kenon
Version: 0.1.1
Summary: Semantic and co-occurrence graphs for midsized texts
Keywords: nlp,semantic-graphs,co-occurrence,text-analysis,computational-humanities
Author: Zoltan Varju, Orsolya Putz
Author-email: Zoltan Varju <zoltan.varju@crowintelligence.org>, Orsolya Putz <orsolya.putz@crowintelligence.org>
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Dist: spacy>=3.7
Requires-Dist: networkx>=3.3
Requires-Dist: scikit-learn>=1.4
Requires-Dist: nltk>=3.8
Requires-Dist: numpy>=1.26
Requires-Dist: scipy>=1.13
Requires-Dist: pandas>=2.2
Requires-Dist: chronowords>=0.1
Requires-Dist: pytest>=8 ; extra == 'dev'
Requires-Dist: pytest-cov ; extra == 'dev'
Requires-Dist: hypothesis>=6.100 ; extra == 'dev'
Requires-Dist: ruff>=0.4 ; extra == 'dev'
Requires-Dist: ty ; extra == 'dev'
Requires-Dist: datasets>=2.18 ; extra == 'dev'
Requires-Dist: huggingface-hub>=0.22 ; extra == 'dev'
Requires-Dist: matplotlib>=3.8 ; extra == 'dev'
Requires-Dist: seaborn>=0.13 ; extra == 'dev'
Requires-Dist: jupyter>=1.0 ; extra == 'dev'
Requires-Dist: ipykernel>=6.29 ; extra == 'dev'
Requires-Dist: tqdm>=4.66 ; extra == 'dev'
Requires-Dist: mkdocs-material ; extra == 'docs'
Requires-Dist: mkdocstrings[python] ; extra == 'docs'
Requires-Python: >=3.11
Project-URL: Homepage, https://crowintelligence.org/
Project-URL: Repository, https://github.com/crow-intelligence/kenon
Project-URL: Documentation, https://kenon.readthedocs.io
Provides-Extra: dev
Provides-Extra: docs
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://raw.githubusercontent.com/crow-intelligence/kenon/main/img/kenon.svg" alt="kenon logo" width="400">
</p>

# kenon

Semantic and co-occurrence graphs for mid-sized texts. Kenon builds weighted
graphs from raw text using corpus-internal statistics alone; no neural models
or external training data are required. It supports co-occurrence windows,
TF-IDF similarity, PMI embeddings, and disparity-filter backbone extraction.

## Installation

```bash
uv add kenon
uv run python -m spacy download en_core_web_sm
```

## Quickstart

```python
from kenon import (
    Tokenizer,
    get_stopwords,
    build_cooccurrence_graph,
    extract_backbone,
)

# 1. Tokenize
tokenizer = Tokenizer("en_core_web_sm", lemmatize=True)
tokens = tokenizer.flat_tokens("The cat sat on the mat. The dog ran in the park.")

# 2. Build graph
stopwords = get_stopwords("english")
graph = build_cooccurrence_graph(tokens, window=2, stopwords=stopwords)

# 3. Extract backbone
backbone = extract_backbone(graph, min_alpha_ptile=0.3, min_degree=2)
print(f"Backbone: {backbone.number_of_nodes()} nodes, {backbone.number_of_edges()} edges")
```
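Kenon can also build semantic graphs from document similarity rather than token
adjacency. As a rough illustration of that idea (not kenon's own API), here is
the same kind of graph built directly with scikit-learn and networkx, both of
which kenon already depends on; the document names and the `0.05` similarity
cutoff are made up for the example:

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: each entry becomes one node in the graph.
docs = {
    "cats": "the cat sat on the mat",
    "dogs": "the dog ran in the park",
    "pets": "the cat and the dog are pets",
}

# TF-IDF vectors and pairwise cosine similarities, all corpus-internal.
tfidf = TfidfVectorizer().fit_transform(docs.values())
sim = cosine_similarity(tfidf)

# Connect documents whose similarity clears an (arbitrary) threshold.
graph = nx.Graph()
names = list(docs)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if sim[i, j] > 0.05:
            graph.add_edge(names[i], names[j], weight=float(sim[i, j]))

print(graph.number_of_nodes(), graph.number_of_edges())
```

The same recipe works with any embedder that yields one vector per document,
which is what the "semantic graphs" feature below refers to.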

## Features

- **Tokenization**: spaCy-backed sentence splitting, tokenization, and lemmatization
- **Stopwords**: Merged NLTK + sklearn stopword lists with custom extensions
- **Embeddings**: Count vectors, TF-IDF, and PMI (via chronowords) — all corpus-internal
- **Co-occurrence graphs**: Skip-gram window co-occurrence with collocation detection
- **Semantic graphs**: Cosine similarity graphs from any embedder
- **Backbone extraction**: Disparity filter for statistically significant edges
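To make the last feature concrete: the disparity filter (Serrano, Boguñá, and
Vespignani, 2009) keeps an edge when its weight is an improbably large share of
either endpoint's total strength under a null model of uniformly random weight
splits. The sketch below is our own minimal rendition on plain networkx, not
kenon's `extract_backbone` (which, per the Quickstart, also takes percentile
and degree cutoffs); the `alpha` threshold name is ours:

```python
import networkx as nx

def disparity_backbone(graph: nx.Graph, alpha: float = 0.05) -> nx.Graph:
    """Keep edges significant at level alpha for at least one endpoint."""
    backbone = nx.Graph()
    for u, v, w in graph.edges(data="weight"):
        for node in (u, v):
            k = graph.degree(node)
            if k <= 1:  # a lone edge trivially carries all of the node's strength
                continue
            strength = sum(d["weight"] for _, _, d in graph.edges(node, data=True))
            p = w / strength
            # p-value of this edge's weight share under the null model
            if (1 - p) ** (k - 1) < alpha:
                backbone.add_edge(u, v, weight=w)
                break
    return backbone
```

On a star graph where one edge carries almost all of the hub's strength, only
that dominant edge survives the filter.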

## Documentation

See the [documentation site](https://kenon.readthedocs.io) or the `docs/` directory for the full API reference and examples.

## Made by

Kenon is made by [Crow Intelligence](https://crowintelligence.org/).

## License

MIT
