Metadata-Version: 2.4
Name: chronowords
Version: 0.2.1
Summary: Detect semantic shifts in word embeddings over time
Author-email: Orsolya Putz <orsolya.putz@crowintelligence.org>, Zoltan Varju <zoltan.varju@crowintelligence.org>
License-Expression: MIT
Project-URL: Homepage, https://github.com/crow-intelligence/chronowords
Project-URL: Repository, https://github.com/crow-intelligence/chronowords
Project-URL: Documentation, https://chronowords.readthedocs.io/en/latest/
Project-URL: Organization, https://crowintelligence.org/
Keywords: nlp,embeddings,semantic-change,topic-modeling
Requires-Python: <3.13,>=3.10
Description-Content-Type: text/markdown
Requires-Dist: numpy<3,>=1.26.0
Requires-Dist: scipy<2,>=1.12.0
Requires-Dist: cython<4,>=3.0.11
Requires-Dist: setuptools>=75.8.0
Requires-Dist: mmh3<6,>=5.0.1
Requires-Dist: nltk<4,>=3.9.1
Requires-Dist: scikit-learn<2,>=1.6.1

<p align="center">
  <img src="https://raw.githubusercontent.com/crow-intelligence/chronowords/main/img/chronowords.svg" alt="chronowords" width="450"/>
</p>

<p align="center">
  <a href="https://pypi.org/project/chronowords/"><img src="https://img.shields.io/pypi/v/chronowords.svg" alt="PyPI"></a>
  <a href="https://chronowords.readthedocs.io/en/latest/"><img src="https://img.shields.io/readthedocs/chronowords" alt="Docs"></a>
</p>

# chronowords

Detect semantic shifts over time in word embeddings. Train small PPMI-based language models, create topic models using NMF, and analyze semantic changes using Procrustes alignment.

## Features

- Memory-efficient word embedding training using Count-Min Sketch
- Topic modeling with Non-negative Matrix Factorization
- Temporal alignment of word embeddings using Procrustes analysis
- Cython-optimized PPMI matrix computation
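
The techniques listed above can be illustrated with plain NumPy and SciPy. The following is a minimal sketch, not the chronowords API or its implementation: the helper names `ppmi`, `embed`, and `align` are hypothetical, the co-occurrence counts are random toy data, and dense matrices stand in for the Count-Min Sketch and Cython code the package uses. It computes PPMI from counts, reduces it with a truncated SVD, and rotates one period's embeddings onto another's with orthogonal Procrustes.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes


def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive PMI of a word-by-context co-occurrence count matrix."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts produce -inf or nan
    return np.maximum(pmi, 0.0)


def embed(matrix: np.ndarray, dim: int) -> np.ndarray:
    """Truncated SVD embedding: keep the top `dim` singular directions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


def align(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Rotate `source` onto `target`; rows must refer to the same words."""
    rotation, _ = orthogonal_procrustes(source, target)
    return source @ rotation


# Toy data (assumption): six words, two time periods, word 0 changes usage.
rng = np.random.default_rng(0)
counts_a = rng.integers(1, 20, size=(6, 6)).astype(float)
counts_b = counts_a.copy()
counts_b[0] = rng.integers(1, 20, size=6)

emb_a = embed(ppmi(counts_a), dim=3)
emb_b = embed(ppmi(counts_b), dim=3)
aligned_a = align(emb_a, emb_b)

# Per-word cosine similarity across periods; lower values suggest a shift.
norms = np.linalg.norm(aligned_a, axis=1) * np.linalg.norm(emb_b, axis=1)
cosine = np.sum(aligned_a * emb_b, axis=1) / (norms + 1e-12)
print(np.round(cosine, 3))
```

Alignment is needed because SVD embeddings trained on different corpora are only defined up to a rotation, so vectors from two periods cannot be compared directly.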

## Installation

```bash
pip install chronowords
```

## Quick Start
```python
from chronowords.algebra import SVDAlgebra
from chronowords.topics import TopicModel

# Train word embeddings
# `your_corpus_iterator` is a placeholder for your own corpus iterable
model = SVDAlgebra(n_components=300)
model.train(your_corpus_iterator)

# Find similar words
similar = model.most_similar('computer')
for word in similar:
    print(f"{word.word}: {word.similarity:.3f}")

# Create topic model
# `ppmi_matrix` and `vocabulary` are placeholders that this snippet does not define
topic_model = TopicModel(n_topics=10)
topic_model.fit(ppmi_matrix, vocabulary)
```

## Links
- Documentation: <https://chronowords.readthedocs.io/en/latest/>
- PyPI: <https://pypi.org/project/chronowords/>

## Requirements

- Python ≥ 3.10, < 3.13
- NumPy
- SciPy
- scikit-learn
- Cython
- mmh3
- NLTK
- setuptools

## Contributing
Pull requests welcome. For major changes, open an issue first.

## License
MIT

## Made by
Built and maintained by [Crow Intelligence](https://crowintelligence.org/).
