Metadata-Version: 2.4
Name: toynlp
Version: 0.2.0
Summary: A toy NLP library for educational purposes.
Author: Xiangzhuang Shen
Author-email: Xiangzhuang Shen <datahonor@gmail.com>
License-Expression: Apache-2.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: datasets>=3.2.0
Requires-Dist: evaluate>=0.4.5
Requires-Dist: huggingface-hub>=0.28.1
Requires-Dist: numpy>=2.2.2
Requires-Dist: pyyaml>=6.0
Requires-Dist: safetensors>=0.5.3
Requires-Dist: tokenizers>=0.21.0
Requires-Dist: torch>=2.5.1
Requires-Dist: wandb>=0.19.5
Requires-Dist: pre-commit ; extra == 'dev'
Requires-Dist: ipython ; extra == 'dev'
Requires-Dist: mypy ; extra == 'dev'
Requires-Dist: pytest ; extra == 'dev'
Requires-Dist: pytest-cov ; extra == 'dev'
Requires-Dist: pytest-sugar ; extra == 'dev'
Requires-Dist: hypothesis>=6.112.0 ; extra == 'dev'
Requires-Dist: jupyterlab>=4.3.5 ; extra == 'dev'
Requires-Dist: ipywidgets>=8.1.5 ; extra == 'dev'
Requires-Dist: ruff>=0.12.7 ; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0.12.20250516 ; extra == 'dev'
Requires-Dist: mkdocs ; extra == 'docs'
Requires-Dist: mkdocs-material ; extra == 'docs'
Requires-Dist: mkdocs-material-extensions ; extra == 'docs'
Requires-Dist: mkdocstrings ; extra == 'docs'
Requires-Dist: mkdocs-bibtex ; extra == 'docs'
Requires-Dist: mkdocstrings-python ; extra == 'docs'
Requires-Dist: mkdocs-autorefs ; extra == 'docs'
Requires-Dist: mkdocs-git-committers-plugin-2 ; extra == 'docs'
Requires-Dist: mkdocs-git-revision-date-localized-plugin ; extra == 'docs'
Requires-Python: >=3.12
Project-URL: Bug Tracker, https://shenxiangzhuang.github.io/toynlp/issues
Project-URL: Documentation, https://shenxiangzhuang.github.io/toynlp
Project-URL: Homepage, https://shenxiangzhuang.github.io/toynlp
Project-URL: Release Notes, https://shenxiangzhuang.github.io/toynlp/changelog/
Project-URL: Source Code, https://github.com/shenxiangzhuang/toynlp
Provides-Extra: dev
Provides-Extra: docs
Description-Content-Type: text/markdown

<center>

[![Python](https://img.shields.io/pypi/pyversions/toynlp.svg?color=%2334D058)](https://pypi.org/project/toynlp/)
[![PyPI](https://img.shields.io/pypi/v/toynlp?color=%2334D058&label=pypi%20package)](https://pypi.org/project/toynlp/)
[![PyPI Downloads](https://static.pepy.tech/badge/toynlp)](https://pepy.tech/projects/toynlp)

[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Checked with mypy](https://www.mypy-lang.org/static/mypy_badge.svg)](https://mypy-lang.org/)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)

[![Build Docs](https://github.com/ai-glimpse/toynlp/actions/workflows/build_docs.yaml/badge.svg)](https://github.com/ai-glimpse/toynlp/actions/workflows/build_docs.yaml)
[![Test](https://github.com/ai-glimpse/toynlp/actions/workflows/test.yaml/badge.svg)](https://github.com/ai-glimpse/toynlp/actions/workflows/test.yaml)
[![Codecov](https://codecov.io/gh/ai-glimpse/toynlp/branch/master/graph/badge.svg)](https://codecov.io/gh/ai-glimpse/toynlp)
[![GitHub License](https://img.shields.io/github/license/ai-glimpse/toynlp)](https://github.com/ai-glimpse/toynlp/blob/master/LICENSE)

</center>

# ToyNLP

Clean implementations of classic NLP models.


## Models

Ten important NLP models, ranging from 2003 to 2020 (checked items are already implemented):

- [x] NNLM (2003)
- [x] Word2Vec (2013)
- [x] Seq2Seq (2014)
- [ ] Attention (2015)
- [ ] fastText (2016)
- [ ] Transformer (2017)
- [ ] BERT (2018)
- [ ] GPT (2018)
- [ ] XLNet (2019)
- [ ] T5 (2020)


## FAQ

### Where are GPT-2 and other LLMs?

They are in [toyllm](https://github.com/ai-glimpse/toyllm)!
I split the models into two libraries: `toynlp` for traditional "small" NLP models and `toyllm` for LLMs, which are typically larger and more complex.
