Metadata-Version: 2.4
Name: gismo
Version: 0.5.4
Summary: GISMO is an NLP tool to rank and organize a corpus of documents according to a query.
License-Expression: MIT
License-File: AUTHORS.md
Keywords: NLP,Random Walk,D-Iteration,TF-I[D]TF
Author: Fabien Mathieu
Author-email: fabien.mathieu@normalesup.org
Maintainer: Fabien Mathieu
Maintainer-email: fabien.mathieu@normalesup.org
Requires-Python: >=3.10,<3.15
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Build Tools
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Provides-Extra: spacy
Requires-Dist: dill (>=0.3.9)
Requires-Dist: en-core-web-sm ; (python_version < "3.14") and (extra == "spacy")
Requires-Dist: lxml (>=6.0.2)
Requires-Dist: numba (>=0.63.0)
Requires-Dist: requests (>=2.32.3)
Requires-Dist: scikit-learn (>=1.6.1)
Requires-Dist: scipy (>=1.15.2)
Requires-Dist: spacy (>=3.8.11,<4.0) ; (python_version < "3.14") and (extra == "spacy")
Requires-Dist: zstandard (>=0.23.0)
Project-URL: Documentation, https://balouf.github.io/gismo/
Project-URL: Repository, https://github.com/balouf/gismo/
Description-Content-Type: text/markdown

[![Gismo logo](https://github.com/balouf/gismo/raw/master/docs/logo-line.png)](https://balouf.github.io/gismo/)

# A Generic Information Search... With a Mind of its Own!


[![Pypi badge](https://img.shields.io/pypi/v/gismo.svg)](https://pypi.python.org/pypi/gismo)
[![Build badge](https://github.com/balouf/gismo/actions/workflows/build.yml/badge.svg?branch=master)](https://github.com/balouf/gismo/actions?query=workflow%3Abuild)
[![Documentation badge](https://github.com/balouf/gismo/actions/workflows/docs.yml/badge.svg?branch=master)](https://github.com/balouf/gismo/actions?query=workflow%3Adocs)
[![codecov](https://codecov.io/gh/balouf/gismo/graph/badge.svg?token=TTMRW7XYS5)](https://codecov.io/gh/balouf/gismo)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

GISMO is an NLP tool that ranks and organizes a corpus of documents according to a query.

Gismo stands for Generic Information Search... with a Mind of its Own.

- Free software: MIT License
- GitHub: <https://github.com/balouf/gismo/>
- Documentation: <https://balouf.github.io/gismo/>

## Features

Gismo combines three main ideas:

- **TF-IDTF**: a symmetric version of the TF-IDF embedding.
- [**DIteration**](https://arxiv.org/abs/1202.1163): a fast, push-based variant of the PageRank algorithm.
- **Fuzzy dendrogram**: a variant of the Louvain clustering algorithm.
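
To give an intuition for the push-based idea behind DIteration, here is a minimal pure-Python sketch of a push-style PageRank computation. It is not Gismo's implementation: the function name `push_pagerank`, the damping value, the tolerance, and the toy graph are all assumptions made for illustration, and the sketch assumes every node has at least one successor.

```python
def push_pagerank(out_links, damping=0.85, tol=1e-10):
    """Illustrative push-based PageRank on a graph given as {node: [successors]}.

    Hypothetical helper, not part of Gismo. Assumes no dangling nodes.
    """
    nodes = list(out_links)
    # Residual "fluid" still to be diffused; starts as the teleport vector.
    fluid = {v: (1.0 - damping) / len(nodes) for v in nodes}
    # Fluid already processed; converges to the PageRank scores.
    history = {v: 0.0 for v in nodes}
    while True:
        # Greedy choice: process the node holding the most residual fluid.
        v = max(fluid, key=fluid.get)
        amount = fluid[v]
        if amount < tol:
            break
        history[v] += amount
        fluid[v] = 0.0
        # Push the damped fluid evenly to the successors of v.
        share = damping * amount / len(out_links[v])
        for w in out_links[v]:
            fluid[w] += share
    return history


# Toy graph (an assumption for this example): two pages pointing to a hub.
graph = {"a": ["hub"], "b": ["hub"], "hub": ["a", "b"]}
scores = push_pagerank(graph)
```

Instead of repeatedly multiplying a full vector by the transition matrix, the sketch only touches one node per step and pushes its residual to its successors, so work concentrates where the remaining fluid is largest.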

## Quickstart

Install gismo:

```console
$ pip install gismo
```

Use gismo in a Python project:

```pycon
>>> from gismo.common import toy_source_dict
>>> from gismo import Corpus, Embedding, CountVectorizer, Gismo
>>> corpus = Corpus(toy_source_dict, to_text=lambda x: x['content'])
>>> embedding = Embedding(vectorizer=CountVectorizer(dtype=float))
>>> embedding.fit_transform(corpus)
>>> gismo = Gismo(corpus, embedding)
>>> gismo.rank("Mogwaï")
>>> gismo.get_features_by_rank()
['mogwaï', 'gizmo', 'chinese', 'in', 'demon', 'folklore', 'is']
```

To get the hang of a typical Gismo workflow, check the [Toy Example] notebook. For more advanced uses,
see the other [tutorials] or go directly to the [reference] section.

## Credits

Thanks to Thomas Bonald, Anne Bouillard, Marc-Olivier Buob, and Dohy Hong for their helpful contributions.

This package was created with [Cookiecutter] and the [francois-durand/package_helper] project template.

## Coverage

[![codecov](https://codecov.io/gh/balouf/gismo/graphs/tree.svg?token=TTMRW7XYS5)](https://codecov.io/gh/balouf/gismo)

[cookiecutter]: https://github.com/audreyr/cookiecutter
[francois-durand/package_helper]: https://github.com/francois-durand/package_helper
[reference]: https://balouf.github.io/gismo/reference.html
[toy example]: https://balouf.github.io/gismo/tutorials/tutorial_toy_example.html
[tutorials]: https://balouf.github.io/gismo/tutorials/index.html

