Metadata-Version: 2.2
Name: smoothtext
Version: 0.0.17
Summary: SmoothText is a Python library for calculating readability scores of texts and statistical information for texts in multiple languages.
Author-email: Tuğrul Güngör <contact@tugrulgungor.me>
Project-URL: Documentation, https://smoothtext.github.io
Project-URL: Homepage, https://github.com/smoothtext/smoothtext
Project-URL: Issues, https://github.com/smoothtext/smoothtext/issues
Project-URL: Source, https://github.com/smoothtext/smoothtext
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Natural Language :: Turkish
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing
Requires-Python: <3.13,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: nltk
Requires-Dist: nltk>=3.9.1; extra == "nltk"
Provides-Extra: stanza
Requires-Dist: stanza>=1.10.1; extra == "stanza"

# SmoothText

---

[![license](https://img.shields.io/github/license/smoothtext/smoothtext.svg)](https://github.com/smoothtext/smoothtext/blob/main/LICENSE)
[![versions](https://img.shields.io/pypi/pyversions/smoothtext.svg)](https://github.com/smoothtext/smoothtext)
[![pypi](https://img.shields.io/pypi/v/smoothtext.svg)](https://pypi.org/project/smoothtext/)
[![downloads](https://static.pepy.tech/personalized-badge/smoothtext?period=total&units=international_system&left_color=grey&right_color=orange&left_text=pip%20downloads)](https://pypi.org/project/smoothtext/)

---

> SmoothText is still in alpha and there may be breaking changes.

---

## Introduction

SmoothText is a Python library for calculating readability scores and text statistics in multiple languages.

The design principle of this library is to ensure high accuracy.

## Requirements

Python 3.10, 3.11, or 3.12 (the package metadata declares `>=3.10,<3.13`).

### External Dependencies

|                     Library                      |  Version   |           License            | Notes                   |
|:------------------------------------------------:|:----------:|:----------------------------:|-------------------------|
|          [NLTK](https://www.nltk.org/)           | `>=3.9.1`  |         `Apache 2.0`         | Conditionally optional. |
| [Stanza](https://stanfordnlp.github.io/stanza/)  | `>=1.10.1` |         `Apache 2.0`         | Conditionally optional. |
| [Unidecode](https://pypi.org/project/Unidecode/) | `>=1.3.8`  |         `GNU GPLv2`          | Required.               |
|    [Pyphen](https://github.com/Kozea/Pyphen)     | `>=0.17.0` | `GPL 2.0+/LGPL 2.1+/MPL 1.1` | Required.               |

At least one of NLTK or Stanza must be installed to serve as the backend for SmoothText.

## Features

### Readability Analysis

SmoothText can calculate readability scores of text in the following languages, using the following formulas.

| Formula/Language                                                                                                                                                                                                                             | English |                                                                                                                                Turkish                                                                                                                                |
|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| [Flesch Reading Ease](https://scholar.google.com/scholar?as_sdt=0%2C5&q=A+New+Readability+Yardstick+R+Flesch&btnG=)                                                                                                                          |    ✔    |                                                          ✔ [Ateşman](https://scholar.google.com/scholar?as_sdt=0%2C5&q=T%C3%BCrk%C3%A7ede+Okunabilirli%C4%9Fin+%C3%96l%C3%A7%C3%BClmesi+Ate%C5%9Fman&btnG=)                                                           |
| [Flesch-Kincaid Grade](https://scholar.google.com/scholar?as_sdt=0%2C5&q=Derivation+of+new+readability+formulas+%28automated+readability+index%2C+fog+count+and+flesch+reading+ease+formula%29+for+navy+enlisted+personnel&btnG=)            |    ✔    | ✔ [Bezirci-Yılmaz](https://scholar.google.com/scholar?as_sdt=0%2C5&q=Metinlerin+okunabilirli%C4%9Finin+%C3%B6l%C3%A7%C3%BClmesi+%C3%BCzerine+bir+yazilim+k%C3%BCt%C3%BCphanesi+ve+T%C3%BCrk%C3%A7e+i%C3%A7in+yeni+bir+okunabilirlik+%C3%B6l%C3%A7%C3%BCt%C3%BC&btnG=) |
| [Flesch-Kincaid Grade Simplified](https://scholar.google.com/scholar?as_sdt=0%2C5&q=Derivation+of+new+readability+formulas+%28automated+readability+index%2C+fog+count+and+flesch+reading+ease+formula%29+for+navy+enlisted+personnel&btnG=) |    ✔    |                                                                                                                                   ❌                                                                                                                                   |

Notes:

- **Ateşman** is the Turkish adaptation of **Flesch Reading Ease**. The two can be used interchangeably in the module.
- **Bezirci-Yılmaz** is the Turkish adaptation of **Flesch-Kincaid Grade**. The two can be used interchangeably in the
  module.
- **Flesch-Kincaid Grade Simplified** is essentially the same formula as **Flesch-Kincaid Grade**, except that its
  constants are different.
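
To illustrate how closely the Turkish adaptation follows the original, here is a minimal sketch of the Ateşman score, using the constants commonly cited from Ateşman's paper. The function name and the sample counts are hypothetical, and this is not SmoothText's own implementation.

```python
def atesman_score(words: int, sentences: int, syllables: int) -> float:
    """Ateşman readability score from raw counts (constants as commonly cited)."""
    syllables_per_word = syllables / words
    words_per_sentence = words / sentences
    return 198.825 - 40.175 * syllables_per_word - 2.610 * words_per_sentence


# Hypothetical counts for a Turkish passage: 100 words, 10 sentences, 250 syllables.
print(round(atesman_score(100, 10, 250), 1))  # 72.3
```

As with Flesch Reading Ease, a higher score indicates easier text; the inputs are the same and only the constants differ.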

### Sentencizing, Tokenizing, and Syllabifying

SmoothText can extract sentences, words, or syllables from texts.

### Reading Time

SmoothText can calculate how long a text would take to read.
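
A reading-time estimate typically divides the word count by an assumed reading speed. The sketch below shows that idea in plain Python. The function name, the whitespace-based word count, and the default of 200 words per minute are all assumptions made for illustration; they do not describe SmoothText's API or its defaults.

```python
def estimate_reading_seconds(text: str, words_per_minute: float = 200.0) -> float:
    """Estimate reading time in seconds from a naive whitespace word count."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    word_count = len(text.split())  # naive tokenization, not the library's tokenizer
    return word_count * 60.0 / words_per_minute


sample = "Forrest Gump is a 1994 American comedy-drama film directed by Robert Zemeckis."
print(estimate_reading_seconds(sample, words_per_minute=240))  # 3.0
```

A real tokenizer, such as the NLTK or Stanza backend, would count words more accurately than `str.split`, especially around punctuation.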

## Installation

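You can install SmoothText via `pip`. Because a backend is required, you can also pull one in through the `nltk` or
`stanza` extra declared in the package metadata, for example `pip install "smoothtext[nltk]"`.

```shell
pip install smoothtext
```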

## Usage

### Importing and Initializing the Library

SmoothText exposes three top-level names: `Language`, `ReadabilityFormula`, and `SmoothText`.

```Python
from smoothtext import Language, ReadabilityFormula, SmoothText
```

Before use, the library must be initialized by calling the static `SmoothText.setup` function. The following sets
[NLTK](https://www.nltk.org/) as the backend and automatically downloads the resources for the supported languages.
Alternatively, you can use [Stanza](https://stanfordnlp.github.io/stanza/).

```Python
SmoothText.setup(backend='nltk')
```

### Instancing

SmoothText is used through instances of the `SmoothText` class.

```Python
st = SmoothText('en')
```

Now, an instance is accessible via `st`, and it is ready to work with English texts.

### Calculating Readability Scores

We will analyze the following sentence, taken from the Wikipedia article on
[Forrest Gump](https://en.wikipedia.org/wiki/Forrest_Gump).

```Python
text = "Forrest Gump is a 1994 American comedy-drama film directed by Robert Zemeckis."
```

For English, this example uses two of the formulas listed above: `Flesch Reading Ease` and `Flesch-Kincaid Grade`. We
can either call the `compute_readability` function or use the instance as a callable. Either way, we must pass the
formula.

```python
score_1 = st.compute_readability(text, ReadabilityFormula.Flesch_Reading_Ease)
score_2 = st(text, ReadabilityFormula.Flesch_Kincaid_Grade)

print(score_1, score_2)
# Output is: 25.455000000000013 12.690000000000001
```
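
The two scores can be cross-checked by hand against the published Flesch Reading Ease and Flesch-Kincaid Grade formulas. The sketch below assumes the sentence is counted as 1 sentence, 12 words, and 24 syllables. These counts are inferred for illustration rather than taken from SmoothText's output, and the helper functions are not part of the library.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts (published constants)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade from raw counts (published constants)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Assumed counts for the example sentence (not reported by the library).
words, sentences, syllables = 12, 1, 24

print(round(flesch_reading_ease(words, sentences, syllables), 3))   # 25.455
print(round(flesch_kincaid_grade(words, sentences, syllables), 2))  # 12.69
```

Under these assumed counts, the hand-computed values agree with the output shown above up to floating-point rounding.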

## Documentation

See [here](https://smoothtext.github.io/) for API documentation.

## Roadmap

SmoothText is still in its early stages. The immediate tasks include adding more languages and backends.

## License

SmoothText is released under the MIT license. See [LICENSE](./LICENSE).
