Metadata-Version: 2.4
Name: pystylometry
Version: 0.1.0
Summary: Comprehensive Python package for stylometric analysis
License: MIT
Keywords: stylometry,nlp,text-analysis,authorship,readability,lexical-diversity,readability-metrics
Author: Craig Trim
Author-email: craigtrim@gmail.com
Requires-Python: >=3.11,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Typing :: Typed
Requires-Dist: stylometry-ttr (>=1.0.3,<2.0.0)
Project-URL: Homepage, https://github.com/craigtrim/pystylometry
Project-URL: Issues, https://github.com/craigtrim/pystylometry/issues
Project-URL: Repository, https://github.com/craigtrim/pystylometry
Description-Content-Type: text/markdown

# pystylometry

[![Python Version](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)
[![PyPI version](https://badge.fury.io/py/pystylometry.svg)](https://badge.fury.io/py/pystylometry)

A Python package for stylometric analysis, built on a modular architecture in which each analysis domain is an optional install.

## Features

**pystylometry** provides 50+ metrics across five analysis domains:

- **Lexical Diversity**: TTR, MTLD, Yule's K, Hapax ratios, and more
- **Readability**: Flesch, SMOG, Gunning Fog, Coleman-Liau, ARI
- **Syntactic Analysis**: POS ratios, sentence statistics (requires spaCy)
- **Authorship Attribution**: Burrows' Delta, Cosine Delta, Zeta scores
- **N-gram Analysis**: Character and word bigram entropy, perplexity

## Installation

Install only what you need:

```bash
# Core package (lexical metrics only)
pip install pystylometry

# With readability metrics (quotes keep shells such as zsh from expanding the brackets)
pip install "pystylometry[readability]"

# With syntactic metrics (requires spaCy)
pip install "pystylometry[syntactic]"

# With authorship metrics
pip install "pystylometry[authorship]"

# With n-gram analysis
pip install "pystylometry[ngrams]"

# Everything
pip install "pystylometry[all]"
```

## Quick Start

### Using Individual Modules

```python
from pystylometry.lexical import compute_mtld, compute_yule
from pystylometry.readability import compute_flesch

text = "Your text here..."

# Lexical diversity
mtld = compute_mtld(text)
print(f"MTLD: {mtld.mtld_average:.2f}")

yule = compute_yule(text)
print(f"Yule's K: {yule.yule_k:.2f}")

# Readability
flesch = compute_flesch(text)
print(f"Reading Ease: {flesch.reading_ease:.1f}")
print(f"Grade Level: {flesch.grade_level:.1f}")
```

### Using the Unified API

```python
from pystylometry import analyze

text = "Your text here..."

# Analyze with multiple metrics at once
results = analyze(text, lexical=True, readability=True)

# Access results
print(f"MTLD: {results.lexical['mtld'].mtld_average:.2f}")
print(f"Flesch: {results.readability['flesch'].reading_ease:.1f}")
```

### Checking Available Modules

```python
from pystylometry import get_available_modules

available = get_available_modules()
print(available)
# {'lexical': True, 'readability': True, 'syntactic': False, ...}
```
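
Availability checks of this kind are commonly built on `importlib.util.find_spec`. The following is a minimal sketch of that pattern, not the package's actual implementation; `OPTIONAL_REQUIREMENTS` and `detect_available` are hypothetical names, and the package mapping is taken from the Dependencies section of this README.

```python
from importlib.util import find_spec

# Hypothetical mapping from module group to the third-party packages it needs.
# An empty tuple means the group needs nothing beyond the core install.
OPTIONAL_REQUIREMENTS = {
    "lexical": (),
    "readability": ("pronouncing",),
    "syntactic": ("spacy",),
}


def detect_available(requirements):
    """Return {group: bool}, True when every required package can be found."""
    return {
        group: all(find_spec(pkg) is not None for pkg in packages)
        for group, packages in requirements.items()
    }


# The result depends on what is installed in the current environment.
print(detect_available(OPTIONAL_REQUIREMENTS))
```

`find_spec` only locates a package without importing it, so the check stays cheap even for heavy dependencies such as spaCy.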

## API Design

### Clean, Consistent Interface

Every metric function:
- Takes text as input
- Returns a rich result object (never just a float)
- Includes metadata about the computation
- Has comprehensive docstrings with formulas and references

```python
from pystylometry.lexical import compute_yule

text = "Your text here..."
result = compute_yule(text)
# Returns: YuleResult(yule_k=..., yule_i=..., metadata={...})
```
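
To illustrate the "rich result object" idea, here is a self-contained sketch that computes Yule's K and returns it together with metadata. `YuleSketch` and `yule_k_sketch` are hypothetical names, and the regex tokenizer is an assumption; the library's own `YuleResult` and tokenization may differ.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class YuleSketch:
    """Hypothetical result type for illustration; not the library's YuleResult."""
    yule_k: float
    metadata: dict = field(default_factory=dict)


def yule_k_sketch(text: str) -> YuleSketch:
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    if n == 0:
        return YuleSketch(float("nan"), {"token_count": 0, "type_count": 0})
    freqs = Counter(tokens)
    # The sum of squared type frequencies equals sum over m of m^2 * V_m,
    # where V_m is the number of types that occur exactly m times.
    s2 = sum(f * f for f in freqs.values())
    k = 10_000 * (s2 - n) / (n * n)
    return YuleSketch(k, {"token_count": n, "type_count": len(freqs)})


result = yule_k_sketch("the cat and the dog and the bird")
print(result.yule_k, result.metadata)
```

Returning the token and type counts alongside the score lets callers judge whether the text was long enough for the value to be meaningful.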

## Available Metrics

### Lexical Diversity
- **TTR** - Type-Token Ratio (via stylometry-ttr)
- **MTLD** - Measure of Textual Lexical Diversity
- **Yule's K** - Vocabulary repetitiveness
- **Hapax / Dis Legomena** - Words appearing exactly once / exactly twice
- **Sichel's S** - Hapax-based richness
- **Honoré's R** - Vocabulary richness constant
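
As a rough illustration of the frequency-based measures above, the sketch below derives TTR, hapax and dis legomena counts, and Sichel's S (dis legomena divided by the number of types) from one frequency table. The function name `hapax_sketch` and the regex tokenizer are assumptions made for this example and are not part of the pystylometry API.

```python
import re
from collections import Counter


def hapax_sketch(text: str) -> dict:
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    freqs = Counter(tokens)
    types = len(freqs)
    hapax = sum(1 for f in freqs.values() if f == 1)  # types seen exactly once
    dis = sum(1 for f in freqs.values() if f == 2)    # types seen exactly twice
    return {
        "tokens": len(tokens),
        "types": types,
        "ttr": types / len(tokens) if tokens else 0.0,
        "hapax_legomena": hapax,
        "dis_legomena": dis,
        "sichel_s": dis / types if types else 0.0,
    }


print(hapax_sketch("the cat and the dog and a bird"))
```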

### Readability
- **Flesch Reading Ease** - Roughly 0-100 scale; higher scores mean easier text
- **Flesch-Kincaid Grade** - US grade level
- **SMOG Index** - Years of education needed
- **Gunning Fog** - Grade level from sentence length and share of complex (3+ syllable) words
- **Coleman-Liau** - Character-based grade level
- **ARI** - Automated Readability Index
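
For reference, the two Flesch formulas can be sketched in a few lines. The standard coefficients are used; the sentence splitter and the vowel-run syllable counter are crude stand-ins chosen for this example (the `readability` extra lists `pronouncing` for syllable counting), and `flesch_sketch` is a hypothetical name.

```python
import re


def count_syllables(word: str) -> int:
    # Crude heuristic (an assumption): one syllable per run of vowels, minimum one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_sketch(text: str) -> tuple[float, float]:
    """Return (reading_ease, grade_level) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


ease, grade = flesch_sketch("The cat sat. The dog ran.")
print(f"Reading Ease: {ease:.2f}, Grade Level: {grade:.2f}")
```

Very short, simple texts can score above 100 on Reading Ease and below zero on grade level, which is why the 0-100 range is only approximate.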

### Syntactic (requires spaCy)
- **POS Ratios** - Noun/verb/adjective/adverb ratios
- **Lexical Density** - Content vs function words
- **Sentence Statistics** - Length, variation, complexity

### Authorship
- **Burrows' Delta** - Author distance measure
- **Cosine Delta** - Angular distance
- **Zeta Scores** - Distinctive word usage
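
Burrows' Delta compares two texts by z-scoring the relative frequencies of the corpus's most frequent words and averaging the absolute z-score differences. The sketch below shows that idea in pure Python; `burrows_delta_sketch`, its parameters, the tokenizer, and the choice of population standard deviation are all assumptions for illustration, not the library's API.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def burrows_delta_sketch(corpus: dict[str, str], a: str, b: str, top_n: int = 50) -> float:
    """Mean absolute z-score difference between documents a and b.

    Assumes every document in the corpus contains at least one token.
    """
    counts = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    totals = {name: sum(c.values()) for name, c in counts.items()}
    # Feature set: the most frequent words over the whole corpus.
    overall = Counter()
    for c in counts.values():
        overall.update(c)
    vocab = [word for word, _ in overall.most_common(top_n)]

    diffs = []
    for word in vocab:
        rel = {name: counts[name][word] / totals[name] for name in corpus}
        values = list(rel.values())
        sigma = pstdev(values)
        if sigma == 0:
            continue  # identical frequency everywhere; skip to avoid dividing by zero
        mu = mean(values)
        diffs.append(abs((rel[a] - mu) / sigma - (rel[b] - mu) / sigma))
    return mean(diffs) if diffs else 0.0


corpus = {
    "x": "the cat the dog",
    "y": "a cat a dog a bird",
    "z": "the the the bird",
}
print(burrows_delta_sketch(corpus, "x", "y"))
```

Smaller values indicate more similar word-frequency profiles; a document compared with itself scores 0.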

### N-grams
- **Character Bigram Entropy** - Character predictability
- **Word Bigram Entropy** - Word sequence predictability
- **Perplexity** - Language model fit
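
Bigram entropy here means the Shannon entropy of the observed bigram distribution. A minimal sketch for the character case follows; `char_bigram_entropy` is a hypothetical name, and the example assumes no normalization or smoothing, which the library may handle differently.

```python
from collections import Counter
from math import log2


def char_bigram_entropy(text: str) -> float:
    """Shannon entropy, in bits, of the character bigram distribution of text."""
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    if not bigrams:
        return 0.0  # fewer than two characters: no bigrams to measure
    counts = Counter(bigrams)
    total = len(bigrams)
    return -sum((c / total) * log2(c / total) for c in counts.values())


print(char_bigram_entropy("the quick brown fox"))
```

Lower entropy means the next bigram is more predictable; a string that repeats one bigram has entropy 0. The same function works for word bigrams if it is given pairs of tokens instead of character slices.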

## Dependencies

**Core (always installed):**
- stylometry-ttr

**Optional:**
- `readability`: pronouncing (for syllable counting)
- `syntactic`: spacy>=3.8.0
- `authorship`: None (pure Python + stdlib)
- `ngrams`: None (pure Python + stdlib)

## Development

```bash
# Clone the repository
git clone https://github.com/craigtrim/pystylometry
cd pystylometry

# Install with dev dependencies
pip install -e ".[dev,all]"

# Run tests
make test

# Run linters
make lint

# Format code
make format
```

## Project Status

🚧 **Phase 1 - Core Lexical Metrics** (In Progress)
- [x] Project structure
- [ ] MTLD implementation
- [ ] Yule's K implementation
- [ ] Hapax ratios implementation
- [ ] Tests
- [ ] v0.1.0 release

See [pystylometry-plan.md](.claude/context/pystylometry-plan.md) for the full roadmap.

## Why pystylometry?

- **Modular**: Install only what you need
- **Consistent**: Uniform API across all metrics
- **Rich Results**: Dataclass objects with metadata, not just numbers
- **Well-Documented**: Formulas, references, and interpretations
- **Type-Safe**: Full type hints for IDE support
- **Tested**: Comprehensive test suite

## References

See [stylometry-metrics.md](.claude/context/stylometry-metrics.md) for the complete metrics reference table with formulas.

## License

MIT License - see LICENSE file for details.

## Author

Craig Trim (craigtrim@gmail.com)

## Contributing

Contributions welcome! Please open an issue or PR on GitHub.

