Metadata-Version: 2.4
Name: mowen
Version: 2.1.0
Summary: Authorship attribution toolkit — stylometric analysis with a configurable pipeline
Project-URL: Homepage, https://github.com/jnoecker/mowen
Project-URL: Repository, https://github.com/jnoecker/mowen
Project-URL: Issues, https://github.com/jnoecker/mowen/issues
Author: John Noecker Jr
License-Expression: MIT
Keywords: authorship-attribution,forensic-linguistics,machine-learning,nlp,stylometry,text-analysis
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Typing :: Typed
Requires-Python: >=3.12
Provides-Extra: all
Requires-Dist: beautifulsoup4>=4.12; extra == 'all'
Requires-Dist: jieba>=0.42; extra == 'all'
Requires-Dist: nltk>=3.8; extra == 'all'
Requires-Dist: pdfplumber>=0.10; extra == 'all'
Requires-Dist: python-docx>=1.1; extra == 'all'
Requires-Dist: spacy>=3.7; extra == 'all'
Requires-Dist: torch>=2.1; extra == 'all'
Requires-Dist: transformers>=4.36; extra == 'all'
Provides-Extra: chinese
Requires-Dist: jieba>=0.42; extra == 'chinese'
Provides-Extra: dev
Requires-Dist: httpx>=0.27; extra == 'dev'
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pytest-cov>=4.1; extra == 'dev'
Requires-Dist: pytest>=7.4; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: docx
Requires-Dist: python-docx>=1.1; extra == 'docx'
Provides-Extra: html
Requires-Dist: beautifulsoup4>=4.12; extra == 'html'
Provides-Extra: nlp
Requires-Dist: spacy>=3.7; extra == 'nlp'
Provides-Extra: pdf
Requires-Dist: pdfplumber>=0.10; extra == 'pdf'
Provides-Extra: transformers
Requires-Dist: torch>=2.1; extra == 'transformers'
Requires-Dist: transformers>=4.36; extra == 'transformers'
Provides-Extra: wordnet
Requires-Dist: nltk>=3.8; extra == 'wordnet'
Description-Content-Type: text/markdown

# mowen

Core Python library for authorship attribution.

This package provides the pipeline engine, all built-in components (canonicizers,
event drivers, event cullers, distance functions, analysis methods), evaluation
utilities (cross-validation, metrics), and the tokenizer framework.
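
To make the component roles concrete, here is a minimal, self-contained sketch of how the five stages could fit together. It uses plain Python only and is not mowen's API: every function name below (`canonicize`, `char_ngrams`, `most_common_events`, `cosine_distance`, `attribute`) is hypothetical, and the specific choices (character trigrams, top-k culling, cosine distance, nearest neighbour) are assumptions picked for illustration.

```python
from collections import Counter
import math
import re


def canonicize(text: str) -> str:
    """Canonicizer: lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Event driver: turn text into a sequence of character n-gram events."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def most_common_events(histograms: list[Counter], k: int) -> set[str]:
    """Event culler: keep the k events most frequent across the known documents."""
    total: Counter = Counter()
    for hist in histograms:
        total.update(hist)
    return {event for event, _ in total.most_common(k)}


def cosine_distance(a: Counter, b: Counter) -> float:
    """Distance function: 1 minus the cosine similarity of two event histograms."""
    dot = sum(count * b[event] for event, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty histogram as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def attribute(known: list[tuple[str, str]], unknown: str, k: int = 50) -> str:
    """Analysis method: nearest neighbour over (author, text) pairs."""
    hists = [(author, Counter(char_ngrams(canonicize(text)))) for author, text in known]
    keep = most_common_events([hist for _, hist in hists], k)

    def cull(hist: Counter) -> Counter:
        return Counter({e: c for e, c in hist.items() if e in keep})

    target = cull(Counter(char_ngrams(canonicize(unknown))))
    return min(hists, key=lambda pair: cosine_distance(cull(pair[1]), target))[0]
```

Calling `attribute([("alice", text_a), ("bob", text_b)], mystery_text)` returns the author whose culled histogram is closest to the unknown text. The library's own components are configurable, so treat this only as a mental model of the data flow.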

## Install

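```bash
# Editable installs; run from a source checkout of this package
pip install -e .                    # core only (no optional deps)
pip install -e '.[nlp]'             # + spaCy POS/NER
pip install -e '.[pdf,docx,html]'   # + pdfplumber, python-docx, beautifulsoup4
pip install -e '.[all]'             # all optional runtime deps (excludes the dev extra)
```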

## Usage

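```python
from mowen import Pipeline, PipelineConfig, Document, leave_one_out

# Assumed to be defined beforehand: `config` (a PipelineConfig) and
# `known_docs`, `unknown_docs`, `docs` (collections of Document objects).

# Attribution: compare unknown documents against documents of known authorship
results = Pipeline(config).execute(known_docs, unknown_docs)

# Cross-validation: leave-one-out evaluation over a labeled corpus
eval_result = leave_one_out(docs, config)
print(eval_result.accuracy, eval_result.macro_f1)
```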
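
For readers unfamiliar with the two reported metrics, the sketch below shows one common way leave-one-out accuracy and macro-averaged F1 are computed. It is plain Python rather than mowen's implementation: `leave_one_out_predictions`, `accuracy`, `macro_f1`, and the `classify` callback are hypothetical names, and averaging F1 over the true author labels only is an assumption (conventions differ between libraries).

```python
from typing import Callable, Sequence

Sample = tuple[str, str]  # (author, text)
Pair = tuple[str, str]    # (true author, predicted author)


def leave_one_out_predictions(
    samples: Sequence[Sample],
    classify: Callable[[Sequence[Sample], str], str],
) -> list[Pair]:
    """Hold out each sample in turn, classify it using all the others."""
    pairs = []
    for i, (author, text) in enumerate(samples):
        training = [s for j, s in enumerate(samples) if j != i]
        pairs.append((author, classify(training, text)))
    return pairs


def accuracy(pairs: Sequence[Pair]) -> float:
    """Fraction of held-out samples attributed to the correct author."""
    return sum(t == p for t, p in pairs) / len(pairs)


def macro_f1(pairs: Sequence[Pair]) -> float:
    """Unweighted mean of per-author F1 scores (over true labels)."""
    labels = sorted({t for t, _ in pairs})
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging weights every author equally, so it penalizes a method that does well only on authors with many documents, which plain accuracy can hide.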

See the [root README](https://github.com/jnoecker/mowen) in the project repository for full documentation.
