Metadata-Version: 2.4
Name: pysofra
Version: 0.1.0a1
Summary: Statistical reporting and table preparation framework for Python — the missing reporting layer.
Project-URL: Homepage, https://github.com/jturner-uofl/pysofra
Project-URL: Documentation, https://github.com/jturner-uofl/pysofra
Project-URL: Repository, https://github.com/jturner-uofl/pysofra
Project-URL: Issues, https://github.com/jturner-uofl/pysofra/issues
Author-email: Jason Turner <jason.s.turner@gmail.com>
License: GPL-3.0-or-later
License-File: LICENSE
License-File: NOTICE
Keywords: biostatistics,clinical-trials,epidemiology,flextable,gtsummary,publication,reporting,statistics,tableone,tables
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Requires-Python: >=3.11
Requires-Dist: numpy>=1.24
Requires-Dist: pandas<3,>=2.0
Requires-Dist: python-docx>=1.1
Requires-Dist: scipy>=1.11
Requires-Dist: statsmodels>=0.14
Provides-Extra: all
Requires-Dist: lifelines<0.31,>=0.27; extra == 'all'
Requires-Dist: matplotlib>=3.8; extra == 'all'
Requires-Dist: polars>=0.20; extra == 'all'
Requires-Dist: pyarrow>=15; extra == 'all'
Requires-Dist: python-pptx>=1.0; extra == 'all'
Requires-Dist: scikit-learn>=1.3; extra == 'all'
Requires-Dist: xlsxwriter>=3.2; extra == 'all'
Provides-Extra: dev
Requires-Dist: hypothesis>=6.100; extra == 'dev'
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: openpyxl>=3.1; extra == 'dev'
Requires-Dist: pandas-stubs>=2.2; extra == 'dev'
Requires-Dist: pytest-cov>=4.1; extra == 'dev'
Requires-Dist: pytest>=7.4; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocs>=1.5; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24; extra == 'docs'
Provides-Extra: plot
Requires-Dist: matplotlib>=3.8; extra == 'plot'
Provides-Extra: polars
Requires-Dist: polars>=0.20; extra == 'polars'
Requires-Dist: pyarrow>=15; extra == 'polars'
Provides-Extra: pptx
Requires-Dist: python-pptx>=1.0; extra == 'pptx'
Provides-Extra: sklearn
Requires-Dist: scikit-learn>=1.3; extra == 'sklearn'
Provides-Extra: survival
Requires-Dist: lifelines<0.31,>=0.27; extra == 'survival'
Requires-Dist: matplotlib>=3.8; extra == 'survival'
Provides-Extra: xlsx
Requires-Dist: xlsxwriter>=3.2; extra == 'xlsx'
Description-Content-Type: text/markdown

<div align="center">

# PySofra

### The missing statistical reporting layer for Python

[![Coverage](https://img.shields.io/badge/coverage-100%25-brightgreen.svg)](https://github.com/jturner-uofl/pysofra)
[![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12%20%7C%203.13-blue.svg)](https://www.python.org/downloads/)
[![License: GPL-3.0+](https://img.shields.io/badge/license-GPL--3.0--or--later-blue.svg)](LICENSE)
[![Style: ruff](https://img.shields.io/badge/style-ruff-purple.svg)](https://github.com/astral-sh/ruff)
[![Types: mypy strict](https://img.shields.io/badge/types-mypy%20strict-blue.svg)](https://mypy-lang.org/)
[![Tests: 886](https://img.shields.io/badge/tests-886%20passing-brightgreen.svg)](#status)

</div>

> PySofra turns datasets, fitted models, and summary statistics into
> **publication-ready tables** — across HTML · Markdown · LaTeX · DOCX ·
> PPTX · XLSX · PNG — from a single immutable `SofraTable` object. It
> brings the practical workflows of R's `tableone`, `gtsummary`, and
> `flextable` into a single coherent Pythonic API.

<div align="center">
  <img src="https://raw.githubusercontent.com/jturner-uofl/pysofra/main/assets/readme/table_one.png" alt="Baseline characteristics table by treatment arm — JAMA theme, with p-values, standardized mean differences, and Overall column" width="820">
  <br>
  <sub><em>Baseline characteristics, by treatment arm. <strong>One line of code.</strong></em></sub>
</div>

<table>
<tr>
<td width="50%" valign="top" align="center">
  <img src="https://raw.githubusercontent.com/jturner-uofl/pysofra/main/assets/readme/regression_forest.png" alt="Adjusted odds ratios with inline forest plot" width="100%">
  <br>
  <sub><em>Adjusted ORs + inline forest plot</em><br><code>tbl_regression(fit).with_forest_plot()</code></sub>
</td>
<td width="50%" valign="top" align="center">
  <img src="https://raw.githubusercontent.com/jturner-uofl/pysofra/main/assets/readme/survival_km.png" alt="Kaplan-Meier survival table with embedded KM curve" width="100%">
  <br>
  <sub><em>KM table + inline survival curve</em><br><code>tbl_survival(...).with_km_plot()</code></sub>
</td>
</tr>
</table>

### Why PySofra

- **One immutable object, seven output formats** — build a `SofraTable` once, render to HTML / Markdown / LaTeX / DOCX / PPTX / XLSX / PNG, all byte-deterministic across processes
- **Auto-dispatched statistical tests** — Welch, Wilcoxon, ANOVA, Kruskal–Wallis, Fisher, χ², Rao–Scott, design-adjusted *t* — picked by variable kind, overridable per-row
- **Inline forest plots and KM curves** — embed matplotlib figures directly into the table; the same `SofraTable` renders them across every backend
- **Statistically correct** — every numeric output validated against `scipy` / `statsmodels` / `lifelines` at machine precision (and cross-checked against R's `gtsummary` for the JSS paper)
- **Method-chainable and immutable** — every modifier returns a new table; no in-place mutation, no global state, fully reproducible

<div align="center">

**[Showcase notebook](examples/pysofra_showcase.ipynb)** · [rendered HTML](examples/pysofra_showcase.html) — *47 cells, every section a side-by-side numeric proof. Start here if you have 60 seconds.*

**[End-to-end tutorial](examples/pysofra_tutorial.ipynb)** · [rendered HTML](examples/pysofra_tutorial.html) — *126 cells walking every public feature on a synthetic two-arm trial.*

</div>

---

## Quick start

```python
import numpy as np
import pandas as pd
import pysofra as ps

# Toy two-arm trial; replace with your own DataFrame in real use.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "arm":   rng.choice(["Placebo", "Treatment"], 200),
    "age":   rng.normal(60, 10, 200).round(),
    "bmi":   rng.normal(28, 5, 200).round(1),
    "event": rng.binomial(1, 0.3, 200),
})

# Table 1 — baseline characteristics by treatment arm
tbl = (
    ps.tbl_one(df, by="arm")
      .add_p()
      .add_smd()
      .add_overall()
      .theme("clinical")
)

tbl                          # renders in Jupyter / Colab / VS Code
tbl.to_docx("table1.docx")   # publication-quality Word
tbl.to_html()                # standalone HTML fragment
tbl.to_markdown()            # GitHub-flavored Markdown
```

The same workflow handles regression tables:

```python
import statsmodels.api as sm

X = sm.add_constant(df[["age", "bmi"]])
fit = sm.Logit(df["event"], X).fit(disp=False)

(
    ps.tbl_regression(fit, exponentiate=True)
      .bold_p()
      .theme("jama")
      .to_docx("table2.docx")
)
```
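
For readers unfamiliar with exponentiated coefficients: a logistic regression reports log-odds, and exponentiating a coefficient and its interval bounds gives an odds ratio with its confidence interval. The sketch below assumes a Wald (normal-approximation) interval; the helper name `odds_ratio_with_ci` and the numbers are illustrative, not PySofra internals or output from the model above.

```python
import math
from statistics import NormalDist


def odds_ratio_with_ci(beta: float, se: float,
                       level: float = 0.95) -> tuple[float, float, float]:
    """Exponentiate a log-odds coefficient and its Wald interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Illustrative coefficient and standard error, chosen by hand.
or_, lower, upper = odds_ratio_with_ci(beta=0.50, se=0.20)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```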

The full worked example from the JSS paper — baseline table by
treatment arm, regression table with forest plot, and Kaplan-Meier
survival summary — is in
[`paper/replication/example_trial.py`](paper/replication/example_trial.py).

---

## What's in the box

| Feature              | Function / object         | Status |
|----------------------|---------------------------|--------|
| Baseline Table 1     | `ps.tbl_one`              | MVP    |
| Descriptive summary  | `ps.tbl_summary`          | MVP    |
| Regression results   | `ps.tbl_regression`       | MVP    |
| Side-by-side merge   | `ps.tbl_merge`            | MVP    |
| Vertical stack       | `ps.tbl_stack`            | MVP    |
| HTML / Markdown      | `.to_html` / `.to_markdown` | MVP  |
| DOCX export          | `.to_docx`                | MVP    |
| LaTeX export         | `.to_latex`               | MVP    |
| PPTX export          | `.to_pptx`                | MVP (extras) |
| Excel export         | `.to_xlsx`                | MVP (extras) |
| Inline forest plots  | `tbl_regression(...).with_forest_plot()` | MVP |
| Inline KM curves     | `tbl_survival(...).with_km_plot()` | MVP |
| Cross-backend plot embedding | DOCX/PPTX/LaTeX include the plot too | MVP |
| Rao–Scott chi-square | weighted Table 1 auto-route | MVP |
| `SurveyDesign` (strata + cluster + FPC) | Taylor-linearised variance | MVP |
| Themes               | `clinical`, `jama`, `nejm`, `compact`, `minimal` | MVP |
| Auto test selection  | t-test / ANOVA / Wilcoxon / Kruskal / χ² / Fisher | MVP |
| Per-variable test overrides | `tests={'age': 'wilcoxon', ...}` | MVP |
| Multiplicity adjustment | `.add_q()` — BH, BY, Bonferroni, Holm, Hommel, Šidák | MVP |
| Multi-model regression | `tbl_regression([m1, m2], model_labels=[...])` | MVP |
| lifelines (Cox / AFT) | `tbl_regression(cph)` | MVP |
| sklearn (linear models) | `tbl_regression(clf)` — point estimates only | MVP |
| Kaplan–Meier summary | `tbl_survival(df, time=, event=, by=, times=[...])` | MVP |
| Survey-weighted Table 1 | `tbl_one(..., weights='w')` | MVP |
| polars input | `tbl_one(pl.DataFrame(...))` | MVP |
| Conditional formatting | `.bold_if`, `.highlight_if`, `.style_if` | MVP |
| Sticky-header notebook tables | `.to_html(sticky_header=True)` | MVP |
| Standardised mean differences | continuous + categorical (Yang–Dalton) | MVP |
| Notebook rendering   | `_repr_html_` / `_repr_markdown_` / `_repr_latex_` | MVP    |
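
As a reminder of what a Benjamini-Hochberg (BH) adjustment computes, here is a small pure-Python sketch. The function name `bh_adjust` is hypothetical and this is not PySofra's implementation; it only illustrates the standard step-up procedure that a q-value column is conventionally based on.

```python
def bh_adjust(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative p-values, not taken from any real table.
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

For real analyses, `statsmodels.stats.multitest.multipletests` offers tested implementations of BH and the other listed procedures.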

---

## Design principles

* **Backend-agnostic tables.** A `SofraTable` is the single source of truth;
  every renderer (HTML, Markdown, DOCX, …) reads the same object.
* **Immutable method chaining.** Every modifier returns a new `SofraTable`.
  No surprises, no global state.
* **Strong defaults, explicit overrides.** Sensible journal-style output
  out of the box; per-variable type, label, and test overrides when you
  need them.
* **Deterministic.** The same input always produces the same output —
  critical for reproducible research.
* **No magic.** No nonstandard evaluation, no metaprogramming, no
  network calls, no telemetry.
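
The immutable method-chaining principle can be sketched with a frozen dataclass. `MiniTable` and its methods are hypothetical stand-ins written for this illustration, not the real `SofraTable`; the point is only that each modifier builds a new object and leaves the original untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MiniTable:
    """Hypothetical stand-in for an immutable table object."""
    rows: tuple[tuple[str, str], ...]
    theme: str = "minimal"
    bold_rows: frozenset[int] = frozenset()

    def with_theme(self, name: str) -> "MiniTable":
        # replace() copies the instance with one field changed.
        return replace(self, theme=name)

    def bold(self, row: int) -> "MiniTable":
        return replace(self, bold_rows=self.bold_rows | {row})


base = MiniTable(rows=(("age", "60.1"), ("bmi", "28.0")))
styled = base.with_theme("jama").bold(0)

print(base.theme, sorted(base.bold_rows))      # the original is unchanged
print(styled.theme, sorted(styled.bold_rows))  # the chain produced a new object
```

Because nothing is mutated, intermediate tables can be reused safely, for example to render the same base table under two themes.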

---

## Installation

```bash
pip install pysofra
```

PySofra requires Python ≥ 3.11. The core install pulls in only `numpy`,
`pandas`, `scipy`, `statsmodels`, and `python-docx`. Domain extras unlock
the features that depend on heavier optional libraries:

```bash
pip install "pysofra[survival]"   # tbl_survival + KM curves (lifelines, matplotlib)
pip install "pysofra[plot]"       # forest plots, table-as-image (matplotlib)
pip install "pysofra[pptx]"       # PowerPoint export (python-pptx)
pip install "pysofra[xlsx]"       # Excel export (xlsxwriter)
pip install "pysofra[polars]"     # accept polars DataFrames as input
pip install "pysofra[sklearn]"    # tbl_regression on scikit-learn models
pip install "pysofra[all]"        # everything above
pip install "pysofra[dev]"        # testing + linting (pytest, ruff, mypy, hypothesis)
```
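
To check which extras are usable in the current environment, one option is to probe for their import names with the standard library. This is a sketch, not a PySofra feature: the mapping below is assembled by hand from the extras listed above, using each distribution's usual import name (`pptx` for `python-pptx`, `sklearn` for `scikit-learn`).

```python
from importlib.util import find_spec

# Extra name -> top-level modules it relies on (hand-written mapping).
EXTRA_MODULES = {
    "survival": ["lifelines", "matplotlib"],
    "plot": ["matplotlib"],
    "pptx": ["pptx"],
    "xlsx": ["xlsxwriter"],
    "polars": ["polars", "pyarrow"],
    "sklearn": ["sklearn"],
}


def available_extras(extra_modules: dict[str, list[str]] = EXTRA_MODULES) -> dict[str, bool]:
    """Report, per extra, whether all of its modules can be found."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in extra_modules.items()
    }


for extra, ok in available_extras().items():
    print(f"{extra:<10} {'installed' if ok else 'missing'}")
```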

---

## Status

PySofra is in **alpha** (`0.1.0a1`). The public API surface is pinned
by an explicit
[API-stability test](tests/test_joss_api_stability.py) so that any
unintended rename, removal, or signature change surfaces as a failed
test. Quality bar at this release:

* **More than 800 tests passing**, **100% line coverage**, mypy strict, ruff clean.
* Every numeric output is validated against `scipy`, `lifelines`,
  `statsmodels`, or a hand-computed textbook formula
  ([test_joss_statistical_correctness.py](tests/test_joss_statistical_correctness.py)).
* Universal invariants enforced via Hypothesis on 720 randomized
  examples per CI run
  ([test_joss_property_invariants.py](tests/test_joss_property_invariants.py)).
* Renderer output is byte-deterministic — identical input always
  produces identical HTML/Markdown/LaTeX, required for reproducible
  publication artifacts
  ([test_joss_renderer_consistency.py](tests/test_joss_renderer_consistency.py)).
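
A byte-determinism check of the kind described in the last bullet can be sketched by hashing two renders of the same input. The `render_markdown` function here is a hypothetical stand-in renderer and the table values are made up; the actual test file may be organised quite differently.

```python
import hashlib


def render_markdown(header: tuple[str, ...], rows: list[tuple[str, ...]]) -> str:
    """Hypothetical stand-in renderer: a pure function of its inputs."""
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines) + "\n"


def digest(text: str) -> str:
    """SHA-256 of the UTF-8 bytes, so any byte-level difference shows up."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Made-up values for illustration.
header = ("Variable", "Placebo", "Treatment")
rows = [("Age, mean (SD)", "60.2 (9.8)", "59.7 (10.1)")]

first = digest(render_markdown(header, rows))
second = digest(render_markdown(header, rows))
assert first == second, "renderer output changed between runs"
print(first[:16])
```

The same pattern should carry over to a real table by hashing the string returned from `tbl.to_html()` or `tbl.to_markdown()`; storing the digest in a file would let a later process compare against it.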

Bug reports and use-case feedback are very welcome.

A **Journal of Statistical Software** paper ([`paper/paper.tex`](paper/paper.tex),
bibliography in [`paper/paper.bib`](paper/paper.bib)) is in
preparation. The [`paper/replication/`](paper/replication/) directory
regenerates every numeric output and figure shown in the paper and
includes an R cross-check script (`example_trial.R`) using
`gtsummary` for digit-level verification. The
[`paper/figures/`](paper/figures/) directory holds the rendered PDF
and PNG embedded in the manuscript via `\includegraphics{}`.

## Contributing

Bug reports, feature requests, and pull requests are all very welcome.
Please read [`CONTRIBUTING.md`](CONTRIBUTING.md) for the workflow, the
quality gates, and the
[Code of Conduct](CODE_OF_CONDUCT.md).

## License

GPL-3.0-or-later. See [`LICENSE`](LICENSE).

## Citation

If you use PySofra in academic work, please cite the project — see
[`CITATION.cff`](CITATION.cff).
