Metadata-Version: 2.4
Name: pythodds
Version: 0.9.1
Summary: Command line utilities for statistics, odds, and probabilities
Author-email: Nicholas Carsner <nicholascarsner@gmail.com>
License: MIT
License-File: LICENSE
Keywords: binomial,cli,odds,probability,statistics
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.13
Requires-Dist: numpy>=2.4.3
Requires-Dist: pandas>=3.0.1
Requires-Dist: scikit-learn>=1.8.0
Requires-Dist: scipy>=1.17.1
Description-Content-Type: text/markdown

# pythodds

[![PyPI version](https://badge.fury.io/py/pythodds.svg)](https://pypi.org/project/pythodds/)
[![Python 3.13+](https://img.shields.io/badge/python-3.13+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/ncarsner/pythodds/blob/main/LICENSE)
[![Tests](https://github.com/ncarsner/pythodds/actions/workflows/tests.yml/badge.svg)](https://github.com/ncarsner/pythodds/actions/workflows/tests.yml)
[![codecov](https://codecov.io/gh/ncarsner/pythodds/branch/main/graph/badge.svg)](https://codecov.io/gh/ncarsner/pythodds)

A command-line utility and Python library for calculating statistics, odds, and probabilities.

## Features

| | |
|-|-|
| **Binomial Distribution** |  Calculate PMF, CDF, and survival functions for binomial distributions |
| **Bayes' Theorem** | Compute posterior probabilities from priors, likelihoods, and either direct evidence or a false-positive rate |
| **Birthday Problem** | Compute collision probabilities for uniform and non-uniform pools, find minimum group sizes, and generate probability tables |
| **Normal Distribution** | Compute PDF, CDF, survival probabilities, interval probabilities, and the inverse CDF (percent-point function) for a Gaussian N(μ, σ²) distribution |
| **Expected Value** | Compute E[X], Var(X), SD(X), Shannon entropy, and the moment generating function for discrete probability distributions; supports inline input or CSV/JSON files |
| **Poisson Distribution** | Compute PMF, CDF, and survival probabilities, find minimum event counts for a target cumulative probability, and generate full probability tables |
| **Streak Probability** | Compute the probability of at least one consecutive run of successes and the expected length of the longest streak |
| **Pythagorean Record** | Calculate team winning percentage expectations using Bill James' Pythagorean formula or the SABR linear formula; project in-progress season records and compare actual vs. expected performance |
| **Pearson Correlation** | Compute Pearson's r, r², t-statistic, p-value, and confidence intervals; test for linear relationships between two continuous variables; supports inline data or CSV input |
| **Monte Carlo Simulator** | Empirically estimate probabilities for binomial, birthday, streak, and Poisson experiments with confidence intervals and analytical comparison |

| **Command-line Interface** | `binom`, `bayes`, `birthday`, `normal`, `expected`, `poisson`, `streak`, `pythag`, `pearson`, and `simulate` commands |
| **Pure Python** | No external dependencies required for core calculations |

## Installation

Install from PyPI:

```bash
pip install pythodds
```

Or install from source:

```bash
git clone https://github.com/ncarsner/pythodds.git
cd pythodds
pip install -e .
```

## Command Line Usage

#### `binom` — Binomial Distribution

Computes exact, cumulative, and survival probabilities for a Binomial(n, p) distribution, and renders a color-coded stacked progress bar showing the share of mass below `k`, at `k`, and above `k`.

```bash
# Calculate binomial distribution probabilities
binom -n 10 -k 3 -p 0.4

# Specify a target and minimum probability threshold
binom -n 100 -k 30 -p 0.35 --target 40 --min-prob 0.05
```

Typical output includes a stacked terminal bar like this:

```text
n=10, k=3, p=0.4
P(X = 3):  0.214991 (21.499100%)
P(X <= 3): 0.382281 (38.228100%)
P(X >= 3): 0.832710 (83.271000%)
[stacked ANSI bar for <k | =k | >k]
```
---
#### `bayes` — Bayes' Theorem Posterior Probability

Computes posterior probability $P(A\mid B)$ from a prior probability, likelihood, and either direct evidence $P(B)$ or a false-positive rate $P(B\mid \neg A)$.

```bash
# Medical test example: prevalence 1%, sensitivity 99%, false-positive rate 5%
bayes -p 0.01 -l 0.99 -f 0.05

# Provide evidence directly instead of a false-positive rate
bayes -p 0.2 -l 0.8 -e 0.5
```

**Options:**

| Flag | Long form | Description |
|------|-----------|-------------|
| `-p` | `--prior` | Prior probability $P(A)$, between `0` and `1` |
| `-l` | `--likelihood` | Likelihood $P(B\mid A)$, between `0` and `1` |
| `-e` | `--evidence` | Evidence probability $P(B)$, between `0` and `1` |
| `-f` | `--false-positive` | False-positive rate $P(B\mid \neg A)$, between `0` and `1` |
| `-P` | `--precision` | Decimal places for printed values (default: `6`) |

> `-e/--evidence` and `-f/--false-positive` are mutually exclusive; one is required.

---
#### `birthday` — Birthday Problem Collision Probability

Computes the probability that at least two items in a group share the same value when drawn from a pool of equally-likely possibilities. Defaults to a pool size of 365.25 (calendar days).

```bash
# P(duplicate birthday) in a group of 23 people
birthday -n 23

# Find the minimum group size to reach 50% collision probability
birthday --target-prob 0.50

# Print a probability table for group sizes 1–40
birthday --range 1 40

# Custom pool size (e.g. 7-digit phone numbers)
birthday -p 10_000_000 -n 1180

# Non-uniform pool via relative weights
birthday --group-size 30 --weights 0.10,0.15,0.20,0.30,0.25

# Output as JSON or CSV
birthday --range 1 60 --format <json|csv>
```

**Options:**

| Flag | Long form | Description |
|------|-----------|-------------|
| `-p` | `--pool-size` | Pool size — number of equally-likely outcomes (default: `365.25`) |
| `-n` | `--group-size` | Compute collision probability for exactly this group size |
| `-t` | `--target-prob` | Find the minimum group size reaching this probability |
| `-r` | `--range MIN MAX` | Print a probability table for group sizes MIN through MAX |
| `-w` | `--weights` | Comma-separated relative frequencies for a non-uniform pool |
| `-f` | `--format` | Output format: `table` (default), `json`, or `csv` |
| `-P` | `--precision` | Decimal places for printed probabilities (default: `6`) |

---
#### `normal` — Normal (Gaussian) Distribution

Computes PDF, CDF, survival probabilities, interval probabilities, and the inverse CDF (percent-point function) for a N(μ, σ²) distribution. Uses only the Python standard library.

```bash
# PDF, P(X ≤ 1.96), and P(X ≥ 1.96) for the standard normal
normal -x 1.96 -m 0 -s 1

# Same calculation for a custom distribution
normal -x 75 -m 70 -s 5

# P(−1.96 ≤ X ≤ 1.96)
normal --between -1.96 1.96 -m 0 -s 1

# Find the value x such that P(X ≤ x) = 0.975 (inverse CDF)
normal --quantile 0.975 -m 0 -s 1
```

**Options:**

| Flag | Long form | Description |
|------|-----------|-------------|
| `-x` | `--value` | Compute PDF, P(X ≤ x), and P(X ≥ x) for this value |
| | `--between LOW HIGH` | Compute P(LOW ≤ X ≤ HIGH) |
| `-q` | `--quantile` | Find x such that P(X ≤ x) = P (inverse CDF) |
| `-m` | `--mean` | Distribution mean μ (default: `0`) |
| `-s` | `--std` | Distribution standard deviation σ (default: `1`) |
| `-P` | `--precision` | Decimal places for printed values (default: `6`) |

> `-x/--value`, `--between`, and `-q/--quantile` are mutually exclusive; one is required.

---
#### `expected` — Expected Value & Discrete Distribution Statistics

Computes E[X], Var(X), SD(X), Shannon entropy, and optionally the moment generating function (MGF) for a discrete probability distribution supplied inline or via a CSV/JSON file.

```bash
# E[X] and statistics for a simple discrete distribution
expected --outcomes 0,1,5,10 --probs 0.50,0.25,0.15,0.10

# Non-uniform six-sided die
expected --outcomes 1,2,3,4,5,6 --probs 0.1,0.2,0.3,0.2,0.1,0.1

# Load distribution from a CSV or JSON file
expected --file payouts.csv

# Also compute the MGF at t=0.5
expected --outcomes 0,1 --probs 0.3,0.7 --mgf 0.5
```

**Options:**

| Flag | Long form | Description |
|------|-----------|-------------|
| `-o` | `--outcomes` | Comma-separated outcome values |
| `-f` | `--file` | CSV or JSON file with outcomes and probabilities |
| `-p` | `--probs` | Comma-separated probabilities (required with `--outcomes`) |
| | `--mgf T` | Also compute the moment generating function M_X(t) at t=T |
| `-P` | `--precision` | Decimal places for printed values (default: `6`) |

> `--outcomes` and `--file` are mutually exclusive; one is required. `--probs` is required when using `--outcomes`.

---
#### `poisson` — Poisson Distribution

Computes PMF, CDF, and survival probabilities for a Poisson(λ) distribution. Models rare, independent events occurring at a known average rate — server errors per hour, calls per minute, defects per batch, and so on.

```bash
# P(X=7), P(X≤7), and P(X≥7) for λ=3.0
poisson -l 3.0 -k 7

# Find the minimum k such that P(X ≤ k) >= 0.95
poisson -l 3.0 -t 0.95

# Print a probability table for k = 0 through 15
poisson -l 3.0 -r 0 15

# Also show P(X ≥ 5) and whether it meets a 1% threshold
poisson -l 0.5 -k 2 --target 5 --min-prob 0.01

# Output as JSON or CSV
poisson -l 3.0 -r 0 20 -f json
poisson -l 3.0 -r 0 20 -f csv
```

**Options:**

| Flag | Long form | Description |
|------|-----------|-------------|
| `-l` | `--rate` | Average event rate λ (required, must be > 0) |
| `-k` | `--events` | Compute PMF and CDF for exactly this event count |
| `-t` | `--target-prob` | Find the minimum k such that P(X ≤ k) ≥ PROB |
| `-r` | `--range MIN MAX` | Print a probability table for event counts MIN through MAX |
| | `--target` | With `-k`: also print P(X ≥ T) for this target count |
| | `--min-prob` | With `--target`: report whether P(X ≥ T) meets this threshold |
| `-f` | `--format` | Output format: `table` (default), `json`, or `csv` |
| `-P` | `--precision` | Decimal places for printed probabilities (default: `6`) |

---
#### `streak` — Streak / Consecutive Run Probability

Computes the exact probability of at least one run of k consecutive successes in n independent Bernoulli trials, and the expected length of the longest run. Uses dynamic programming for exact O(n·k) computation.

```bash
# P(at least one run of 5+ heads in 100 fair coin flips)
streak -n 100 -k 5 -p 0.5

# P(at least one hitting streak of 10+ games over a 162-game season at .320)
streak -n 162 -k 10 -p 0.32

# Expected length of the longest win streak in 50 trials at 40% success rate
streak -n 50 -p 0.40 --longest
```

**Options:**

| Flag | Long form | Description |
|------|-----------|-------------|
| `-n` | `--trials` | Total number of independent trials (required) |
| `-p` | `--prob` | Success probability per trial, 0–1 (required) |
| `-k` | `--streak-length` | Compute P(at least one run of K consecutive successes) |
| | `--longest` | Compute E[length of longest run of consecutive successes] |
| `-P` | `--precision` | Decimal places for printed probabilities (default: `6`) |

> `-k/--streak-length` and `--longest` are mutually exclusive; one is required.

---
#### `pythag` — Pythagorean Record / Win Expectation

Calculates expected winning percentage for sports teams based on runs/points scored and allowed. Supports both the traditional Pythagorean formula (Bill James) and the newer linear formula from SABR research (Rothman, 2014). Can project final season records for teams in progress.

```bash
# MLB team using linear formula (default)
pythag --scored 800 --allowed 650

# Compare both linear and Pythagorean methods
pythag --scored 800 --allowed 650 --method both

# Use traditional Pythagorean formula with custom exponent
pythag --scored 800 --allowed 650 --method pythagorean --exponent 1.83

# NFL team projection
pythag --scored 420 --allowed 300 --sport nfl

# NBA team projection
pythag --scored 8500 --allowed 8200 --sport nba

# In-progress season: team is 45-37 after 82 games (shows projection)
pythag --scored 550 --allowed 490 --current-wins 45 --games-played 82
```

**Options:**

| Flag | Long form | Description |
|------|-----------|-------------|
| `-s` | `--scored` | Runs/points scored by the team (required) |
| `-a` | `--allowed` | Runs/points allowed by the team (required) |
| | `--sport` | Sport/league: `mlb` (default), `nfl`, or `nba` |
| `-m` | `--method` | Calculation method: `linear` (default), `pythagorean`, or `both` |
| `-e` | `--exponent` | Exponent for Pythagorean formula (default: `2.0`; optimal ~1.83 for baseball) |
| `-g` | `--games` | Total games in season (default: 162 for mlb, 17 for nfl, 82 for nba) |
| `-w` | `--current-wins` | Current wins (for in-progress season projection) |
| `-p` | `--games-played` | Games already played (for in-progress season projection) |
| `-P` | `--precision` | Decimal places for percentages (default: `2`) |

> `--current-wins` and `--games-played` must be used together for in-progress projections.

**Formulas:**
- **Pythagorean (James):** `EXP(W%) = RS^exp / (RS^exp + RA^exp)`
- **Linear (Rothman, 2014):**
  - MLB: `EXP(W%) = 0.000683(RS - RA) + 0.50`
  - NFL: `EXP(W%) = 0.001538(PS - PA) + 0.50`
  - NBA: `EXP(W%) = 0.000351(PS - PA) + 0.50`

---
#### `pearson` — Pearson Correlation Coefficient

Computes the Pearson correlation coefficient (r) to measure the linear relationship between two continuous variables. Values range from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear relationship. Includes hypothesis testing, p-values, and confidence intervals using Fisher's Z-transformation.

```bash
# Compute correlation from command-line values
pearson --x 1,2,3,4,5 --y 2.1,3.8,6.2,7.9,10.1

# Load data from CSV file
pearson --file data.csv --x-col height --y-col weight

# Include hypothesis test at α=0.05 significance level
pearson --x 10,20,30,40,50 --y 15,28,41,55,68 --alpha 0.05

# One-tailed test for positive correlation
pearson --x 1,2,3,4,5 --y 2,4,5,4,5 --alpha 0.05 --sided one

# Custom precision for output
pearson --x 1,2,3,4 --y 2,4,6,8 --precision 4
```

**Options:**

| Flag | Long form | Description |
|------|-----------|-------------|
| | `--x` | Comma-separated x values (use with `--y`) |
| | `--y` | Comma-separated y values (required with `--x`) |
| | `--file` | CSV file path (use with `--x-col` and `--y-col`) |
| | `--x-col` | Column name for x values in CSV (required with `--file`) |
| | `--y-col` | Column name for y values in CSV (required with `--file`) |
| | `--alpha` | Significance level for hypothesis test and CI (e.g., `0.05`) |
| | `--sided` | Hypothesis test type: `one` or `two` (default: `two`) |
| `-P` | `--precision` | Decimal places for printed values (default: `6`) |

> `--x` and `--file` are mutually exclusive; one is required. When using `--x`, must also provide `--y`. When using `--file`, must also provide `--x-col` and `--y-col`.

**Output includes:**
- **Pearson's r**: Correlation coefficient measuring linear relationship strength
- **r²**: Coefficient of determination (proportion of variance explained)
- **Interpretation**: Qualitative description of correlation strength and direction
- **Hypothesis test** (if `--alpha` provided): t-statistic, p-value, significance result
- **Confidence interval** (if `--alpha` provided): CI for population correlation ρ using Fisher Z-transformation

---
#### `simulate` — Monte Carlo Probability Simulator

Runs repeated random experiments to estimate probabilities empirically, with optional confidence intervals and analytical comparison against `binom`, `birthday`, `poisson`, and `streak`.

```bash
# Estimate P(X >= 5) for Binomial(n=10, p=0.4) over 100,000 trials
simulate --experiment binom --params n=10 k=5 p=0.4 --trials 100000

# Birthday collision probability for a group of 23 with a 95% confidence interval
simulate --experiment birthday --params pool=365 group=23 --confidence

# Streak probability: P(run of 5+ successes in 100 trials, p=0.5)
simulate --experiment streak --params n=100 k=5 p=0.5 --trials 50000

# Poisson: P(X >= 7) for λ=3.0 with a fixed seed
simulate --experiment poisson --params lam=3.0 k=7 --seed 42

# Auto-size trial count to achieve a target standard error of 0.005
simulate --experiment binom --params n=20 k=8 p=0.5 --scale 0.005
```

**Options:**

| Flag | Long form | Description |
|------|-----------|-------------|
| `-e` | `--experiment` | Experiment type: `binom`, `birthday`, `streak`, or `poisson` (required) |
| `-p` | `--params` | Space-separated `KEY=VALUE` experiment parameters (see below) |
| `-t` | `--trials` | Number of simulation trials (default: 10,000) |
| | `--scale` | Target standard error; auto-computes `--trials` (overrides `-t`) |
| `-s` | `--seed` | Random seed for reproducibility |
| `-c` | `--confidence` | Print 95% Wilson confidence interval |
| | `--dump` | Output per-trial results as CSV instead of summary |
| `-f` | `--format` | Summary output format: `table` (default) or `json` |
| `-P` | `--precision` | Decimal places for printed probabilities (default: `6`) |

**Required params by experiment:**

| Experiment | Required params |
|------------|-----------------|
| `binom` | `n=INT k=INT p=FLOAT` |
| `birthday` | `pool=INT group=INT` |
| `streak` | `n=INT k=INT p=FLOAT` |
| `poisson` | `lam=FLOAT k=INT` |

---
### Python Library

#### Binomial Distribution

```python
from src.utils.binomial_distribution import binomial_pmf, binomial_cdf_le, binomial_cdf_ge

# P(X = 3) for Binomial(n=10, p=0.4)
pmf = binomial_pmf(10, 3, 0.4)

# P(X <= 3) for Binomial(n=10, p=0.4)
cdf = binomial_cdf_le(10, 3, 0.4)

# P(X >= 3) for Binomial(n=10, p=0.4)
survival = binomial_cdf_ge(10, 3, 0.4)
```

#### Bayes' Theorem

```python
from src.utils.bayes_theorem import bayes_posterior, evidence_from_false_positive

# Derive P(B) from a prior, likelihood, and false-positive rate
evidence = evidence_from_false_positive(0.01, 0.99, 0.05)

# Posterior P(A|B)
posterior = bayes_posterior(0.01, 0.99, evidence)
```

#### Birthday Problem

```python
from src.utils.birthday_problem import (
    collision_prob_uniform,
    collision_prob_nonuniform,
    min_group_for_prob,
    expected_duplicate_pairs,
)

# P(duplicate) for 23 people in a pool of 365.25
prob = collision_prob_uniform(23, 365.25)

# Minimum group size to reach 50% collision probability
n = min_group_for_prob(0.50, 365.25)

# P(duplicate) with a non-uniform pool
prob_nu = collision_prob_nonuniform(30, [0.10, 0.15, 0.20, 0.30, 0.25])

# Expected number of duplicate pairs
pairs = expected_duplicate_pairs(23, 365.25)
```

#### Normal Distribution

```python
from src.utils.normal_gaussian import (
    normal_pdf,
    normal_cdf,
    normal_ppf,
    normal_prob_between,
)

# PDF value at x=1.96 for the standard normal
pdf = normal_pdf(1.96, mu=0.0, sigma=1.0)

# P(X ≤ 1.96)
cdf = normal_cdf(1.96, mu=0.0, sigma=1.0)

# P(X ≥ 1.96)
survival = 1.0 - normal_cdf(1.96, mu=0.0, sigma=1.0)

# P(−1.96 ≤ X ≤ 1.96)
prob = normal_prob_between(-1.96, 1.96, mu=0.0, sigma=1.0)

# Find x such that P(X ≤ x) = 0.975 (inverse CDF)
x = normal_ppf(0.975, mu=0.0, sigma=1.0)
```

#### Expected Value

```python
from src.utils.expected_value import (
    expected_value,
    variance,
    std_dev,
    entropy,
    mgf,
    load_file,
)

outcomes = [0, 1, 5, 10]
probs    = [0.50, 0.25, 0.15, 0.10]

# E[X]
ev = expected_value(outcomes, probs)

# Var(X) and SD(X)
var = variance(outcomes, probs)
sd  = std_dev(outcomes, probs)

# Shannon entropy (bits)
H = entropy(probs)

# Moment generating function M_X(t) at t=0.5
M = mgf(outcomes, probs, t=0.5)

# Load a distribution from a CSV or JSON file
outcomes, probs = load_file("payouts.csv")
```

#### Poisson Distribution

```python
from src.utils.poisson_distribution import (
    poisson_pmf,
    poisson_cdf_le,
    poisson_cdf_ge,
    min_k_for_prob,
)

# P(X = 7) for Poisson(λ=3.0)
pmf = poisson_pmf(7, 3.0)

# P(X ≤ 7) for Poisson(λ=3.0)
cdf = poisson_cdf_le(7, 3.0)

# P(X ≥ 7) for Poisson(λ=3.0)
survival = poisson_cdf_ge(7, 3.0)

# Minimum k such that P(X ≤ k) >= 0.95
k = min_k_for_prob(0.95, 3.0)
```

#### Streak Probability

```python
from src.utils.streak_probability import (
    prob_at_least_one_streak,
    expected_longest_streak,
)

# P(at least one run of 5 consecutive heads in 100 fair coin flips)
p = prob_at_least_one_streak(100, 5, 0.5)

# Expected length of the longest run of successes in 162 trials at .300
e = expected_longest_streak(162, 0.300)
```

#### Pythagorean Record

```python
from src.utils.pythagorean_record import (
    pythagorean_expectation,
    linear_expectation,
    expected_wins,
)

# Traditional Pythagorean formula: expected win % for 800 RS, 650 RA
win_pct = pythagorean_expectation(800, 650, exponent=2.0)

# Linear formula (SABR 2014): expected win % for MLB
win_pct = linear_expectation(800, 650, sport="mlb")

# Linear formula for NFL
win_pct = linear_expectation(420, 300, sport="nfl")

# Linear formula for NBA
win_pct = linear_expectation(8500, 8200, sport="nba")

# Convert win percentage to expected wins
wins = expected_wins(win_pct, games=162)
```

#### Pearson Correlation

```python
from src.utils.pearson_correlation import (
    pearson_r,
    correlation_t_statistic,
    correlation_p_value,
    correlation_confidence_interval,
)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.8, 6.2, 7.9, 10.1]

# Compute Pearson's r
r = pearson_r(x, y)

# r² (coefficient of determination)
r_squared = r * r

# t-statistic for testing H₀: ρ = 0
t_stat = correlation_t_statistic(r, n=len(x))

# p-value for two-tailed test
p_value = correlation_p_value(r, n=len(x), sided="two")

# 95% confidence interval for population correlation ρ
ci_lower, ci_upper = correlation_confidence_interval(r, n=len(x), alpha=0.05)
```

#### Monte Carlo Simulator

```python
from src.utils.monte_carlo import (
    simulate_binomial,
    simulate_birthday,
    simulate_streak,
    simulate_poisson,
    wilson_ci,
    standard_error,
)

# Simulate P(X >= 5) for Binomial(10, 0.4) over 100,000 trials
results = simulate_binomial(n=10, k=5, p=0.4, trials=100_000, seed=42)
p_hat = sum(results) / len(results)
se = standard_error(p_hat, len(results))
ci = wilson_ci(p_hat, len(results))
```

## Development

Clone the repository and install in editable mode:

```bash
git clone https://github.com/ncarsner/pythodds.git
cd pythodds
pip install -e .
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Author

[Nicholas Carsner](https://github.com/ncarsner)
