Metadata-Version: 2.4
Name: binjamin
Version: 0.1.0
Summary: Bin width estimation — all major methods in one place
Project-URL: Homepage, https://github.com/adelic-ai/binjamin
Author-email: Shun Richard Honda <shun.honda@adelic.org>
License: MIT
License-File: LICENSE
Keywords: bayesian-blocks,bin-width,binning,freedman-diaconis,histogram
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.12
Requires-Dist: numpy>=1.24
Description-Content-Type: text/markdown

# binjamin

**Bin width estimation — every major method in one place.**

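```python
import numpy as np
import binjamin as bj

rng = np.random.default_rng(0)
data = np.sort(rng.uniform(0, 1_000, size=500))  # example event times
intervals = np.diff(data)                        # inter-event intervals

# Estimate bin width
bj.auto(intervals)               # good default: max(FD, Sturges)
bj.freedman_diaconis(intervals)  # robust, no distributional assumption

# Bin data and get the binning artifact: (edges, counts)
edges, counts = bj.bin(data)
edges, counts = bj.bin(data, method='bayesian_blocks')
```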

## Install

```bash
pip install binjamin
```

## Methods

### Scalar — returns a single bin width

| Method | Formula / criterion | When to use |
|---|---|---|
| `auto` | max(FD, Sturges) | Good general default |
| `freedman_diaconis` | 2·IQR·n^(-1/3) | Unknown or skewed distribution, outliers present |
| `scott` | 3.5·σ·n^(-1/3) | Near-normal data, few outliers |
| `sturges` | (max−min)/(1+log₂n) | Small, near-normal datasets |
| `rice` | (max−min)/(2·n^(1/3)) | No assumption, simple alternative to FD |
| `sqrt` | (max−min)/√n | Quick exploratory work |
| `doane` | Sturges + skewness correction | Skewed or multimodal distributions |
| `stone` | Cross-validation | Unknown distribution, accuracy over speed |
| `knuth` | Maximum posterior over bin count (Bayesian) | Equal-width bins chosen by a Bayesian optimality criterion |
| `gcd_interval` | GCD of intervals | Integer sequences, regularly sampled data |
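
Every closed-form rule in the table needs only the sample size n, the data range, and (for FD, Scott, and Doane) one extra statistic, which is why these rules are cheap. A minimal pure-Python sketch of those formulas follows, assuming a non-constant sample with at least three values. The `*_width` names are hypothetical, Doane's correction uses the usual skewness standard error √(6(n−2)/((n+1)(n+3))), and binjamin's own implementation may differ in details such as which standard-deviation estimator it uses.

```python
import math
import statistics


def _spread(x):
    """Return (n, data range) for a 1-D sequence of numbers."""
    return len(x), max(x) - min(x)


def fd_width(x):
    # Freedman-Diaconis: 2 * IQR * n^(-1/3), with linearly interpolated quartiles
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return 2 * (q3 - q1) * len(x) ** (-1 / 3)


def scott_width(x):
    # Scott: 3.5 * sigma * n^(-1/3), sigma = sample standard deviation
    return 3.5 * statistics.stdev(x) * len(x) ** (-1 / 3)


def sturges_width(x):
    # Sturges: range / (1 + log2 n)
    n, r = _spread(x)
    return r / (1 + math.log2(n))


def rice_width(x):
    # Rice: range / (2 * n^(1/3))
    n, r = _spread(x)
    return r / (2 * n ** (1 / 3))


def sqrt_width(x):
    # Square-root rule: range / sqrt(n)
    n, r = _spread(x)
    return r / math.sqrt(n)


def doane_width(x):
    # Doane: Sturges' bin count plus a term driven by the sample skewness g1.
    # Assumes n >= 3 and non-constant data.
    n, r = _spread(x)
    mean = statistics.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    g1 = m3 / m2 ** 1.5
    sigma_g1 = math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    k = 1 + math.log2(n) + math.log2(1 + abs(g1) / sigma_g1)
    return r / k


def auto_width(x):
    # As described in the table: the larger of the FD and Sturges widths
    return max(fd_width(x), sturges_width(x))
```

For `x = list(range(1, 9))` the IQR is 3.5, so `fd_width(x)` is about 3.5 while `sturges_width(x)` is 7 / 4 = 1.75, and `auto_width(x)` takes the larger, FD value.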

### Variable-width — returns bin edges

| Method | When to use |
|---|---|
| `bayesian_blocks` | Non-stationary event data; density varies across the domain |

### Binning artifact

| Function | Returns |
|---|---|
| `bin(data, method='auto')` | `(edges, counts)` — edges and observation counts per bin |
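
For the scalar methods, `bin` has to turn a width into edges before counting. One plausible way to do that, sketched below, starts the edges at the data minimum and assigns values with a binary search. Where binjamin actually anchors its edges is not stated here, so `bin_with_width` is a hypothetical helper, not the library's code.

```python
import bisect
import math


def bin_with_width(data, width):
    """Hypothetical helper: equal-width edges spanning the data, plus counts.

    Bins are half-open [e_i, e_{i+1}) except the last, which also includes
    the maximum value (the same convention numpy.histogram uses).
    Assumes width > 0.
    """
    lo, hi = min(data), max(data)
    nbins = max(1, math.ceil((hi - lo) / width))
    edges = [lo + i * width for i in range(nbins + 1)]
    counts = [0] * nbins
    for v in data:
        # Index of the last edge that is <= v
        i = bisect.bisect_right(edges, v) - 1
        counts[min(i, nbins - 1)] += 1  # clamp the maximum into the last bin
    return edges, counts
```

`bin_with_width([0, 1, 1, 2, 3, 4], 2.0)` returns `([0.0, 2.0, 4.0], [3, 3])`: the value 2 starts the second bin, and the maximum, 4, stays in the last bin instead of opening a new one.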

## Usage

All scalar methods take a 1-D array-like and return a `float`:

```python
import numpy as np
import binjamin as bj

rng = np.random.default_rng(0)
event_times = rng.uniform(0, 1_000, size=500)  # example event timestamps

intervals = np.diff(np.sort(event_times))  # inter-event intervals

bj.freedman_diaconis(intervals)  # → float
bj.scott(intervals)              # → float
bj.knuth(intervals)              # → float
```
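
Of the three calls above, `knuth` is the expensive one: instead of a closed-form expression, Knuth's rule scores each candidate number of equal-width bins M by its log posterior and keeps the best. The sketch below uses that log posterior (up to an additive constant), n·log M + log Γ(M/2) − M·log Γ(1/2) − log Γ(n + M/2) + Σₖ log Γ(nₖ + 1/2), where nₖ is the count in bin k, together with an exhaustive search up to a hypothetical `max_bins` cap. binjamin may search the bin count differently, and the helper names are illustrative.

```python
import math


def _counts(x, m):
    # Counts in m equal-width bins spanning [min(x), max(x)]; assumes max > min
    lo, hi = min(x), max(x)
    counts = [0] * m
    for v in x:
        k = int((v - lo) / (hi - lo) * m)
        counts[min(k, m - 1)] += 1  # the maximum goes in the last bin
    return counts


def knuth_log_posterior(x, m):
    # Knuth's log posterior for m equal-width bins, up to an additive constant
    n = len(x)
    return (
        n * math.log(m)
        + math.lgamma(m / 2)
        - m * math.lgamma(0.5)
        - math.lgamma(n + m / 2)
        + sum(math.lgamma(c + 0.5) for c in _counts(x, m))
    )


def knuth_width(x, max_bins=100):
    # Exhaustive search over bin counts 1..max_bins; keep the highest score
    best_m = max(range(1, max_bins + 1), key=lambda m: knuth_log_posterior(x, m))
    return (max(x) - min(x)) / best_m
```

With only two points every split lowers the score (the log posterior is -log 2 for two bins versus 0 for one), so `knuth_width([0, 1])` returns 1.0, a single bin spanning the data.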

`bayesian_blocks` takes event positions (not intervals) and returns edges:

```python
edges = bj.bayesian_blocks(event_times, p0=0.05)
# edges: array of variable-width bin boundaries
# p0: false-positive rate for new change points (lower = fewer blocks)
```
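
Unlike the scalar methods, `bayesian_blocks` chooses where edges go rather than how wide they are. A minimal sketch of the event-data version of the Bayesian Blocks dynamic program from Scargle et al. (2013): a block holding N events over length T scores N·(log N − log T), every block pays a prior penalty derived from `p0` with that paper's empirical calibration, and an O(n²) pass finds the best partition. `bayesian_blocks_sketch` is a hypothetical name; it assumes at least two distinct event times and is not binjamin's implementation.

```python
import math


def bayesian_blocks_sketch(t, p0=0.05):
    """Partition event times into blocks of constant rate.

    Hypothetical helper following the Scargle et al. (2013) dynamic program.
    Assumes at least two events with distinct times.
    """
    t = sorted(t)
    n = len(t)
    # Cell edges: the first event, midpoints between events, the last event
    edges = [t[0]] + [(a + b) / 2 for a, b in zip(t, t[1:])] + [t[-1]]
    # Penalty per block, calibrated so p0 is the false-positive rate
    ncp_prior = 4 - math.log(73.53 * p0 * n ** -0.478)

    best = [0.0] * n  # best[r]: optimal total fitness for cells 0..r
    last = [0] * n    # last[r]: first cell of the final block in that optimum
    for r in range(n):
        best_fit, best_k = -math.inf, 0
        for k in range(r + 1):
            # Candidate final block covers cells k..r
            count = r - k + 1
            width = edges[r + 1] - edges[k]
            fit = count * (math.log(count) - math.log(width)) - ncp_prior
            if k > 0:
                fit += best[k - 1]
            if fit > best_fit:
                best_fit, best_k = fit, k
        best[r], last[r] = best_fit, best_k

    # Walk back through the stored block starts
    starts = []
    r = n
    while r > 0:
        k = last[r - 1]
        starts.append(k)
        r = k
    starts.reverse()
    return [edges[k] for k in starts] + [edges[-1]]
```

Lowering `p0` raises the per-block penalty `ncp_prior`, which is why smaller values yield fewer blocks.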

`gcd_interval` takes integer-valued positions:

```python
bj.gcd_interval([0, 60, 120, 180, 300])  # → 60.0
```
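
The idea behind `gcd_interval` is that on a regular grid every gap between observed positions is a whole multiple of the grid step, so the greatest common divisor of the gaps gives the largest step consistent with all of them: the gaps above are 60, 60, 60, and 120, whose GCD is 60. A sketch of that idea, assuming integer-valued positions and using a hypothetical `gcd_interval_sketch` name rather than binjamin's code:

```python
import math


def gcd_interval_sketch(positions):
    """Hypothetical helper: largest step that divides every gap between positions."""
    pts = sorted(positions)
    if any(p != int(p) for p in pts):
        raise ValueError("positions must be integer-valued")
    pts = [int(p) for p in pts]
    # Gaps between consecutive distinct positions; repeated positions are skipped
    gaps = [b - a for a, b in zip(pts, pts[1:]) if b != a]
    # math.gcd accepts any number of integers (Python 3.9+) and returns 0 for none
    return float(math.gcd(*gaps))
```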

## Choosing a method

**Start with `auto`.** If the resulting histogram hides structure you expect or looks too noisy:

- Heavy tails or outliers → `freedman_diaconis`
- Known near-normal distribution → `scott`
- Skewed data → `doane`
- Event rate changes over time → `bayesian_blocks`
- Integer sequence with known regular spacing → `gcd_interval`
- Need bins that are optimal under an explicit criterion (Bayesian posterior or cross-validated error) and can afford the extra computation → `knuth` or `stone`
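
`knuth` and `stone` are slower than the closed-form rules because they build a histogram for every candidate bin count. For `stone`, one standard criterion is the leave-one-out cross-validation estimate of integrated squared error, which is the form NumPy's `'stone'` estimator minimizes. The sketch assumes binjamin uses a comparable criterion, which this README does not confirm; `cv_risk`, `stone_width`, and `max_bins` are hypothetical names.

```python
def _counts(x, m):
    # Counts in m equal-width bins spanning [min(x), max(x)]; assumes max > min
    lo, hi = min(x), max(x)
    counts = [0] * m
    for v in x:
        k = int((v - lo) / (hi - lo) * m)
        counts[min(k, m - 1)] += 1  # the maximum goes in the last bin
    return counts


def cv_risk(x, m):
    # Leave-one-out CV estimate of integrated squared error for m bins,
    # scaled by (n - 1): (2 - (n + 1) * sum(p_k^2)) / h
    n = len(x)
    h = (max(x) - min(x)) / m
    p = [c / n for c in _counts(x, m)]
    return (2 - (n + 1) * sum(pk * pk for pk in p)) / h


def stone_width(x, max_bins=100):
    # Exhaustive search: one histogram per candidate bin count
    best_m = min(range(1, max_bins + 1), key=lambda m: cv_risk(x, m))
    return (max(x) - min(x)) / best_m
```

For `[0, 1]` the single-bin risk is -1 while every finer split scores positive, so `stone_width([0, 1])` returns 1.0.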

## License

[MIT](LICENSE)
