Metadata-Version: 2.4
Name: rebinning
Version: 0.1.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Image Processing
Classifier: Topic :: Multimedia :: Graphics
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: MIT License
Requires-Dist: numpy>=1.20
Requires-Dist: opencv-python-headless ; extra == 'test'
Requires-Dist: pytest ; extra == 'test'
Provides-Extra: test
License-File: LICENSE
Summary: Area-conserving rebinning of 2D data and images via Sutherland–Hodgman clipping.
Keywords: rebinning,resampling,image-processing,downscaling,area-weighted,polygon-clipping,sutherland-hodgman,numpy,rust,pyo3
Author: Dominique Dresen
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Changelog, https://github.com/DomiDre/rebinning/releases
Project-URL: Homepage, https://github.com/DomiDre/rebinning
Project-URL: Issues, https://github.com/DomiDre/rebinning/issues
Project-URL: Repository, https://github.com/DomiDre/rebinning

# rebinning

Area-conserving rebinning of 2D data. Implemented in Rust, exposed to Python through [PyO3](https://pyo3.rs/) and [maturin](https://www.maturin.rs/).

Every input cell is treated as a (possibly transformed) quadrilateral. That quadrilateral is geometrically clipped against each overlapping cell of a regular output grid, and the input value is redistributed (or averaged) across the output bins in proportion to overlap area. Totals are conserved exactly and there are no interpolation kernels involved.

For image resizing (axis-aligned scaling) it is also a fast alternative to OpenCV's `cv2.INTER_AREA`.

![Algorithm overview](docs/figures/overview.svg)

---

## Installation

You need a recent Rust toolchain (the build script uses [maturin](https://www.maturin.rs/) automatically). With [uv](https://docs.astral.sh/uv/):

```bash
uv sync
uv run maturin develop --uv --release
```

`uv sync` sets up `.venv/` and installs the Python source as an editable package; `maturin develop` compiles the Rust extension and drops the `_rebinning.*.so` into `python/rebinning/`. Re-run `maturin develop --uv --release` whenever you edit Rust.

## Quick start

### Resize an image

```python
import numpy as np
from PIL import Image
import rebinning

img = np.asarray(Image.open("portrait.jpg"))          # (H, W, 3) uint8
face = rebinning.rebin_image(img, 112, 112,
                             keep_aspect=True,
                             crop="center")
Image.fromarray(face).save("portrait_112.png")
```

`rebin_image` handles both grayscale `(H, W)` and multichannel `(H, W, C)` inputs, preserves the dtype, and clips/rounds integer outputs. With `keep_aspect=True` it crops the input to the target aspect ratio before resampling (top-left or center).

![Resizing a portrait to 112×112 with center crop](docs/figures/portrait_resize.png)

Reproduce with [`examples/resize_portrait.py`](examples/resize_portrait.py). The bundled `portrait.jpg` is a public-domain US Navy photo of Grace Hopper (taken from matplotlib's sample data); swap in any other image to try your own.

### Rebin a 2D field through a coordinate transform

```python
import numpy as np
import rebinning

# 100×100 regular input grid carrying some values
nx, ny = 100, 100
input_x = np.arange(nx) + 0.5
input_y = np.arange(ny) + 0.5
values = np.random.default_rng(0).normal(size=(nx, ny))

# rotate every input cell by 30 degrees
import math
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
def rotate(points):
    x, y = points[..., 0], points[..., 1]
    return np.stack([c*x - s*y, s*x + c*y], axis=-1)

output_x = np.arange(-50, 150) + 0.5
output_y = np.arange(-50, 150) + 0.5

out = rebinning.rebin(
    input_x, input_y, values,
    bin_width_x=1.0, bin_width_y=1.0,
    output_x=output_x, output_y=output_y,
    transform=rotate,
    mode="sum",   # 'sum' conserves totals; 'mean' gives an area-weighted mean
)
```

The transform is a vectorised numpy callable. It is invoked once with an `(Nx, Ny, 4, 2)` array of corner positions and must return an array of the same shape. Affine maps, spherical projections, beam-line geometry, anything you can express with numpy works.

![Rebinning a 100×100 grid through a 30° rotation](docs/figures/rotation_example.png)

The dashed line in the output panel is the rotated boundary of the original input domain. Reproduce with [`examples/test_readme_rebin.py`](examples/test_readme_rebin.py).

---

## How the algorithm works

### Step 1: Build the input quadrilaterals

The input is a regular grid of values. Each cell is a small rectangle around `(input_x[i], input_y[j])`. The library computes the four corners and (optionally) passes them through the user-supplied transform. The result is one quadrilateral per input cell, in output coordinates.

### Step 2: Clip against each output bin

For each input quad we know its bounding box in output coordinates, so we know which output bins it might overlap. For each of those candidate bins (a small axis-aligned rectangle), we run [Sutherland–Hodgman](https://en.wikipedia.org/wiki/Sutherland%E2%80%93Hodgman_algorithm) to clip the quad to the bin. Sutherland–Hodgman is just four passes of "drop everything outside one half-plane":

![Sutherland-Hodgman in four steps](docs/figures/clipping.svg)

The algorithm is correct whenever the clip polygon is convex, which a rectangle always is. The subject polygon (our input quad) can be any shape; only the four boundary tests need to know the rectangle's geometry.

The area of the resulting (possibly empty) polygon is computed with the [shoelace formula](https://en.wikipedia.org/wiki/Shoelace_formula): `½ |Σ (xᵢ yᵢ₊₁ − xᵢ₊₁ yᵢ)|`.

### Step 3: Redistribute by overlap area

For each input cell we now have a list of `(output_bin, overlap_area)` pairs. Let `A = Σ overlap_area`. In `mode="sum"` we add `value × overlap_area / A` to each touched output bin. In `mode="mean"` we accumulate the value weighted by overlap area and divide each output bin by the sum of weights it received.

![Redistribution by overlap area](docs/figures/redistribute.svg)

Every overlap area is shared between one input quad and one output bin, so no mass is lost or created. For `mode="sum"` the operation is provably area-conserving.

---

## Fast path for image resizing

When the transform is axis-aligned (image resizing, where you only scale and translate), every output bin is a rectangle covering a few input pixels. Each output pixel is the area-weighted mean of those input pixels, with weights set by the fractional overlap:

![One output bin and its input-pixel weights](docs/figures/output_pixel_zoom.svg)

The 2D overlap area further factors into a product of two 1D overlaps:

```
area(input_pixel ∩ output_bin) = Δx × Δy
```

This makes the operation separable: instead of clipping `Nx · Ny · Mx · My` quadrilateral/rectangle pairs, we build two small 1D weight matrices `W_y ∈ ℝ^{H_out × H_in}` and `W_x ∈ ℝ^{W_out × W_in}`, each row summing to 1, and apply them as two passes over the image:

```
out[i,j,c] = Σ_{a,b} W_y[i,a] · W_x[j,b] · img[a,b,c]
```

![Separable axis-aligned scaling](docs/figures/separable.svg)

Each row of `W_y` (and `W_x`) is non-zero only over the short contiguous range of input rows the corresponding output row covers, typically `ceil(scale) + 1` entries. `rebin_image` iterates that range per output row instead of doing a dense matmul over a mostly-zero matrix. The public [`rebin_axis_weights`](#api) helper still returns the dense weight matrix for inspection and custom processing.

---

## Comparison with `cv2.INTER_AREA`

For downscaling, `cv2.INTER_AREA` is mathematically equivalent to area-weighted rebinning. OpenCV uses a hand-tuned fixed-point routine; we apply the same 1D overlap weights separably, iterating only over the short contiguous range of input bins each output bin touches.

From `tests/test_rebin.py`:

* For random uint8 RGB inputs, the two agree to within 1 grey level maximum and <0.1 mean.
* For integer downscaling factors on float32, the two agree to 1e-4.

Single-threaded wall times for downscaling to 112×112, measured by [`benchmarks/bench_image.py`](benchmarks/bench_image.py) (best of 9 × 50 timed calls after warm-up, AMD Ryzen 9 7945HX):

| input size   | `rebin_image` u8 | `rebin_image` f32 | `cv2.INTER_AREA` u8 |
| ------------ | ---------------: | ----------------: | ------------------: |
| 480×640      |          0.32 ms |           0.27 ms |             0.74 ms |
| 1024×1024    |          0.66 ms |           0.54 ms |             2.32 ms |
| 2000×2000    |          1.92 ms |           3.05 ms |             8.49 ms |

Where the libraries differ:

|                                | `rebin_image`                              | `cv2.INTER_AREA`                              |
| ------------------------------ | ------------------------------------------ | --------------------------------------------- |
| Crop to aspect ratio           | built in (`keep_aspect=True`)              | requires a separate `cv2.resize` step         |
| Boundary handling              | partial overlap → fractional weight        | same, in fixed-point                          |
| Upscaling                      | falls back to nearest-neighbour            | not designed for upscaling                    |
| Algorithm exposure             | 1D weight matrix is a public API           | hidden inside the C implementation            |

Use `rebin_image` when you want the same numeric result everywhere (cross-platform parity, training-vs-serving consistency) without adding a second runtime dependency.

---

## API

```python
rebinning.rebin_image(
    image: np.ndarray,                 # (H, W) or (H, W, C)
    out_height: int,
    out_width: int,
    *,
    keep_aspect: bool = True,
    crop: Literal["center", "top-left"] = "center",
    out_dtype = None,
) -> np.ndarray
```

```python
rebinning.rebin(
    input_x: np.ndarray, input_y: np.ndarray, values: np.ndarray,  # (Nx,), (Ny,), (Nx, Ny)
    *,
    bin_width_x: float, bin_width_y: float,
    output_x: np.ndarray, output_y: np.ndarray,                    # uniformly-spaced
    transform: Callable[[np.ndarray], np.ndarray] | None = None,   # (..., 2) -> (..., 2)
    mode: Literal["sum", "mean"] = "sum",
    skip_partial: bool = False,
) -> np.ndarray
```

```python
rebinning.rebin_quads(
    quads: np.ndarray,                 # (N, 4, 2), pre-transformed corners
    values: np.ndarray,                # (N,)
    *,
    output_x, output_y,
    mode: Literal["sum", "mean"] = "sum",
    skip_partial: bool = False,
) -> np.ndarray
```

```python
rebinning.rebin_axis_weights(
    in_n: int, in_bw: float, in_origin: float,
    out_n: int, out_bw: float, out_origin: float,
) -> np.ndarray                        # (out_n, in_n), rows sum to 1
```

`skip_partial=True` reproduces the original Fortran reference's behaviour of dropping any input cell whose bounding box crosses the output-grid boundary. The default `False` clamps the iteration range instead, so partial-overlap cells still contribute their in-grid fraction.

---

## Crate layout

```
src/
├── lib.rs        # PyO3 module: type marshalling and shape checking
├── clip.rs       # Sutherland–Hodgman edge clip + shoelace area
├── rebin.rs      # The quadrilateral rebinning algorithm (the core)
├── weights.rs    # 1D overlap weights (sparse + dense) for the fast path
└── image.rs      # Separable u8/u16/f32/f64 image resizing using those weights

python/rebinning/
└── __init__.py   # Public Python API: rebin, rebin_image, rebin_quads, rebin_axis_weights
```

The Rust modules have their own `#[cfg(test)]` unit tests (clipping & weights); the end-to-end tests live in `tests/test_rebin.py` and include a comparison against `cv2.INTER_AREA` when OpenCV is installed.

## Development Tasks

```bash
# Set up the venv and build the extension:
uv sync
uv run maturin develop --uv --release

# Run the Python tests:
uv run python tests/test_rebin.py

# Run the Rust unit tests (clip + weights):
cargo test --lib

# Rebuild in place after editing Rust:
uv run maturin develop --uv --release
```

## Provenance

I initially wrote the algorithm for myself as a Fortran 90 routine for grazing-incidence X-ray
scattering data reduction. That code is preserved verbatim under
[`legacy/`](legacy/) for reference; the Rust port is a clean rewrite around
Sutherland–Hodgman clipping and is no longer specific to scattering geometry.

## License

Released under the MIT license.

