Metadata-Version: 2.4
Name: simd-blend-modes
Version: 1.0.3
Summary: SIMD-accelerated blend modes
Author: Samuel Howard
License: MIT
Project-URL: Homepage, https://github.com/samhaswon/simd_blend_modes
Project-URL: Bug Tracker, https://github.com/samhaswon/simd_blend_modes/issues
Keywords: image,processing
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: C
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Dynamic: license-file

# SIMD Blend Modes

This project reimplements the blend modes from [`blend_modes`](https://github.com/flrs/blend_modes) with C kernels and SIMD
(SSE4.2/AVX2) acceleration. It supports uint8 and float32 NumPy inputs in the range 0..255
and returns an output whose dtype and channel count match the background image. Missing alpha channels
are treated as fully opaque (255). Opacity defaults to 1.0.

This is intended to be a mostly drop-in replacement, but with a more permissive
API that lets you go faster if you don't need FP32 arrays or the information in an
alpha channel for some layers.

## Build and Install

### General

```bash
pip install simd-blend-modes
```

### Development

```bash
pip install -r requirements-dev.txt
pip install -e .
```

## Usage

```python
import numpy as np
import simd_blend_modes as sbm

background = np.zeros((512, 512, 4), dtype=np.uint8)
foreground = np.zeros((512, 512, 4), dtype=np.uint8)

out = sbm.screen(background, foreground, 0.5)
```

Inputs:

- Dtypes: `np.uint8` or `np.float32` only.
- Value range: 0..255 for both dtypes.
  - float32 inputs are expected to be uint8 values cast to float32, not normalized to 0..1.
- Shapes: `H x W x C` with `C` = 3 (RGB) or 4 (RGBA).
- Output: dtype and channel count match the background image.
- Alpha: if a source is RGB (3 channels), alpha is treated as 255 (fully opaque).
- Opacity: the third argument is optional; defaults to `1.0`.
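
If your images are normalized floats (0..1), they need rescaling before they fit the input rules above. The following is a minimal sketch of that conversion using only NumPy; `to_blend_input` is a hypothetical helper name, not part of this package.

```python
import numpy as np


def to_blend_input(img):
    """Rescale a normalized (0..1) float image to 0..255 float32.

    Hypothetical helper for illustration; not part of simd_blend_modes.
    """
    arr = np.asarray(img)
    if arr.ndim != 3 or arr.shape[2] not in (3, 4):
        raise ValueError("expected an H x W x C array with C = 3 or 4")
    # Scale to the 0..255 range and clip any values that fall outside it.
    return np.clip(arr.astype(np.float32) * 255.0, 0.0, 255.0)


normalized = np.full((2, 2, 3), 0.5, dtype=np.float64)
prepared = to_blend_input(normalized)
print(prepared.dtype, prepared.shape, float(prepared[0, 0, 0]))
```

The result can then be passed as either the background or the foreground argument.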

Supported blend modes:

- [`normal`](https://en.wikipedia.org/wiki/Blend_modes#Normal_blend_mode)
- [`soft_light`](https://en.wikipedia.org/wiki/Blend_modes#Soft_Light)
- [`lighten_only`](https://en.wikipedia.org/wiki/Blend_modes#Lighten_Only)
- [`screen`](https://en.wikipedia.org/wiki/Blend_modes#Screen)
- [`dodge`](https://en.wikipedia.org/wiki/Blend_modes#Dodge_and_burn)
- [`addition`](https://en.wikipedia.org/wiki/Blend_modes#Addition)
- [`darken_only`](https://en.wikipedia.org/wiki/Blend_modes#Darken_Only)
- [`multiply`](https://en.wikipedia.org/wiki/Blend_modes#Multiply)
- [`hard_light`](https://en.wikipedia.org/wiki/Blend_modes#Hard_Light)
- [`difference`](https://en.wikipedia.org/wiki/Blend_modes#Difference)
- [`subtract`](https://en.wikipedia.org/wiki/Blend_modes#Subtract)
- `grain_extract` (known from GIMP)
- `grain_merge` (known from GIMP)
- [`divide`](https://en.wikipedia.org/wiki/Blend_modes#Divide)
- [`overlay`](https://en.wikipedia.org/wiki/Blend_modes#Overlay)

You can force a kernel by passing a string (or `KernelKind` value):

```python
out = sbm.screen(background, foreground, 0.5, "avx2")
```

## Tests

Correctness and performance:

```bash
python3 -m unittest discover tests/
```

Performance:

```bash
python3 -m unittest tests.test_performance
```

The performance test prints a markdown table of per-kernel speedups vs the NumPy reference
for common square sizes and screen resolutions.
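
The Speedup and Percent Change columns in the Performance tables are consistent with the two ratios sketched below. This is an assumption inferred from the published rows, not a copy of the code in `tests/test_performance.py`, and `speedup_row` is a hypothetical name.

```python
def speedup_row(ref_seconds, kernel_seconds):
    """Format speedup and percent change the way the results tables show them.

    Assumed formulas (inferred from the table, not taken from the test code):
      speedup        = ref / kernel
      percent change = (kernel - ref) / ref * 100
    """
    speedup = ref_seconds / kernel_seconds
    percent_change = (kernel_seconds - ref_seconds) / ref_seconds * 100.0
    return f"{speedup:.2f}x", f"{percent_change:.2f}%"


# Timings from the `normal` / `scalar` row of the summary table.
print(speedup_row(0.167029, 0.033657))
```

A negative percent change means the kernel took less time than the reference.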

## ARM

ARM isn't properly supported, as I do not have a new enough ARM CPU to test on,
nor do I wish to use a cloud VM to test it. So, if you want ARM support, open a PR.
It should build and still be faster, but there's no SIMD support there (yet).

ARM builds run in scalar-only mode (x86 SIMD is compile-time gated). To test ARM under Docker,
enable emulation and then build with the ARM platform. 

If you don't already have buildx/binfmt configured, run:

```bash
docker run --privileged --rm tonistiigi/binfmt --install arm64
```

Then build or run the ARM container:

```bash
docker compose up --build
```

This is incredibly slow. I wouldn't actually do this, but it's here.

## Notes

- SIMD kernels are selected at runtime: AVX2 → SSE4.2 → scalar.
- ARM builds are supported in scalar-only mode; x86 SIMD is compile-time gated. CI does not emit
  ARM artifacts.
- Reference tests adapted from the original project live in `tests/reference_blend_modes_tests.py`
  and are skipped unless the `blend_modes` package and test assets are available.
- The SIMD paths currently assume contiguous arrays (the input validation enforces this).
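
Sliced views are a common source of non-contiguous arrays, for example when dropping the alpha channel from an RGBA image. The sketch below uses only NumPy to show how to check for and fix this before blending; how this package reports a non-contiguous input is not shown here.

```python
import numpy as np

rgba = np.zeros((4, 4, 4), dtype=np.uint8)

# Dropping the alpha channel by slicing yields a view that skips bytes in memory.
rgb_view = rgba[..., :3]
print(rgb_view.flags["C_CONTIGUOUS"])

# np.ascontiguousarray copies only when the layout requires it.
rgb = np.ascontiguousarray(rgb_view)
print(rgb.flags["C_CONTIGUOUS"])
```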

## Performance 

<!--
The performance test prints large tables. If your terminal buffer is limited, you can write the
output into this README instead by setting `WRITE_RESULTS_TO_README = True` in
`tests/test_performance.py`. When enabled, it replaces the block between the markers below.
-->

<!-- PERF_RESULTS_START -->
| Mode          | Kernel | Ref (s)  | Kernel (s) | Speedup | Percent Change |
| ------------- | ------ | -------- | ---------- | ------- | -------------- |
| normal        | scalar | 0.167029 | 0.033657   | 4.96x   | -79.85%        |
| normal        | sse42  | 0.167029 | 0.011444   | 14.60x  | -93.15%        |
| normal        | avx2   | 0.167029 | 0.010915   | 15.30x  | -93.47%        |
| soft_light    | scalar | 0.227108 | 0.040065   | 5.67x   | -82.36%        |
| soft_light    | sse42  | 0.227108 | 0.012021   | 18.89x  | -94.71%        |
| soft_light    | avx2   | 0.227108 | 0.011137   | 20.39x  | -95.10%        |
| lighten_only  | scalar | 0.168433 | 0.043472   | 3.87x   | -74.19%        |
| lighten_only  | sse42  | 0.168433 | 0.011929   | 14.12x  | -92.92%        |
| lighten_only  | avx2   | 0.168433 | 0.010955   | 15.38x  | -93.50%        |
| screen        | scalar | 0.179914 | 0.038705   | 4.65x   | -78.49%        |
| screen        | sse42  | 0.179914 | 0.012018   | 14.97x  | -93.32%        |
| screen        | avx2   | 0.179914 | 0.011221   | 16.03x  | -93.76%        |
| dodge         | scalar | 0.182316 | 0.041261   | 4.42x   | -77.37%        |
| dodge         | sse42  | 0.182316 | 0.012758   | 14.29x  | -93.00%        |
| dodge         | avx2   | 0.182316 | 0.011420   | 15.96x  | -93.74%        |
| addition      | scalar | 0.174717 | 0.061780   | 2.83x   | -64.64%        |
| addition      | sse42  | 0.174717 | 0.012799   | 13.65x  | -92.67%        |
| addition      | avx2   | 0.174717 | 0.011310   | 15.45x  | -93.53%        |
| darken_only   | scalar | 0.172792 | 0.043941   | 3.93x   | -74.57%        |
| darken_only   | sse42  | 0.172792 | 0.011946   | 14.46x  | -93.09%        |
| darken_only   | avx2   | 0.172792 | 0.011097   | 15.57x  | -93.58%        |
| multiply      | scalar | 0.174347 | 0.039088   | 4.46x   | -77.58%        |
| multiply      | sse42  | 0.174347 | 0.011828   | 14.74x  | -93.22%        |
| multiply      | avx2   | 0.174347 | 0.010915   | 15.97x  | -93.74%        |
| hard_light    | scalar | 0.255683 | 0.076160   | 3.36x   | -70.21%        |
| hard_light    | sse42  | 0.255683 | 0.012848   | 19.90x  | -94.98%        |
| hard_light    | avx2   | 0.255683 | 0.011286   | 22.65x  | -95.59%        |
| difference    | scalar | 0.230410 | 0.038175   | 6.04x   | -83.43%        |
| difference    | sse42  | 0.230410 | 0.011963   | 19.26x  | -94.81%        |
| difference    | avx2   | 0.230410 | 0.010998   | 20.95x  | -95.23%        |
| subtract      | scalar | 0.172605 | 0.039590   | 4.36x   | -77.06%        |
| subtract      | sse42  | 0.172605 | 0.012698   | 13.59x  | -92.64%        |
| subtract      | avx2   | 0.172605 | 0.011318   | 15.25x  | -93.44%        |
| grain_extract | scalar | 0.177783 | 0.051094   | 3.48x   | -71.26%        |
| grain_extract | sse42  | 0.177783 | 0.012071   | 14.73x  | -93.21%        |
| grain_extract | avx2   | 0.177783 | 0.010952   | 16.23x  | -93.84%        |
| grain_merge   | scalar | 0.178960 | 0.050727   | 3.53x   | -71.65%        |
| grain_merge   | sse42  | 0.178960 | 0.012008   | 14.90x  | -93.29%        |
| grain_merge   | avx2   | 0.178960 | 0.011038   | 16.21x  | -93.83%        |
| divide        | scalar | 0.181181 | 0.040103   | 4.52x   | -77.87%        |
| divide        | sse42  | 0.181181 | 0.012229   | 14.82x  | -93.25%        |
| divide        | avx2   | 0.181181 | 0.011248   | 16.11x  | -93.79%        |
| overlay       | scalar | 0.237504 | 0.072933   | 3.26x   | -69.29%        |
| overlay       | sse42  | 0.237504 | 0.012377   | 19.19x  | -94.79%        |
| overlay       | avx2   | 0.237504 | 0.011132   | 21.33x  | -95.31%        |

<details>
<summary>Per-kernel, size, and type results</summary>

| Case      | Input   | Channels | Opacity | Mode          | Kernel | Ref (s)  | Kernel (s) | Speedup | Percent Change |
| --------- | ------- | -------- | ------- | ------------- | ------ | -------- | ---------- | ------- | -------------- |
| 256x256   | uint8   | 3        | 0.50    | normal        | scalar | 0.007060 | 0.001647   | 4.29x   | -76.67%        |
| 256x256   | uint8   | 3        | 0.50    | normal        | sse42  | 0.007060 | 0.000707   | 9.98x   | -89.98%        |
| 256x256   | uint8   | 3        | 0.50    | normal        | avx2   | 0.007060 | 0.000709   | 9.96x   | -89.96%        |
| 256x256   | uint8   | 3        | 0.50    | soft_light    | scalar | 0.008574 | 0.001827   | 4.69x   | -78.69%        |
| 256x256   | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.008574 | 0.000833   | 10.30x  | -90.29%        |
| 256x256   | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.008574 | 0.000738   | 11.62x  | -91.39%        |
| 256x256   | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.007031 | 0.001946   | 3.61x   | -72.33%        |
| 256x256   | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.007031 | 0.000790   | 8.90x   | -88.76%        |
| 256x256   | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.007031 | 0.000704   | 9.99x   | -89.99%        |
| 256x256   | uint8   | 3        | 0.50    | screen        | scalar | 0.007561 | 0.001829   | 4.13x   | -75.81%        |
| 256x256   | uint8   | 3        | 0.50    | screen        | sse42  | 0.007561 | 0.000814   | 9.29x   | -89.23%        |
| 256x256   | uint8   | 3        | 0.50    | screen        | avx2   | 0.007561 | 0.000739   | 10.23x  | -90.22%        |
| 256x256   | uint8   | 3        | 0.50    | dodge         | scalar | 0.007170 | 0.001871   | 3.83x   | -73.91%        |
| 256x256   | uint8   | 3        | 0.50    | dodge         | sse42  | 0.007170 | 0.000812   | 8.83x   | -88.67%        |
| 256x256   | uint8   | 3        | 0.50    | dodge         | avx2   | 0.007170 | 0.000732   | 9.79x   | -89.79%        |
| 256x256   | uint8   | 3        | 0.50    | addition      | scalar | 0.007163 | 0.002554   | 2.80x   | -64.35%        |
| 256x256   | uint8   | 3        | 0.50    | addition      | sse42  | 0.007163 | 0.000820   | 8.74x   | -88.56%        |
| 256x256   | uint8   | 3        | 0.50    | addition      | avx2   | 0.007163 | 0.000731   | 9.79x   | -89.79%        |
| 256x256   | uint8   | 3        | 0.50    | darken_only   | scalar | 0.007034 | 0.001954   | 3.60x   | -72.22%        |
| 256x256   | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.007034 | 0.000771   | 9.12x   | -89.03%        |
| 256x256   | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.007034 | 0.000691   | 10.18x  | -90.18%        |
| 256x256   | uint8   | 3        | 0.50    | multiply      | scalar | 0.006933 | 0.001771   | 3.91x   | -74.46%        |
| 256x256   | uint8   | 3        | 0.50    | multiply      | sse42  | 0.006933 | 0.000770   | 9.01x   | -88.90%        |
| 256x256   | uint8   | 3        | 0.50    | multiply      | avx2   | 0.006933 | 0.000690   | 10.05x  | -90.05%        |
| 256x256   | uint8   | 3        | 0.50    | hard_light    | scalar | 0.008719 | 0.003057   | 2.85x   | -64.94%        |
| 256x256   | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.008719 | 0.000834   | 10.46x  | -90.44%        |
| 256x256   | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.008719 | 0.000727   | 11.99x  | -91.66%        |
| 256x256   | uint8   | 3        | 0.50    | difference    | scalar | 0.008743 | 0.001775   | 4.92x   | -79.70%        |
| 256x256   | uint8   | 3        | 0.50    | difference    | sse42  | 0.008743 | 0.000801   | 10.91x  | -90.84%        |
| 256x256   | uint8   | 3        | 0.50    | difference    | avx2   | 0.008743 | 0.000719   | 12.16x  | -91.78%        |
| 256x256   | uint8   | 3        | 0.50    | subtract      | scalar | 0.007122 | 0.001672   | 4.26x   | -76.52%        |
| 256x256   | uint8   | 3        | 0.50    | subtract      | sse42  | 0.007122 | 0.000816   | 8.73x   | -88.54%        |
| 256x256   | uint8   | 3        | 0.50    | subtract      | avx2   | 0.007122 | 0.000699   | 10.19x  | -90.19%        |
| 256x256   | uint8   | 3        | 0.50    | grain_extract | scalar | 0.006934 | 0.002214   | 3.13x   | -68.07%        |
| 256x256   | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.006934 | 0.000794   | 8.74x   | -88.55%        |
| 256x256   | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.006934 | 0.000696   | 9.96x   | -89.96%        |
| 256x256   | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.006903 | 0.002156   | 3.20x   | -68.76%        |
| 256x256   | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.006903 | 0.000813   | 8.49x   | -88.22%        |
| 256x256   | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.006903 | 0.000754   | 9.16x   | -89.08%        |
| 256x256   | uint8   | 3        | 0.50    | divide        | scalar | 0.007122 | 0.001819   | 3.92x   | -74.46%        |
| 256x256   | uint8   | 3        | 0.50    | divide        | sse42  | 0.007122 | 0.000836   | 8.51x   | -88.25%        |
| 256x256   | uint8   | 3        | 0.50    | divide        | avx2   | 0.007122 | 0.000758   | 9.40x   | -89.36%        |
| 256x256   | uint8   | 3        | 0.50    | overlay       | scalar | 0.008676 | 0.002970   | 2.92x   | -65.77%        |
| 256x256   | uint8   | 3        | 0.50    | overlay       | sse42  | 0.008676 | 0.000793   | 10.94x  | -90.86%        |
| 256x256   | uint8   | 3        | 0.50    | overlay       | avx2   | 0.008676 | 0.000706   | 12.29x  | -91.86%        |
| 256x256   | uint8   | 4        | 0.50    | normal        | scalar | 0.003088 | 0.001293   | 2.39x   | -58.13%        |
| 256x256   | uint8   | 4        | 0.50    | normal        | sse42  | 0.003088 | 0.000182   | 17.01x  | -94.12%        |
| 256x256   | uint8   | 4        | 0.50    | normal        | avx2   | 0.003088 | 0.000162   | 19.11x  | -94.77%        |
| 256x256   | uint8   | 4        | 0.50    | soft_light    | scalar | 0.006625 | 0.001629   | 4.07x   | -75.42%        |
| 256x256   | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.006625 | 0.000221   | 29.92x  | -96.66%        |
| 256x256   | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.006625 | 0.000207   | 32.03x  | -96.88%        |
| 256x256   | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.005348 | 0.001731   | 3.09x   | -67.63%        |
| 256x256   | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.005348 | 0.000195   | 27.45x  | -96.36%        |
| 256x256   | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.005348 | 0.000186   | 28.72x  | -96.52%        |
| 256x256   | uint8   | 4        | 0.50    | screen        | scalar | 0.005297 | 0.001556   | 3.40x   | -70.62%        |
| 256x256   | uint8   | 4        | 0.50    | screen        | sse42  | 0.005297 | 0.000218   | 24.26x  | -95.88%        |
| 256x256   | uint8   | 4        | 0.50    | screen        | avx2   | 0.005297 | 0.000193   | 27.46x  | -96.36%        |
| 256x256   | uint8   | 4        | 0.50    | dodge         | scalar | 0.005452 | 0.001668   | 3.27x   | -69.40%        |
| 256x256   | uint8   | 4        | 0.50    | dodge         | sse42  | 0.005452 | 0.000248   | 22.01x  | -95.46%        |
| 256x256   | uint8   | 4        | 0.50    | dodge         | avx2   | 0.005452 | 0.000206   | 26.49x  | -96.23%        |
| 256x256   | uint8   | 4        | 0.50    | addition      | scalar | 0.005437 | 0.001983   | 2.74x   | -63.53%        |
| 256x256   | uint8   | 4        | 0.50    | addition      | sse42  | 0.005437 | 0.000265   | 20.53x  | -95.13%        |
| 256x256   | uint8   | 4        | 0.50    | addition      | avx2   | 0.005437 | 0.000199   | 27.30x  | -96.34%        |
| 256x256   | uint8   | 4        | 0.50    | darken_only   | scalar | 0.005319 | 0.001718   | 3.10x   | -67.71%        |
| 256x256   | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.005319 | 0.000199   | 26.78x  | -96.27%        |
| 256x256   | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.005319 | 0.000187   | 28.39x  | -96.48%        |
| 256x256   | uint8   | 4        | 0.50    | multiply      | scalar | 0.005352 | 0.001621   | 3.30x   | -69.70%        |
| 256x256   | uint8   | 4        | 0.50    | multiply      | sse42  | 0.005352 | 0.000212   | 25.27x  | -96.04%        |
| 256x256   | uint8   | 4        | 0.50    | multiply      | avx2   | 0.005352 | 0.000191   | 27.97x  | -96.42%        |
| 256x256   | uint8   | 4        | 0.50    | hard_light    | scalar | 0.007153 | 0.002625   | 2.72x   | -63.30%        |
| 256x256   | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.007153 | 0.000242   | 29.58x  | -96.62%        |
| 256x256   | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.007153 | 0.000199   | 36.03x  | -97.22%        |
| 256x256   | uint8   | 4        | 0.50    | difference    | scalar | 0.007306 | 0.001604   | 4.55x   | -78.04%        |
| 256x256   | uint8   | 4        | 0.50    | difference    | sse42  | 0.007306 | 0.000205   | 35.70x  | -97.20%        |
| 256x256   | uint8   | 4        | 0.50    | difference    | avx2   | 0.007306 | 0.000191   | 38.29x  | -97.39%        |
| 256x256   | uint8   | 4        | 0.50    | subtract      | scalar | 0.005437 | 0.001492   | 3.64x   | -72.56%        |
| 256x256   | uint8   | 4        | 0.50    | subtract      | sse42  | 0.005437 | 0.000267   | 20.35x  | -95.08%        |
| 256x256   | uint8   | 4        | 0.50    | subtract      | avx2   | 0.005437 | 0.000227   | 23.97x  | -95.83%        |
| 256x256   | uint8   | 4        | 0.50    | grain_extract | scalar | 0.005462 | 0.001929   | 2.83x   | -64.68%        |
| 256x256   | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.005462 | 0.000212   | 25.73x  | -96.11%        |
| 256x256   | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.005462 | 0.000205   | 26.69x  | -96.25%        |
| 256x256   | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.005275 | 0.001919   | 2.75x   | -63.63%        |
| 256x256   | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.005275 | 0.000218   | 24.22x  | -95.87%        |
| 256x256   | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.005275 | 0.000201   | 26.22x  | -96.19%        |
| 256x256   | uint8   | 4        | 0.50    | divide        | scalar | 0.005521 | 0.001627   | 3.39x   | -70.54%        |
| 256x256   | uint8   | 4        | 0.50    | divide        | sse42  | 0.005521 | 0.000221   | 24.99x  | -96.00%        |
| 256x256   | uint8   | 4        | 0.50    | divide        | avx2   | 0.005521 | 0.000201   | 27.50x  | -96.36%        |
| 256x256   | uint8   | 4        | 0.50    | overlay       | scalar | 0.006683 | 0.002572   | 2.60x   | -61.51%        |
| 256x256   | uint8   | 4        | 0.50    | overlay       | sse42  | 0.006683 | 0.000227   | 29.46x  | -96.61%        |
| 256x256   | uint8   | 4        | 0.50    | overlay       | avx2   | 0.006683 | 0.000205   | 32.67x  | -96.94%        |
| 256x256   | float32 | 3        | 0.50    | normal        | scalar | 0.005614 | 0.000506   | 11.10x  | -90.99%        |
| 256x256   | float32 | 3        | 0.50    | normal        | sse42  | 0.005614 | 0.000225   | 25.01x  | -96.00%        |
| 256x256   | float32 | 3        | 0.50    | normal        | avx2   | 0.005614 | 0.000144   | 38.94x  | -97.43%        |
| 256x256   | float32 | 3        | 0.50    | soft_light    | scalar | 0.008346 | 0.000616   | 13.55x  | -92.62%        |
| 256x256   | float32 | 3        | 0.50    | soft_light    | sse42  | 0.008346 | 0.000127   | 65.64x  | -98.48%        |
| 256x256   | float32 | 3        | 0.50    | soft_light    | avx2   | 0.008346 | 0.000078   | 107.67x | -99.07%        |
| 256x256   | float32 | 3        | 0.50    | lighten_only  | scalar | 0.006958 | 0.000748   | 9.30x   | -89.25%        |
| 256x256   | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.006958 | 0.000107   | 64.94x  | -98.46%        |
| 256x256   | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.006958 | 0.000067   | 104.50x | -99.04%        |
| 256x256   | float32 | 3        | 0.50    | screen        | scalar | 0.007108 | 0.000657   | 10.82x  | -90.75%        |
| 256x256   | float32 | 3        | 0.50    | screen        | sse42  | 0.007108 | 0.000115   | 61.63x  | -98.38%        |
| 256x256   | float32 | 3        | 0.50    | screen        | avx2   | 0.007108 | 0.000069   | 103.07x | -99.03%        |
| 256x256   | float32 | 3        | 0.50    | dodge         | scalar | 0.006983 | 0.000635   | 11.00x  | -90.90%        |
| 256x256   | float32 | 3        | 0.50    | dodge         | sse42  | 0.006983 | 0.000126   | 55.61x  | -98.20%        |
| 256x256   | float32 | 3        | 0.50    | dodge         | avx2   | 0.006983 | 0.000079   | 87.97x  | -98.86%        |
| 256x256   | float32 | 3        | 0.50    | addition      | scalar | 0.007158 | 0.001560   | 4.59x   | -78.20%        |
| 256x256   | float32 | 3        | 0.50    | addition      | sse42  | 0.007158 | 0.000110   | 65.25x  | -98.47%        |
| 256x256   | float32 | 3        | 0.50    | addition      | avx2   | 0.007158 | 0.000078   | 91.57x  | -98.91%        |
| 256x256   | float32 | 3        | 0.50    | darken_only   | scalar | 0.006830 | 0.000772   | 8.84x   | -88.69%        |
| 256x256   | float32 | 3        | 0.50    | darken_only   | sse42  | 0.006830 | 0.000112   | 60.80x  | -98.36%        |
| 256x256   | float32 | 3        | 0.50    | darken_only   | avx2   | 0.006830 | 0.000065   | 104.68x | -99.04%        |
| 256x256   | float32 | 3        | 0.50    | multiply      | scalar | 0.006988 | 0.000569   | 12.29x  | -91.86%        |
| 256x256   | float32 | 3        | 0.50    | multiply      | sse42  | 0.006988 | 0.000109   | 64.39x  | -98.45%        |
| 256x256   | float32 | 3        | 0.50    | multiply      | avx2   | 0.006988 | 0.000068   | 102.93x | -99.03%        |
| 256x256   | float32 | 3        | 0.50    | hard_light    | scalar | 0.008962 | 0.001796   | 4.99x   | -79.97%        |
| 256x256   | float32 | 3        | 0.50    | hard_light    | sse42  | 0.008962 | 0.000134   | 66.84x  | -98.50%        |
| 256x256   | float32 | 3        | 0.50    | hard_light    | avx2   | 0.008962 | 0.000074   | 121.92x | -99.18%        |
| 256x256   | float32 | 3        | 0.50    | difference    | scalar | 0.008783 | 0.000583   | 15.06x  | -93.36%        |
| 256x256   | float32 | 3        | 0.50    | difference    | sse42  | 0.008783 | 0.000181   | 48.62x  | -97.94%        |
| 256x256   | float32 | 3        | 0.50    | difference    | avx2   | 0.008783 | 0.000067   | 130.85x | -99.24%        |
| 256x256   | float32 | 3        | 0.50    | subtract      | scalar | 0.007209 | 0.000675   | 10.68x  | -90.64%        |
| 256x256   | float32 | 3        | 0.50    | subtract      | sse42  | 0.007209 | 0.000113   | 63.60x  | -98.43%        |
| 256x256   | float32 | 3        | 0.50    | subtract      | avx2   | 0.007209 | 0.000068   | 106.31x | -99.06%        |
| 256x256   | float32 | 3        | 0.50    | grain_extract | scalar | 0.007080 | 0.001008   | 7.02x   | -85.76%        |
| 256x256   | float32 | 3        | 0.50    | grain_extract | sse42  | 0.007080 | 0.000120   | 58.96x  | -98.30%        |
| 256x256   | float32 | 3        | 0.50    | grain_extract | avx2   | 0.007080 | 0.000076   | 93.38x  | -98.93%        |
| 256x256   | float32 | 3        | 0.50    | grain_merge   | scalar | 0.007030 | 0.001011   | 6.95x   | -85.62%        |
| 256x256   | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.007030 | 0.000130   | 54.09x  | -98.15%        |
| 256x256   | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.007030 | 0.000066   | 105.80x | -99.05%        |
| 256x256   | float32 | 3        | 0.50    | divide        | scalar | 0.007192 | 0.000630   | 11.41x  | -91.24%        |
| 256x256   | float32 | 3        | 0.50    | divide        | sse42  | 0.007192 | 0.000161   | 44.69x  | -97.76%        |
| 256x256   | float32 | 3        | 0.50    | divide        | avx2   | 0.007192 | 0.000073   | 98.93x  | -98.99%        |
| 256x256   | float32 | 3        | 0.50    | overlay       | scalar | 0.008223 | 0.001632   | 5.04x   | -80.16%        |
| 256x256   | float32 | 3        | 0.50    | overlay       | sse42  | 0.008223 | 0.000124   | 66.30x  | -98.49%        |
| 256x256   | float32 | 3        | 0.50    | overlay       | avx2   | 0.008223 | 0.000070   | 118.17x | -99.15%        |
| 256x256   | float32 | 4        | 0.50    | normal        | scalar | 0.004337 | 0.000616   | 7.04x   | -85.79%        |
| 256x256   | float32 | 4        | 0.50    | normal        | sse42  | 0.004337 | 0.000136   | 31.80x  | -96.86%        |
| 256x256   | float32 | 4        | 0.50    | normal        | avx2   | 0.004337 | 0.000147   | 29.53x  | -96.61%        |
| 256x256   | float32 | 4        | 0.50    | soft_light    | scalar | 0.006553 | 0.000705   | 9.30x   | -89.24%        |
| 256x256   | float32 | 4        | 0.50    | soft_light    | sse42  | 0.006553 | 0.000179   | 36.70x  | -97.27%        |
| 256x256   | float32 | 4        | 0.50    | soft_light    | avx2   | 0.006553 | 0.000175   | 37.39x  | -97.33%        |
| 256x256   | float32 | 4        | 0.50    | lighten_only  | scalar | 0.005270 | 0.000780   | 6.76x   | -85.21%        |
| 256x256   | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.005270 | 0.000162   | 32.49x  | -96.92%        |
| 256x256   | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.005270 | 0.000180   | 29.34x  | -96.59%        |
| 256x256   | float32 | 4        | 0.50    | screen        | scalar | 0.005236 | 0.000669   | 7.83x   | -87.23%        |
| 256x256   | float32 | 4        | 0.50    | screen        | sse42  | 0.005236 | 0.000179   | 29.25x  | -96.58%        |
| 256x256   | float32 | 4        | 0.50    | screen        | avx2   | 0.005236 | 0.000182   | 28.84x  | -96.53%        |
| 256x256   | float32 | 4        | 0.50    | dodge         | scalar | 0.005545 | 0.000833   | 6.65x   | -84.97%        |
| 256x256   | float32 | 4        | 0.50    | dodge         | sse42  | 0.005545 | 0.000242   | 22.91x  | -95.64%        |
| 256x256   | float32 | 4        | 0.50    | dodge         | avx2   | 0.005545 | 0.000189   | 29.32x  | -96.59%        |
| 256x256   | float32 | 4        | 0.50    | addition      | scalar | 0.006072 | 0.001358   | 4.47x   | -77.64%        |
| 256x256   | float32 | 4        | 0.50    | addition      | sse42  | 0.006072 | 0.000189   | 32.15x  | -96.89%        |
| 256x256   | float32 | 4        | 0.50    | addition      | avx2   | 0.006072 | 0.000193   | 31.54x  | -96.83%        |
| 256x256   | float32 | 4        | 0.50    | darken_only   | scalar | 0.005460 | 0.000911   | 6.00x   | -83.32%        |
| 256x256   | float32 | 4        | 0.50    | darken_only   | sse42  | 0.005460 | 0.000174   | 31.43x  | -96.82%        |
| 256x256   | float32 | 4        | 0.50    | darken_only   | avx2   | 0.005460 | 0.000187   | 29.15x  | -96.57%        |
| 256x256   | float32 | 4        | 0.50    | multiply      | scalar | 0.005718 | 0.000650   | 8.80x   | -88.63%        |
| 256x256   | float32 | 4        | 0.50    | multiply      | sse42  | 0.005718 | 0.000172   | 33.16x  | -96.98%        |
| 256x256   | float32 | 4        | 0.50    | multiply      | avx2   | 0.005718 | 0.000194   | 29.48x  | -96.61%        |
| 256x256   | float32 | 4        | 0.50    | hard_light    | scalar | 0.007159 | 0.001851   | 3.87x   | -74.14%        |
| 256x256   | float32 | 4        | 0.50    | hard_light    | sse42  | 0.007159 | 0.000225   | 31.80x  | -96.85%        |
| 256x256   | float32 | 4        | 0.50    | hard_light    | avx2   | 0.007159 | 0.000188   | 38.03x  | -97.37%        |
| 256x256   | float32 | 4        | 0.50    | difference    | scalar | 0.007116 | 0.000657   | 10.83x  | -90.77%        |
| 256x256   | float32 | 4        | 0.50    | difference    | sse42  | 0.007116 | 0.000163   | 43.73x  | -97.71%        |
| 256x256   | float32 | 4        | 0.50    | difference    | avx2   | 0.007116 | 0.000196   | 36.31x  | -97.25%        |
| 256x256   | float32 | 4        | 0.50    | subtract      | scalar | 0.005387 | 0.000843   | 6.39x   | -84.35%        |
| 256x256   | float32 | 4        | 0.50    | subtract      | sse42  | 0.005387 | 0.000187   | 28.74x  | -96.52%        |
| 256x256   | float32 | 4        | 0.50    | subtract      | avx2   | 0.005387 | 0.000188   | 28.66x  | -96.51%        |
| 256x256   | float32 | 4        | 0.50    | grain_extract | scalar | 0.005355 | 0.001089   | 4.92x   | -79.66%        |
| 256x256   | float32 | 4        | 0.50    | grain_extract | sse42  | 0.005355 | 0.000175   | 30.69x  | -96.74%        |
| 256x256   | float32 | 4        | 0.50    | grain_extract | avx2   | 0.005355 | 0.000180   | 29.67x  | -96.63%        |
| 256x256   | float32 | 4        | 0.50    | grain_merge   | scalar | 0.005238 | 0.001067   | 4.91x   | -79.62%        |
| 256x256   | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.005238 | 0.000168   | 31.14x  | -96.79%        |
| 256x256   | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.005238 | 0.000174   | 30.08x  | -96.68%        |
| 256x256   | float32 | 4        | 0.50    | divide        | scalar | 0.005559 | 0.000801   | 6.94x   | -85.58%        |
| 256x256   | float32 | 4        | 0.50    | divide        | sse42  | 0.005559 | 0.000180   | 30.88x  | -96.76%        |
| 256x256   | float32 | 4        | 0.50    | divide        | avx2   | 0.005559 | 0.000182   | 30.54x  | -96.73%        |
| 256x256   | float32 | 4        | 0.50    | overlay       | scalar | 0.006694 | 0.001743   | 3.84x   | -73.97%        |
| 256x256   | float32 | 4        | 0.50    | overlay       | sse42  | 0.006694 | 0.000191   | 35.01x  | -97.14%        |
| 256x256   | float32 | 4        | 0.50    | overlay       | avx2   | 0.006694 | 0.000180   | 37.11x  | -97.31%        |
| 512x512   | uint8   | 3        | 0.50    | normal        | scalar | 0.032529 | 0.006406   | 5.08x   | -80.31%        |
| 512x512   | uint8   | 3        | 0.50    | normal        | sse42  | 0.032529 | 0.002739   | 11.87x  | -91.58%        |
| 512x512   | uint8   | 3        | 0.50    | normal        | avx2   | 0.032529 | 0.002812   | 11.57x  | -91.36%        |
| 512x512   | uint8   | 3        | 0.00    | normal        | scalar | 0.032365 | 0.002500   | 12.95x  | -92.28%        |
| 512x512   | uint8   | 3        | 0.00    | normal        | sse42  | 0.032365 | 0.002463   | 13.14x  | -92.39%        |
| 512x512   | uint8   | 3        | 0.00    | normal        | avx2   | 0.032365 | 0.002465   | 13.13x  | -92.39%        |
| 512x512   | uint8   | 3        | 1.00    | normal        | scalar | 0.031105 | 0.002495   | 12.47x  | -91.98%        |
| 512x512   | uint8   | 3        | 1.00    | normal        | sse42  | 0.031105 | 0.002602   | 11.96x  | -91.64%        |
| 512x512   | uint8   | 3        | 1.00    | normal        | avx2   | 0.031105 | 0.002519   | 12.35x  | -91.90%        |
| 512x512   | uint8   | 3        | 0.50    | soft_light    | scalar | 0.049049 | 0.007300   | 6.72x   | -85.12%        |
| 512x512   | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.049049 | 0.003426   | 14.32x  | -93.01%        |
| 512x512   | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.049049 | 0.002923   | 16.78x  | -94.04%        |
| 512x512   | uint8   | 3        | 0.00    | soft_light    | scalar | 0.044721 | 0.002523   | 17.72x  | -94.36%        |
| 512x512   | uint8   | 3        | 0.00    | soft_light    | sse42  | 0.044721 | 0.002661   | 16.81x  | -94.05%        |
| 512x512   | uint8   | 3        | 0.00    | soft_light    | avx2   | 0.044721 | 0.002501   | 17.88x  | -94.41%        |
| 512x512   | uint8   | 3        | 1.00    | soft_light    | scalar | 0.042222 | 0.007468   | 5.65x   | -82.31%        |
| 512x512   | uint8   | 3        | 1.00    | soft_light    | sse42  | 0.042222 | 0.003163   | 13.35x  | -92.51%        |
| 512x512   | uint8   | 3        | 1.00    | soft_light    | avx2   | 0.042222 | 0.002817   | 14.99x  | -93.33%        |
| 512x512   | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.037859 | 0.007898   | 4.79x   | -79.14%        |
| 512x512   | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.037859 | 0.003189   | 11.87x  | -91.58%        |
| 512x512   | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.037859 | 0.002969   | 12.75x  | -92.16%        |
| 512x512   | uint8   | 3        | 0.00    | lighten_only  | scalar | 0.043547 | 0.002659   | 16.38x  | -93.89%        |
| 512x512   | uint8   | 3        | 0.00    | lighten_only  | sse42  | 0.043547 | 0.002652   | 16.42x  | -93.91%        |
| 512x512   | uint8   | 3        | 0.00    | lighten_only  | avx2   | 0.043547 | 0.002530   | 17.21x  | -94.19%        |
| 512x512   | uint8   | 3        | 1.00    | lighten_only  | scalar | 0.035719 | 0.007904   | 4.52x   | -77.87%        |
| 512x512   | uint8   | 3        | 1.00    | lighten_only  | sse42  | 0.035719 | 0.003054   | 11.70x  | -91.45%        |
| 512x512   | uint8   | 3        | 1.00    | lighten_only  | avx2   | 0.035719 | 0.002744   | 13.02x  | -92.32%        |
| 512x512   | uint8   | 3        | 0.50    | screen        | scalar | 0.041262 | 0.007197   | 5.73x   | -82.56%        |
| 512x512   | uint8   | 3        | 0.50    | screen        | sse42  | 0.041262 | 0.003170   | 13.02x  | -92.32%        |
| 512x512   | uint8   | 3        | 0.50    | screen        | avx2   | 0.041262 | 0.002898   | 14.24x  | -92.98%        |
| 512x512   | uint8   | 3        | 0.00    | screen        | scalar | 0.041981 | 0.002535   | 16.56x  | -93.96%        |
| 512x512   | uint8   | 3        | 0.00    | screen        | sse42  | 0.041981 | 0.002532   | 16.58x  | -93.97%        |
| 512x512   | uint8   | 3        | 0.00    | screen        | avx2   | 0.041981 | 0.002885   | 14.55x  | -93.13%        |
| 512x512   | uint8   | 3        | 1.00    | screen        | scalar | 0.036188 | 0.007125   | 5.08x   | -80.31%        |
| 512x512   | uint8   | 3        | 1.00    | screen        | sse42  | 0.036188 | 0.003290   | 11.00x  | -90.91%        |
| 512x512   | uint8   | 3        | 1.00    | screen        | avx2   | 0.036188 | 0.002892   | 12.51x  | -92.01%        |
| 512x512   | uint8   | 3        | 0.50    | dodge         | scalar | 0.039442 | 0.007667   | 5.14x   | -80.56%        |
| 512x512   | uint8   | 3        | 0.50    | dodge         | sse42  | 0.039442 | 0.003526   | 11.18x  | -91.06%        |
| 512x512   | uint8   | 3        | 0.50    | dodge         | avx2   | 0.039442 | 0.003565   | 11.06x  | -90.96%        |
| 512x512   | uint8   | 3        | 0.00    | dodge         | scalar | 0.037772 | 0.002517   | 15.01x  | -93.34%        |
| 512x512   | uint8   | 3        | 0.00    | dodge         | sse42  | 0.037772 | 0.002493   | 15.15x  | -93.40%        |
| 512x512   | uint8   | 3        | 0.00    | dodge         | avx2   | 0.037772 | 0.002480   | 15.23x  | -93.43%        |
| 512x512   | uint8   | 3        | 1.00    | dodge         | scalar | 0.036383 | 0.007272   | 5.00x   | -80.01%        |
| 512x512   | uint8   | 3        | 1.00    | dodge         | sse42  | 0.036383 | 0.003223   | 11.29x  | -91.14%        |
| 512x512   | uint8   | 3        | 1.00    | dodge         | avx2   | 0.036383 | 0.002854   | 12.75x  | -92.15%        |
| 512x512   | uint8   | 3        | 0.50    | addition      | scalar | 0.036252 | 0.010036   | 3.61x   | -72.32%        |
| 512x512   | uint8   | 3        | 0.50    | addition      | sse42  | 0.036252 | 0.003142   | 11.54x  | -91.33%        |
| 512x512   | uint8   | 3        | 0.50    | addition      | avx2   | 0.036252 | 0.002881   | 12.58x  | -92.05%        |
| 512x512   | uint8   | 3        | 0.00    | addition      | scalar | 0.038296 | 0.002557   | 14.98x  | -93.32%        |
| 512x512   | uint8   | 3        | 0.00    | addition      | sse42  | 0.038296 | 0.002573   | 14.89x  | -93.28%        |
| 512x512   | uint8   | 3        | 0.00    | addition      | avx2   | 0.038296 | 0.002731   | 14.02x  | -92.87%        |
| 512x512   | uint8   | 3        | 1.00    | addition      | scalar | 0.037970 | 0.013665   | 2.78x   | -64.01%        |
| 512x512   | uint8   | 3        | 1.00    | addition      | sse42  | 0.037970 | 0.003161   | 12.01x  | -91.68%        |
| 512x512   | uint8   | 3        | 1.00    | addition      | avx2   | 0.037970 | 0.002805   | 13.54x  | -92.61%        |
| 512x512   | uint8   | 3        | 0.50    | darken_only   | scalar | 0.038589 | 0.007867   | 4.91x   | -79.61%        |
| 512x512   | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.038589 | 0.003126   | 12.34x  | -91.90%        |
| 512x512   | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.038589 | 0.002807   | 13.75x  | -92.73%        |
| 512x512   | uint8   | 3        | 0.00    | darken_only   | scalar | 0.036788 | 0.002476   | 14.86x  | -93.27%        |
| 512x512   | uint8   | 3        | 0.00    | darken_only   | sse42  | 0.036788 | 0.002491   | 14.77x  | -93.23%        |
| 512x512   | uint8   | 3        | 0.00    | darken_only   | avx2   | 0.036788 | 0.002502   | 14.71x  | -93.20%        |
| 512x512   | uint8   | 3        | 1.00    | darken_only   | scalar | 0.040669 | 0.008091   | 5.03x   | -80.11%        |
| 512x512   | uint8   | 3        | 1.00    | darken_only   | sse42  | 0.040669 | 0.003341   | 12.17x  | -91.78%        |
| 512x512   | uint8   | 3        | 1.00    | darken_only   | avx2   | 0.040669 | 0.002901   | 14.02x  | -92.87%        |
| 512x512   | uint8   | 3        | 0.50    | multiply      | scalar | 0.039158 | 0.007244   | 5.41x   | -81.50%        |
| 512x512   | uint8   | 3        | 0.50    | multiply      | sse42  | 0.039158 | 0.003220   | 12.16x  | -91.78%        |
| 512x512   | uint8   | 3        | 0.50    | multiply      | avx2   | 0.039158 | 0.002846   | 13.76x  | -92.73%        |
| 512x512   | uint8   | 3        | 0.00    | multiply      | scalar | 0.037589 | 0.002511   | 14.97x  | -93.32%        |
| 512x512   | uint8   | 3        | 0.00    | multiply      | sse42  | 0.037589 | 0.002518   | 14.93x  | -93.30%        |
| 512x512   | uint8   | 3        | 0.00    | multiply      | avx2   | 0.037589 | 0.002487   | 15.11x  | -93.38%        |
| 512x512   | uint8   | 3        | 1.00    | multiply      | scalar | 0.036258 | 0.007306   | 4.96x   | -79.85%        |
| 512x512   | uint8   | 3        | 1.00    | multiply      | sse42  | 0.036258 | 0.003086   | 11.75x  | -91.49%        |
| 512x512   | uint8   | 3        | 1.00    | multiply      | avx2   | 0.036258 | 0.002786   | 13.01x  | -92.32%        |
| 512x512   | uint8   | 3        | 0.50    | hard_light    | scalar | 0.046371 | 0.012216   | 3.80x   | -73.66%        |
| 512x512   | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.046371 | 0.003281   | 14.13x  | -92.92%        |
| 512x512   | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.046371 | 0.002917   | 15.90x  | -93.71%        |
| 512x512   | uint8   | 3        | 0.00    | hard_light    | scalar | 0.048398 | 0.002487   | 19.46x  | -94.86%        |
| 512x512   | uint8   | 3        | 0.00    | hard_light    | sse42  | 0.048398 | 0.002474   | 19.56x  | -94.89%        |
| 512x512   | uint8   | 3        | 0.00    | hard_light    | avx2   | 0.048398 | 0.002482   | 19.50x  | -94.87%        |
| 512x512   | uint8   | 3        | 1.00    | hard_light    | scalar | 0.044786 | 0.012263   | 3.65x   | -72.62%        |
| 512x512   | uint8   | 3        | 1.00    | hard_light    | sse42  | 0.044786 | 0.003228   | 13.88x  | -92.79%        |
| 512x512   | uint8   | 3        | 1.00    | hard_light    | avx2   | 0.044786 | 0.002860   | 15.66x  | -93.61%        |
| 512x512   | uint8   | 3        | 0.50    | difference    | scalar | 0.043277 | 0.006947   | 6.23x   | -83.95%        |
| 512x512   | uint8   | 3        | 0.50    | difference    | sse42  | 0.043277 | 0.003069   | 14.10x  | -92.91%        |
| 512x512   | uint8   | 3        | 0.50    | difference    | avx2   | 0.043277 | 0.002823   | 15.33x  | -93.48%        |
| 512x512   | uint8   | 3        | 0.00    | difference    | scalar | 0.043158 | 0.002487   | 17.35x  | -94.24%        |
| 512x512   | uint8   | 3        | 0.00    | difference    | sse42  | 0.043158 | 0.002572   | 16.78x  | -94.04%        |
| 512x512   | uint8   | 3        | 0.00    | difference    | avx2   | 0.043158 | 0.002478   | 17.42x  | -94.26%        |
| 512x512   | uint8   | 3        | 1.00    | difference    | scalar | 0.044465 | 0.007097   | 6.27x   | -84.04%        |
| 512x512   | uint8   | 3        | 1.00    | difference    | sse42  | 0.044465 | 0.003064   | 14.51x  | -93.11%        |
| 512x512   | uint8   | 3        | 1.00    | difference    | avx2   | 0.044465 | 0.002764   | 16.09x  | -93.78%        |
| 512x512   | uint8   | 3        | 0.50    | subtract      | scalar | 0.035946 | 0.006777   | 5.30x   | -81.15%        |
| 512x512   | uint8   | 3        | 0.50    | subtract      | sse42  | 0.035946 | 0.003167   | 11.35x  | -91.19%        |
| 512x512   | uint8   | 3        | 0.50    | subtract      | avx2   | 0.035946 | 0.002782   | 12.92x  | -92.26%        |
| 512x512   | uint8   | 3        | 0.00    | subtract      | scalar | 0.036983 | 0.002518   | 14.69x  | -93.19%        |
| 512x512   | uint8   | 3        | 0.00    | subtract      | sse42  | 0.036983 | 0.002463   | 15.01x  | -93.34%        |
| 512x512   | uint8   | 3        | 0.00    | subtract      | avx2   | 0.036983 | 0.002484   | 14.89x  | -93.28%        |
| 512x512   | uint8   | 3        | 1.00    | subtract      | scalar | 0.036562 | 0.006835   | 5.35x   | -81.31%        |
| 512x512   | uint8   | 3        | 1.00    | subtract      | sse42  | 0.036562 | 0.003133   | 11.67x  | -91.43%        |
| 512x512   | uint8   | 3        | 1.00    | subtract      | avx2   | 0.036562 | 0.002789   | 13.11x  | -92.37%        |
| 512x512   | uint8   | 3        | 0.50    | grain_extract | scalar | 0.036522 | 0.008680   | 4.21x   | -76.23%        |
| 512x512   | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.036522 | 0.003140   | 11.63x  | -91.40%        |
| 512x512   | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.036522 | 0.002845   | 12.84x  | -92.21%        |
| 512x512   | uint8   | 3        | 0.00    | grain_extract | scalar | 0.036388 | 0.002515   | 14.47x  | -93.09%        |
| 512x512   | uint8   | 3        | 0.00    | grain_extract | sse42  | 0.036388 | 0.002517   | 14.46x  | -93.08%        |
| 512x512   | uint8   | 3        | 0.00    | grain_extract | avx2   | 0.036388 | 0.002611   | 13.94x  | -92.83%        |
| 512x512   | uint8   | 3        | 1.00    | grain_extract | scalar | 0.036183 | 0.008655   | 4.18x   | -76.08%        |
| 512x512   | uint8   | 3        | 1.00    | grain_extract | sse42  | 0.036183 | 0.003286   | 11.01x  | -90.92%        |
| 512x512   | uint8   | 3        | 1.00    | grain_extract | avx2   | 0.036183 | 0.002869   | 12.61x  | -92.07%        |
| 512x512   | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.036480 | 0.008743   | 4.17x   | -76.03%        |
| 512x512   | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.036480 | 0.003159   | 11.55x  | -91.34%        |
| 512x512   | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.036480 | 0.002815   | 12.96x  | -92.28%        |
| 512x512   | uint8   | 3        | 0.00    | grain_merge   | scalar | 0.036196 | 0.002545   | 14.22x  | -92.97%        |
| 512x512   | uint8   | 3        | 0.00    | grain_merge   | sse42  | 0.036196 | 0.002515   | 14.39x  | -93.05%        |
| 512x512   | uint8   | 3        | 0.00    | grain_merge   | avx2   | 0.036196 | 0.002473   | 14.64x  | -93.17%        |
| 512x512   | uint8   | 3        | 1.00    | grain_merge   | scalar | 0.036108 | 0.008733   | 4.13x   | -75.81%        |
| 512x512   | uint8   | 3        | 1.00    | grain_merge   | sse42  | 0.036108 | 0.003146   | 11.48x  | -91.29%        |
| 512x512   | uint8   | 3        | 1.00    | grain_merge   | avx2   | 0.036108 | 0.002781   | 12.99x  | -92.30%        |
| 512x512   | uint8   | 3        | 0.50    | divide        | scalar | 0.036733 | 0.007486   | 4.91x   | -79.62%        |
| 512x512   | uint8   | 3        | 0.50    | divide        | sse42  | 0.036733 | 0.003182   | 11.55x  | -91.34%        |
| 512x512   | uint8   | 3        | 0.50    | divide        | avx2   | 0.036733 | 0.002818   | 13.03x  | -92.33%        |
| 512x512   | uint8   | 3        | 0.00    | divide        | scalar | 0.037447 | 0.002502   | 14.97x  | -93.32%        |
| 512x512   | uint8   | 3        | 0.00    | divide        | sse42  | 0.037447 | 0.002483   | 15.08x  | -93.37%        |
| 512x512   | uint8   | 3        | 0.00    | divide        | avx2   | 0.037447 | 0.002493   | 15.02x  | -93.34%        |
| 512x512   | uint8   | 3        | 1.00    | divide        | scalar | 0.037044 | 0.007217   | 5.13x   | -80.52%        |
| 512x512   | uint8   | 3        | 1.00    | divide        | sse42  | 0.037044 | 0.003158   | 11.73x  | -91.48%        |
| 512x512   | uint8   | 3        | 1.00    | divide        | avx2   | 0.037044 | 0.002807   | 13.20x  | -92.42%        |
| 512x512   | uint8   | 3        | 0.50    | overlay       | scalar | 0.042984 | 0.011756   | 3.66x   | -72.65%        |
| 512x512   | uint8   | 3        | 0.50    | overlay       | sse42  | 0.042984 | 0.003192   | 13.47x  | -92.57%        |
| 512x512   | uint8   | 3        | 0.50    | overlay       | avx2   | 0.042984 | 0.002828   | 15.20x  | -93.42%        |
| 512x512   | uint8   | 3        | 0.00    | overlay       | scalar | 0.043079 | 0.002489   | 17.30x  | -94.22%        |
| 512x512   | uint8   | 3        | 0.00    | overlay       | sse42  | 0.043079 | 0.002481   | 17.36x  | -94.24%        |
| 512x512   | uint8   | 3        | 0.00    | overlay       | avx2   | 0.043079 | 0.002479   | 17.38x  | -94.25%        |
| 512x512   | uint8   | 3        | 1.00    | overlay       | scalar | 0.044043 | 0.011972   | 3.68x   | -72.82%        |
| 512x512   | uint8   | 3        | 1.00    | overlay       | sse42  | 0.044043 | 0.003373   | 13.06x  | -92.34%        |
| 512x512   | uint8   | 3        | 1.00    | overlay       | avx2   | 0.044043 | 0.002933   | 15.02x  | -93.34%        |
| 512x512   | uint8   | 4        | 0.50    | normal        | scalar | 0.023745 | 0.005202   | 4.56x   | -78.09%        |
| 512x512   | uint8   | 4        | 0.50    | normal        | sse42  | 0.023745 | 0.000695   | 34.16x  | -97.07%        |
| 512x512   | uint8   | 4        | 0.50    | normal        | avx2   | 0.023745 | 0.000626   | 37.95x  | -97.37%        |
| 512x512   | uint8   | 4        | 0.00    | normal        | scalar | 0.023453 | 0.000049   | 476.37x | -99.79%        |
| 512x512   | uint8   | 4        | 0.00    | normal        | sse42  | 0.023453 | 0.000052   | 450.93x | -99.78%        |
| 512x512   | uint8   | 4        | 0.00    | normal        | avx2   | 0.023453 | 0.000045   | 516.78x | -99.81%        |
| 512x512   | uint8   | 4        | 1.00    | normal        | scalar | 0.023454 | 0.005247   | 4.47x   | -77.63%        |
| 512x512   | uint8   | 4        | 1.00    | normal        | sse42  | 0.023454 | 0.000697   | 33.67x  | -97.03%        |
| 512x512   | uint8   | 4        | 1.00    | normal        | avx2   | 0.023454 | 0.000630   | 37.22x  | -97.31%        |
| 512x512   | uint8   | 4        | 0.50    | soft_light    | scalar | 0.034040 | 0.006527   | 5.21x   | -80.82%        |
| 512x512   | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.034040 | 0.000888   | 38.31x  | -97.39%        |
| 512x512   | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.034040 | 0.000825   | 41.28x  | -97.58%        |
| 512x512   | uint8   | 4        | 0.00    | soft_light    | scalar | 0.033842 | 0.000045   | 748.05x | -99.87%        |
| 512x512   | uint8   | 4        | 0.00    | soft_light    | sse42  | 0.033842 | 0.000044   | 767.73x | -99.87%        |
| 512x512   | uint8   | 4        | 0.00    | soft_light    | avx2   | 0.033842 | 0.000044   | 768.29x | -99.87%        |
| 512x512   | uint8   | 4        | 1.00    | soft_light    | scalar | 0.034055 | 0.006618   | 5.15x   | -80.57%        |
| 512x512   | uint8   | 4        | 1.00    | soft_light    | sse42  | 0.034055 | 0.000887   | 38.41x  | -97.40%        |
| 512x512   | uint8   | 4        | 1.00    | soft_light    | avx2   | 0.034055 | 0.000824   | 41.34x  | -97.58%        |
| 512x512   | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.027618 | 0.006816   | 4.05x   | -75.32%        |
| 512x512   | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.027618 | 0.000772   | 35.77x  | -97.20%        |
| 512x512   | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.027618 | 0.000739   | 37.38x  | -97.32%        |
| 512x512   | uint8   | 4        | 0.00    | lighten_only  | scalar | 0.026958 | 0.000053   | 505.80x | -99.80%        |
| 512x512   | uint8   | 4        | 0.00    | lighten_only  | sse42  | 0.026958 | 0.000052   | 516.31x | -99.81%        |
| 512x512   | uint8   | 4        | 0.00    | lighten_only  | avx2   | 0.026958 | 0.000056   | 484.14x | -99.79%        |
| 512x512   | uint8   | 4        | 1.00    | lighten_only  | scalar | 0.030627 | 0.007331   | 4.18x   | -76.06%        |
| 512x512   | uint8   | 4        | 1.00    | lighten_only  | sse42  | 0.030627 | 0.000850   | 36.04x  | -97.23%        |
| 512x512   | uint8   | 4        | 1.00    | lighten_only  | avx2   | 0.030627 | 0.000783   | 39.13x  | -97.44%        |
| 512x512   | uint8   | 4        | 0.50    | screen        | scalar | 0.035865 | 0.006361   | 5.64x   | -82.26%        |
| 512x512   | uint8   | 4        | 0.50    | screen        | sse42  | 0.035865 | 0.000909   | 39.46x  | -97.47%        |
| 512x512   | uint8   | 4        | 0.50    | screen        | avx2   | 0.035865 | 0.000775   | 46.26x  | -97.84%        |
| 512x512   | uint8   | 4        | 0.00    | screen        | scalar | 0.028354 | 0.000046   | 621.58x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | screen        | sse42  | 0.028354 | 0.000056   | 505.59x | -99.80%        |
| 512x512   | uint8   | 4        | 0.00    | screen        | avx2   | 0.028354 | 0.000045   | 632.38x | -99.84%        |
| 512x512   | uint8   | 4        | 1.00    | screen        | scalar | 0.027972 | 0.006233   | 4.49x   | -77.72%        |
| 512x512   | uint8   | 4        | 1.00    | screen        | sse42  | 0.027972 | 0.000841   | 33.27x  | -96.99%        |
| 512x512   | uint8   | 4        | 1.00    | screen        | avx2   | 0.027972 | 0.000787   | 35.52x  | -97.18%        |
| 512x512   | uint8   | 4        | 0.50    | dodge         | scalar | 0.027978 | 0.006564   | 4.26x   | -76.54%        |
| 512x512   | uint8   | 4        | 0.50    | dodge         | sse42  | 0.027978 | 0.000949   | 29.48x  | -96.61%        |
| 512x512   | uint8   | 4        | 0.50    | dodge         | avx2   | 0.027978 | 0.000828   | 33.77x  | -97.04%        |
| 512x512   | uint8   | 4        | 0.00    | dodge         | scalar | 0.028653 | 0.000055   | 521.45x | -99.81%        |
| 512x512   | uint8   | 4        | 0.00    | dodge         | sse42  | 0.028653 | 0.000055   | 525.70x | -99.81%        |
| 512x512   | uint8   | 4        | 0.00    | dodge         | avx2   | 0.028653 | 0.000066   | 431.93x | -99.77%        |
| 512x512   | uint8   | 4        | 1.00    | dodge         | scalar | 0.032164 | 0.006602   | 4.87x   | -79.47%        |
| 512x512   | uint8   | 4        | 1.00    | dodge         | sse42  | 0.032164 | 0.000944   | 34.07x  | -97.06%        |
| 512x512   | uint8   | 4        | 1.00    | dodge         | avx2   | 0.032164 | 0.000820   | 39.20x  | -97.45%        |
| 512x512   | uint8   | 4        | 0.50    | addition      | scalar | 0.027064 | 0.007971   | 3.40x   | -70.55%        |
| 512x512   | uint8   | 4        | 0.50    | addition      | sse42  | 0.027064 | 0.001047   | 25.85x  | -96.13%        |
| 512x512   | uint8   | 4        | 0.50    | addition      | avx2   | 0.027064 | 0.000778   | 34.77x  | -97.12%        |
| 512x512   | uint8   | 4        | 0.00    | addition      | scalar | 0.027140 | 0.000049   | 550.01x | -99.82%        |
| 512x512   | uint8   | 4        | 0.00    | addition      | sse42  | 0.027140 | 0.000047   | 583.17x | -99.83%        |
| 512x512   | uint8   | 4        | 0.00    | addition      | avx2   | 0.027140 | 0.000046   | 589.86x | -99.83%        |
| 512x512   | uint8   | 4        | 1.00    | addition      | scalar | 0.027049 | 0.009348   | 2.89x   | -65.44%        |
| 512x512   | uint8   | 4        | 1.00    | addition      | sse42  | 0.027049 | 0.001046   | 25.87x  | -96.13%        |
| 512x512   | uint8   | 4        | 1.00    | addition      | avx2   | 0.027049 | 0.000780   | 34.69x  | -97.12%        |
| 512x512   | uint8   | 4        | 0.50    | darken_only   | scalar | 0.027085 | 0.006858   | 3.95x   | -74.68%        |
| 512x512   | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.027085 | 0.000794   | 34.10x  | -97.07%        |
| 512x512   | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.027085 | 0.000784   | 34.55x  | -97.11%        |
| 512x512   | uint8   | 4        | 0.00    | darken_only   | scalar | 0.032720 | 0.000083   | 396.00x | -99.75%        |
| 512x512   | uint8   | 4        | 0.00    | darken_only   | sse42  | 0.032720 | 0.000062   | 531.23x | -99.81%        |
| 512x512   | uint8   | 4        | 0.00    | darken_only   | avx2   | 0.032720 | 0.000054   | 609.34x | -99.84%        |
| 512x512   | uint8   | 4        | 1.00    | darken_only   | scalar | 0.033020 | 0.007077   | 4.67x   | -78.57%        |
| 512x512   | uint8   | 4        | 1.00    | darken_only   | sse42  | 0.033020 | 0.000794   | 41.60x  | -97.60%        |
| 512x512   | uint8   | 4        | 1.00    | darken_only   | avx2   | 0.033020 | 0.000768   | 43.00x  | -97.67%        |
| 512x512   | uint8   | 4        | 0.50    | multiply      | scalar | 0.027894 | 0.006336   | 4.40x   | -77.28%        |
| 512x512   | uint8   | 4        | 0.50    | multiply      | sse42  | 0.027894 | 0.000853   | 32.69x  | -96.94%        |
| 512x512   | uint8   | 4        | 0.50    | multiply      | avx2   | 0.027894 | 0.000754   | 37.01x  | -97.30%        |
| 512x512   | uint8   | 4        | 0.00    | multiply      | scalar | 0.029920 | 0.000049   | 611.83x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | multiply      | sse42  | 0.029920 | 0.000088   | 340.20x | -99.71%        |
| 512x512   | uint8   | 4        | 0.00    | multiply      | avx2   | 0.029920 | 0.000050   | 600.03x | -99.83%        |
| 512x512   | uint8   | 4        | 1.00    | multiply      | scalar | 0.030346 | 0.006455   | 4.70x   | -78.73%        |
| 512x512   | uint8   | 4        | 1.00    | multiply      | sse42  | 0.030346 | 0.000835   | 36.35x  | -97.25%        |
| 512x512   | uint8   | 4        | 1.00    | multiply      | avx2   | 0.030346 | 0.000756   | 40.14x  | -97.51%        |
| 512x512   | uint8   | 4        | 0.50    | hard_light    | scalar | 0.037962 | 0.010628   | 3.57x   | -72.00%        |
| 512x512   | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.037962 | 0.000997   | 38.07x  | -97.37%        |
| 512x512   | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.037962 | 0.000798   | 47.54x  | -97.90%        |
| 512x512   | uint8   | 4        | 0.00    | hard_light    | scalar | 0.036588 | 0.000048   | 767.70x | -99.87%        |
| 512x512   | uint8   | 4        | 0.00    | hard_light    | sse42  | 0.036588 | 0.000051   | 716.00x | -99.86%        |
| 512x512   | uint8   | 4        | 0.00    | hard_light    | avx2   | 0.036588 | 0.000045   | 809.16x | -99.88%        |
| 512x512   | uint8   | 4        | 1.00    | hard_light    | scalar | 0.036749 | 0.010341   | 3.55x   | -71.86%        |
| 512x512   | uint8   | 4        | 1.00    | hard_light    | sse42  | 0.036749 | 0.000950   | 38.67x  | -97.41%        |
| 512x512   | uint8   | 4        | 1.00    | hard_light    | avx2   | 0.036749 | 0.000783   | 46.94x  | -97.87%        |
| 512x512   | uint8   | 4        | 0.50    | difference    | scalar | 0.035225 | 0.006220   | 5.66x   | -82.34%        |
| 512x512   | uint8   | 4        | 0.50    | difference    | sse42  | 0.035225 | 0.000844   | 41.75x  | -97.61%        |
| 512x512   | uint8   | 4        | 0.50    | difference    | avx2   | 0.035225 | 0.000744   | 47.37x  | -97.89%        |
| 512x512   | uint8   | 4        | 0.00    | difference    | scalar | 0.037970 | 0.000044   | 863.58x | -99.88%        |
| 512x512   | uint8   | 4        | 0.00    | difference    | sse42  | 0.037970 | 0.000044   | 865.95x | -99.88%        |
| 512x512   | uint8   | 4        | 0.00    | difference    | avx2   | 0.037970 | 0.000073   | 518.39x | -99.81%        |
| 512x512   | uint8   | 4        | 1.00    | difference    | scalar | 0.035258 | 0.006492   | 5.43x   | -81.59%        |
| 512x512   | uint8   | 4        | 1.00    | difference    | sse42  | 0.035258 | 0.000841   | 41.92x  | -97.61%        |
| 512x512   | uint8   | 4        | 1.00    | difference    | avx2   | 0.035258 | 0.000758   | 46.54x  | -97.85%        |
| 512x512   | uint8   | 4        | 0.50    | subtract      | scalar | 0.033773 | 0.005954   | 5.67x   | -82.37%        |
| 512x512   | uint8   | 4        | 0.50    | subtract      | sse42  | 0.033773 | 0.001065   | 31.71x  | -96.85%        |
| 512x512   | uint8   | 4        | 0.50    | subtract      | avx2   | 0.033773 | 0.000875   | 38.58x  | -97.41%        |
| 512x512   | uint8   | 4        | 0.00    | subtract      | scalar | 0.027810 | 0.000059   | 469.30x | -99.79%        |
| 512x512   | uint8   | 4        | 0.00    | subtract      | sse42  | 0.027810 | 0.000059   | 472.44x | -99.79%        |
| 512x512   | uint8   | 4        | 0.00    | subtract      | avx2   | 0.027810 | 0.000084   | 331.23x | -99.70%        |
| 512x512   | uint8   | 4        | 1.00    | subtract      | scalar | 0.032943 | 0.006019   | 5.47x   | -81.73%        |
| 512x512   | uint8   | 4        | 1.00    | subtract      | sse42  | 0.032943 | 0.001047   | 31.46x  | -96.82%        |
| 512x512   | uint8   | 4        | 1.00    | subtract      | avx2   | 0.032943 | 0.000869   | 37.91x  | -97.36%        |
| 512x512   | uint8   | 4        | 0.50    | grain_extract | scalar | 0.028985 | 0.007856   | 3.69x   | -72.90%        |
| 512x512   | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.028985 | 0.000861   | 33.65x  | -97.03%        |
| 512x512   | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.028985 | 0.000815   | 35.55x  | -97.19%        |
| 512x512   | uint8   | 4        | 0.00    | grain_extract | scalar | 0.034620 | 0.000063   | 549.21x | -99.82%        |
| 512x512   | uint8   | 4        | 0.00    | grain_extract | sse42  | 0.034620 | 0.000046   | 748.99x | -99.87%        |
| 512x512   | uint8   | 4        | 0.00    | grain_extract | avx2   | 0.034620 | 0.000048   | 716.20x | -99.86%        |
| 512x512   | uint8   | 4        | 1.00    | grain_extract | scalar | 0.027931 | 0.007720   | 3.62x   | -72.36%        |
| 512x512   | uint8   | 4        | 1.00    | grain_extract | sse42  | 0.027931 | 0.000875   | 31.93x  | -96.87%        |
| 512x512   | uint8   | 4        | 1.00    | grain_extract | avx2   | 0.027931 | 0.000807   | 34.62x  | -97.11%        |
| 512x512   | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.033088 | 0.007670   | 4.31x   | -76.82%        |
| 512x512   | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.033088 | 0.000845   | 39.15x  | -97.45%        |
| 512x512   | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.033088 | 0.000803   | 41.20x  | -97.57%        |
| 512x512   | uint8   | 4        | 0.00    | grain_merge   | scalar | 0.028199 | 0.000045   | 631.75x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | grain_merge   | sse42  | 0.028199 | 0.000058   | 483.28x | -99.79%        |
| 512x512   | uint8   | 4        | 0.00    | grain_merge   | avx2   | 0.028199 | 0.000044   | 634.38x | -99.84%        |
| 512x512   | uint8   | 4        | 1.00    | grain_merge   | scalar | 0.027789 | 0.007571   | 3.67x   | -72.76%        |
| 512x512   | uint8   | 4        | 1.00    | grain_merge   | sse42  | 0.027789 | 0.000834   | 33.32x  | -97.00%        |
| 512x512   | uint8   | 4        | 1.00    | grain_merge   | avx2   | 0.027789 | 0.000790   | 35.16x  | -97.16%        |
| 512x512   | uint8   | 4        | 0.50    | divide        | scalar | 0.028336 | 0.006427   | 4.41x   | -77.32%        |
| 512x512   | uint8   | 4        | 0.50    | divide        | sse42  | 0.028336 | 0.000861   | 32.91x  | -96.96%        |
| 512x512   | uint8   | 4        | 0.50    | divide        | avx2   | 0.028336 | 0.000786   | 36.06x  | -97.23%        |
| 512x512   | uint8   | 4        | 0.00    | divide        | scalar | 0.028811 | 0.000055   | 524.65x | -99.81%        |
| 512x512   | uint8   | 4        | 0.00    | divide        | sse42  | 0.028811 | 0.000063   | 457.90x | -99.78%        |
| 512x512   | uint8   | 4        | 0.00    | divide        | avx2   | 0.028811 | 0.000044   | 656.18x | -99.85%        |
| 512x512   | uint8   | 4        | 1.00    | divide        | scalar | 0.029632 | 0.006640   | 4.46x   | -77.59%        |
| 512x512   | uint8   | 4        | 1.00    | divide        | sse42  | 0.029632 | 0.000894   | 33.14x  | -96.98%        |
| 512x512   | uint8   | 4        | 1.00    | divide        | avx2   | 0.029632 | 0.000797   | 37.17x  | -97.31%        |
| 512x512   | uint8   | 4        | 0.50    | overlay       | scalar | 0.035896 | 0.010404   | 3.45x   | -71.02%        |
| 512x512   | uint8   | 4        | 0.50    | overlay       | sse42  | 0.035896 | 0.000903   | 39.74x  | -97.48%        |
| 512x512   | uint8   | 4        | 0.50    | overlay       | avx2   | 0.035896 | 0.000803   | 44.72x  | -97.76%        |
| 512x512   | uint8   | 4        | 0.00    | overlay       | scalar | 0.034562 | 0.000048   | 715.24x | -99.86%        |
| 512x512   | uint8   | 4        | 0.00    | overlay       | sse42  | 0.034562 | 0.000054   | 639.35x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | overlay       | avx2   | 0.034562 | 0.000048   | 724.51x | -99.86%        |
| 512x512   | uint8   | 4        | 1.00    | overlay       | scalar | 0.034522 | 0.010361   | 3.33x   | -69.99%        |
| 512x512   | uint8   | 4        | 1.00    | overlay       | sse42  | 0.034522 | 0.000931   | 37.07x  | -97.30%        |
| 512x512   | uint8   | 4        | 1.00    | overlay       | avx2   | 0.034522 | 0.000804   | 42.95x  | -97.67%        |
| 512x512   | float32 | 3        | 0.50    | normal        | scalar | 0.029636 | 0.002367   | 12.52x  | -92.01%        |
| 512x512   | float32 | 3        | 0.50    | normal        | sse42  | 0.029636 | 0.000926   | 31.99x  | -96.87%        |
| 512x512   | float32 | 3        | 0.50    | normal        | avx2   | 0.029636 | 0.000616   | 48.14x  | -97.92%        |
| 512x512   | float32 | 3        | 0.00    | normal        | scalar | 0.031087 | 0.001073   | 28.97x  | -96.55%        |
| 512x512   | float32 | 3        | 0.00    | normal        | sse42  | 0.031087 | 0.000797   | 39.00x  | -97.44%        |
| 512x512   | float32 | 3        | 0.00    | normal        | avx2   | 0.031087 | 0.001083   | 28.70x  | -96.52%        |
| 512x512   | float32 | 3        | 1.00    | normal        | scalar | 0.036363 | 0.000714   | 50.95x  | -98.04%        |
| 512x512   | float32 | 3        | 1.00    | normal        | sse42  | 0.036363 | 0.000531   | 68.53x  | -98.54%        |
| 512x512   | float32 | 3        | 1.00    | normal        | avx2   | 0.036363 | 0.000591   | 61.55x  | -98.38%        |
| 512x512   | float32 | 3        | 0.50    | soft_light    | scalar | 0.047936 | 0.003135   | 15.29x  | -93.46%        |
| 512x512   | float32 | 3        | 0.50    | soft_light    | sse42  | 0.047936 | 0.000654   | 73.27x  | -98.64%        |
| 512x512   | float32 | 3        | 0.50    | soft_light    | avx2   | 0.047936 | 0.000522   | 91.91x  | -98.91%        |
| 512x512   | float32 | 3        | 0.00    | soft_light    | scalar | 0.044895 | 0.000825   | 54.41x  | -98.16%        |
| 512x512   | float32 | 3        | 0.00    | soft_light    | sse42  | 0.044895 | 0.000371   | 121.11x | -99.17%        |
| 512x512   | float32 | 3        | 0.00    | soft_light    | avx2   | 0.044895 | 0.000371   | 121.16x | -99.17%        |
| 512x512   | float32 | 3        | 1.00    | soft_light    | scalar | 0.047523 | 0.003003   | 15.82x  | -93.68%        |
| 512x512   | float32 | 3        | 1.00    | soft_light    | sse42  | 0.047523 | 0.000582   | 81.65x  | -98.78%        |
| 512x512   | float32 | 3        | 1.00    | soft_light    | avx2   | 0.047523 | 0.000500   | 95.00x  | -98.95%        |
| 512x512   | float32 | 3        | 0.50    | lighten_only  | scalar | 0.034085 | 0.003333   | 10.23x  | -90.22%        |
| 512x512   | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.034085 | 0.000607   | 56.14x  | -98.22%        |
| 512x512   | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.034085 | 0.000534   | 63.86x  | -98.43%        |
| 512x512   | float32 | 3        | 0.00    | lighten_only  | scalar | 0.036531 | 0.000765   | 47.76x  | -97.91%        |
| 512x512   | float32 | 3        | 0.00    | lighten_only  | sse42  | 0.036531 | 0.000503   | 72.68x  | -98.62%        |
| 512x512   | float32 | 3        | 0.00    | lighten_only  | avx2   | 0.036531 | 0.000392   | 93.22x  | -98.93%        |
| 512x512   | float32 | 3        | 1.00    | lighten_only  | scalar | 0.037231 | 0.003540   | 10.52x  | -90.49%        |
| 512x512   | float32 | 3        | 1.00    | lighten_only  | sse42  | 0.037231 | 0.000649   | 57.39x  | -98.26%        |
| 512x512   | float32 | 3        | 1.00    | lighten_only  | avx2   | 0.037231 | 0.000723   | 51.52x  | -98.06%        |
| 512x512   | float32 | 3        | 0.50    | screen        | scalar | 0.045132 | 0.003194   | 14.13x  | -92.92%        |
| 512x512   | float32 | 3        | 0.50    | screen        | sse42  | 0.045132 | 0.000745   | 60.55x  | -98.35%        |
| 512x512   | float32 | 3        | 0.50    | screen        | avx2   | 0.045132 | 0.000599   | 75.39x  | -98.67%        |
| 512x512   | float32 | 3        | 0.00    | screen        | scalar | 0.040082 | 0.000671   | 59.76x  | -98.33%        |
| 512x512   | float32 | 3        | 0.00    | screen        | sse42  | 0.040082 | 0.000579   | 69.20x  | -98.56%        |
| 512x512   | float32 | 3        | 0.00    | screen        | avx2   | 0.040082 | 0.000447   | 89.61x  | -98.88%        |
| 512x512   | float32 | 3        | 1.00    | screen        | scalar | 0.036561 | 0.002715   | 13.47x  | -92.57%        |
| 512x512   | float32 | 3        | 1.00    | screen        | sse42  | 0.036561 | 0.000556   | 65.79x  | -98.48%        |
| 512x512   | float32 | 3        | 1.00    | screen        | avx2   | 0.036561 | 0.000475   | 77.04x  | -98.70%        |
| 512x512   | float32 | 3        | 0.50    | dodge         | scalar | 0.034943 | 0.002936   | 11.90x  | -91.60%        |
| 512x512   | float32 | 3        | 0.50    | dodge         | sse42  | 0.034943 | 0.000538   | 64.90x  | -98.46%        |
| 512x512   | float32 | 3        | 0.50    | dodge         | avx2   | 0.034943 | 0.000541   | 64.61x  | -98.45%        |
| 512x512   | float32 | 3        | 0.00    | dodge         | scalar | 0.034853 | 0.000674   | 51.71x  | -98.07%        |
| 512x512   | float32 | 3        | 0.00    | dodge         | sse42  | 0.034853 | 0.000367   | 95.09x  | -98.95%        |
| 512x512   | float32 | 3        | 0.00    | dodge         | avx2   | 0.034853 | 0.000390   | 89.48x  | -98.88%        |
| 512x512   | float32 | 3        | 1.00    | dodge         | scalar | 0.037254 | 0.002882   | 12.93x  | -92.27%        |
| 512x512   | float32 | 3        | 1.00    | dodge         | sse42  | 0.037254 | 0.000581   | 64.12x  | -98.44%        |
| 512x512   | float32 | 3        | 1.00    | dodge         | avx2   | 0.037254 | 0.000584   | 63.79x  | -98.43%        |
| 512x512   | float32 | 3        | 0.50    | addition      | scalar | 0.035951 | 0.006628   | 5.42x   | -81.56%        |
| 512x512   | float32 | 3        | 0.50    | addition      | sse42  | 0.035951 | 0.000579   | 62.09x  | -98.39%        |
| 512x512   | float32 | 3        | 0.50    | addition      | avx2   | 0.035951 | 0.000556   | 64.68x  | -98.45%        |
| 512x512   | float32 | 3        | 0.00    | addition      | scalar | 0.034454 | 0.000816   | 42.20x  | -97.63%        |
| 512x512   | float32 | 3        | 0.00    | addition      | sse42  | 0.034454 | 0.000462   | 74.57x  | -98.66%        |
| 512x512   | float32 | 3        | 0.00    | addition      | avx2   | 0.034454 | 0.000357   | 96.45x  | -98.96%        |
| 512x512   | float32 | 3        | 1.00    | addition      | scalar | 0.033838 | 0.009849   | 3.44x   | -70.89%        |
| 512x512   | float32 | 3        | 1.00    | addition      | sse42  | 0.033838 | 0.000636   | 53.18x  | -98.12%        |
| 512x512   | float32 | 3        | 1.00    | addition      | avx2   | 0.033838 | 0.000486   | 69.69x  | -98.57%        |
| 512x512   | float32 | 3        | 0.50    | darken_only   | scalar | 0.036435 | 0.003423   | 10.64x  | -90.61%        |
| 512x512   | float32 | 3        | 0.50    | darken_only   | sse42  | 0.036435 | 0.000571   | 63.85x  | -98.43%        |
| 512x512   | float32 | 3        | 0.50    | darken_only   | avx2   | 0.036435 | 0.000792   | 46.00x  | -97.83%        |
| 512x512   | float32 | 3        | 0.00    | darken_only   | scalar | 0.039181 | 0.000786   | 49.85x  | -97.99%        |
| 512x512   | float32 | 3        | 0.00    | darken_only   | sse42  | 0.039181 | 0.000503   | 77.88x  | -98.72%        |
| 512x512   | float32 | 3        | 0.00    | darken_only   | avx2   | 0.039181 | 0.000470   | 83.39x  | -98.80%        |
| 512x512   | float32 | 3        | 1.00    | darken_only   | scalar | 0.039675 | 0.003594   | 11.04x  | -90.94%        |
| 512x512   | float32 | 3        | 1.00    | darken_only   | sse42  | 0.039675 | 0.000757   | 52.41x  | -98.09%        |
| 512x512   | float32 | 3        | 1.00    | darken_only   | avx2   | 0.039675 | 0.000717   | 55.36x  | -98.19%        |
| 512x512   | float32 | 3        | 0.50    | multiply      | scalar | 0.043701 | 0.002715   | 16.10x  | -93.79%        |
| 512x512   | float32 | 3        | 0.50    | multiply      | sse42  | 0.043701 | 0.000853   | 51.21x  | -98.05%        |
| 512x512   | float32 | 3        | 0.50    | multiply      | avx2   | 0.043701 | 0.000691   | 63.22x  | -98.42%        |
| 512x512   | float32 | 3        | 0.00    | multiply      | scalar | 0.040699 | 0.001370   | 29.70x  | -96.63%        |
| 512x512   | float32 | 3        | 0.00    | multiply      | sse42  | 0.040699 | 0.000699   | 58.23x  | -98.28%        |
| 512x512   | float32 | 3        | 0.00    | multiply      | avx2   | 0.040699 | 0.000629   | 64.71x  | -98.45%        |
| 512x512   | float32 | 3        | 1.00    | multiply      | scalar | 0.038357 | 0.002792   | 13.74x  | -92.72%        |
| 512x512   | float32 | 3        | 1.00    | multiply      | sse42  | 0.038357 | 0.000782   | 49.04x  | -97.96%        |
| 512x512   | float32 | 3        | 1.00    | multiply      | avx2   | 0.038357 | 0.000663   | 57.85x  | -98.27%        |
| 512x512   | float32 | 3        | 0.50    | hard_light    | scalar | 0.053099 | 0.007897   | 6.72x   | -85.13%        |
| 512x512   | float32 | 3        | 0.50    | hard_light    | sse42  | 0.053099 | 0.001718   | 30.90x  | -96.76%        |
| 512x512   | float32 | 3        | 0.50    | hard_light    | avx2   | 0.053099 | 0.000791   | 67.10x  | -98.51%        |
| 512x512   | float32 | 3        | 0.00    | hard_light    | scalar | 0.054133 | 0.001356   | 39.91x  | -97.49%        |
| 512x512   | float32 | 3        | 0.00    | hard_light    | sse42  | 0.054133 | 0.000577   | 93.75x  | -98.93%        |
| 512x512   | float32 | 3        | 0.00    | hard_light    | avx2   | 0.054133 | 0.000693   | 78.11x  | -98.72%        |
| 512x512   | float32 | 3        | 1.00    | hard_light    | scalar | 0.052803 | 0.007986   | 6.61x   | -84.88%        |
| 512x512   | float32 | 3        | 1.00    | hard_light    | sse42  | 0.052803 | 0.000971   | 54.39x  | -98.16%        |
| 512x512   | float32 | 3        | 1.00    | hard_light    | avx2   | 0.052803 | 0.000696   | 75.87x  | -98.68%        |
| 512x512   | float32 | 3        | 0.50    | difference    | scalar | 0.049789 | 0.003223   | 15.45x  | -93.53%        |
| 512x512   | float32 | 3        | 0.50    | difference    | sse42  | 0.049789 | 0.000895   | 55.64x  | -98.20%        |
| 512x512   | float32 | 3        | 0.50    | difference    | avx2   | 0.049789 | 0.000667   | 74.61x  | -98.66%        |
| 512x512   | float32 | 3        | 0.00    | difference    | scalar | 0.048538 | 0.000710   | 68.39x  | -98.54%        |
| 512x512   | float32 | 3        | 0.00    | difference    | sse42  | 0.048538 | 0.000444   | 109.42x | -99.09%        |
| 512x512   | float32 | 3        | 0.00    | difference    | avx2   | 0.048538 | 0.000462   | 105.09x | -99.05%        |
| 512x512   | float32 | 3        | 1.00    | difference    | scalar | 0.042908 | 0.002621   | 16.37x  | -93.89%        |
| 512x512   | float32 | 3        | 1.00    | difference    | sse42  | 0.042908 | 0.000648   | 66.23x  | -98.49%        |
| 512x512   | float32 | 3        | 1.00    | difference    | avx2   | 0.042908 | 0.000492   | 87.21x  | -98.85%        |
| 512x512   | float32 | 3        | 0.50    | subtract      | scalar | 0.036473 | 0.003255   | 11.21x  | -91.08%        |
| 512x512   | float32 | 3        | 0.50    | subtract      | sse42  | 0.036473 | 0.000595   | 61.25x  | -98.37%        |
| 512x512   | float32 | 3        | 0.50    | subtract      | avx2   | 0.036473 | 0.000564   | 64.63x  | -98.45%        |
| 512x512   | float32 | 3        | 0.00    | subtract      | scalar | 0.035796 | 0.000725   | 49.37x  | -97.97%        |
| 512x512   | float32 | 3        | 0.00    | subtract      | sse42  | 0.035796 | 0.000446   | 80.25x  | -98.75%        |
| 512x512   | float32 | 3        | 0.00    | subtract      | avx2   | 0.035796 | 0.000630   | 56.83x  | -98.24%        |
| 512x512   | float32 | 3        | 1.00    | subtract      | scalar | 0.036637 | 0.003848   | 9.52x   | -89.50%        |
| 512x512   | float32 | 3        | 1.00    | subtract      | sse42  | 0.036637 | 0.001029   | 35.60x  | -97.19%        |
| 512x512   | float32 | 3        | 1.00    | subtract      | avx2   | 0.036637 | 0.000844   | 43.43x  | -97.70%        |
| 512x512   | float32 | 3        | 0.50    | grain_extract | scalar | 0.039107 | 0.004715   | 8.29x   | -87.94%        |
| 512x512   | float32 | 3        | 0.50    | grain_extract | sse42  | 0.039107 | 0.000756   | 51.70x  | -98.07%        |
| 512x512   | float32 | 3        | 0.50    | grain_extract | avx2   | 0.039107 | 0.000594   | 65.84x  | -98.48%        |
| 512x512   | float32 | 3        | 0.00    | grain_extract | scalar | 0.043381 | 0.000802   | 54.12x  | -98.15%        |
| 512x512   | float32 | 3        | 0.00    | grain_extract | sse42  | 0.043381 | 0.000521   | 83.19x  | -98.80%        |
| 512x512   | float32 | 3        | 0.00    | grain_extract | avx2   | 0.043381 | 0.000422   | 102.74x | -99.03%        |
| 512x512   | float32 | 3        | 1.00    | grain_extract | scalar | 0.036042 | 0.004422   | 8.15x   | -87.73%        |
| 512x512   | float32 | 3        | 1.00    | grain_extract | sse42  | 0.036042 | 0.000639   | 56.43x  | -98.23%        |
| 512x512   | float32 | 3        | 1.00    | grain_extract | avx2   | 0.036042 | 0.000500   | 72.04x  | -98.61%        |
| 512x512   | float32 | 3        | 0.50    | grain_merge   | scalar | 0.035073 | 0.004403   | 7.97x   | -87.45%        |
| 512x512   | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.035073 | 0.000597   | 58.77x  | -98.30%        |
| 512x512   | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.035073 | 0.000602   | 58.30x  | -98.28%        |
| 512x512   | float32 | 3        | 0.00    | grain_merge   | scalar | 0.035051 | 0.000654   | 53.57x  | -98.13%        |
| 512x512   | float32 | 3        | 0.00    | grain_merge   | sse42  | 0.035051 | 0.000364   | 96.37x  | -98.96%        |
| 512x512   | float32 | 3        | 0.00    | grain_merge   | avx2   | 0.035051 | 0.000377   | 93.00x  | -98.92%        |
| 512x512   | float32 | 3        | 1.00    | grain_merge   | scalar | 0.035542 | 0.004582   | 7.76x   | -87.11%        |
| 512x512   | float32 | 3        | 1.00    | grain_merge   | sse42  | 0.035542 | 0.000696   | 51.09x  | -98.04%        |
| 512x512   | float32 | 3        | 1.00    | grain_merge   | avx2   | 0.035542 | 0.000559   | 63.59x  | -98.43%        |
| 512x512   | float32 | 3        | 0.50    | divide        | scalar | 0.036829 | 0.002974   | 12.38x  | -91.92%        |
| 512x512   | float32 | 3        | 0.50    | divide        | sse42  | 0.036829 | 0.001021   | 36.07x  | -97.23%        |
| 512x512   | float32 | 3        | 0.50    | divide        | avx2   | 0.036829 | 0.000475   | 77.51x  | -98.71%        |
| 512x512   | float32 | 3        | 0.00    | divide        | scalar | 0.035238 | 0.000883   | 39.92x  | -97.49%        |
| 512x512   | float32 | 3        | 0.00    | divide        | sse42  | 0.035238 | 0.000419   | 84.10x  | -98.81%        |
| 512x512   | float32 | 3        | 0.00    | divide        | avx2   | 0.035238 | 0.000370   | 95.30x  | -98.95%        |
| 512x512   | float32 | 3        | 1.00    | divide        | scalar | 0.034906 | 0.002944   | 11.86x  | -91.57%        |
| 512x512   | float32 | 3        | 1.00    | divide        | sse42  | 0.034906 | 0.000537   | 64.98x  | -98.46%        |
| 512x512   | float32 | 3        | 1.00    | divide        | avx2   | 0.034906 | 0.000496   | 70.38x  | -98.58%        |
| 512x512   | float32 | 3        | 0.50    | overlay       | scalar | 0.041415 | 0.006894   | 6.01x   | -83.35%        |
| 512x512   | float32 | 3        | 0.50    | overlay       | sse42  | 0.041415 | 0.000572   | 72.46x  | -98.62%        |
| 512x512   | float32 | 3        | 0.50    | overlay       | avx2   | 0.041415 | 0.000637   | 65.03x  | -98.46%        |
| 512x512   | float32 | 3        | 0.00    | overlay       | scalar | 0.042382 | 0.000667   | 63.55x  | -98.43%        |
| 512x512   | float32 | 3        | 0.00    | overlay       | sse42  | 0.042382 | 0.000449   | 94.37x  | -98.94%        |
| 512x512   | float32 | 3        | 0.00    | overlay       | avx2   | 0.042382 | 0.000395   | 107.35x | -99.07%        |
| 512x512   | float32 | 3        | 1.00    | overlay       | scalar | 0.041686 | 0.006919   | 6.02x   | -83.40%        |
| 512x512   | float32 | 3        | 1.00    | overlay       | sse42  | 0.041686 | 0.000643   | 64.78x  | -98.46%        |
| 512x512   | float32 | 3        | 1.00    | overlay       | avx2   | 0.041686 | 0.000553   | 75.39x  | -98.67%        |
| 512x512   | float32 | 4        | 0.50    | normal        | scalar | 0.021964 | 0.002692   | 8.16x   | -87.74%        |
| 512x512   | float32 | 4        | 0.50    | normal        | sse42  | 0.021964 | 0.000801   | 27.41x  | -96.35%        |
| 512x512   | float32 | 4        | 0.50    | normal        | avx2   | 0.021964 | 0.000689   | 31.88x  | -96.86%        |
| 512x512   | float32 | 4        | 0.00    | normal        | scalar | 0.021605 | 0.000557   | 38.81x  | -97.42%        |
| 512x512   | float32 | 4        | 0.00    | normal        | sse42  | 0.021605 | 0.000288   | 74.93x  | -98.67%        |
| 512x512   | float32 | 4        | 0.00    | normal        | avx2   | 0.021605 | 0.000300   | 71.90x  | -98.61%        |
| 512x512   | float32 | 4        | 1.00    | normal        | scalar | 0.022705 | 0.002738   | 8.29x   | -87.94%        |
| 512x512   | float32 | 4        | 1.00    | normal        | sse42  | 0.022705 | 0.000749   | 30.32x  | -96.70%        |
| 512x512   | float32 | 4        | 1.00    | normal        | avx2   | 0.022705 | 0.000738   | 30.75x  | -96.75%        |
| 512x512   | float32 | 4        | 0.50    | soft_light    | scalar | 0.032561 | 0.003347   | 9.73x   | -89.72%        |
| 512x512   | float32 | 4        | 0.50    | soft_light    | sse42  | 0.032561 | 0.000798   | 40.80x  | -97.55%        |
| 512x512   | float32 | 4        | 0.50    | soft_light    | avx2   | 0.032561 | 0.000809   | 40.23x  | -97.51%        |
| 512x512   | float32 | 4        | 0.00    | soft_light    | scalar | 0.032386 | 0.000616   | 52.58x  | -98.10%        |
| 512x512   | float32 | 4        | 0.00    | soft_light    | sse42  | 0.032386 | 0.000280   | 115.74x | -99.14%        |
| 512x512   | float32 | 4        | 0.00    | soft_light    | avx2   | 0.032386 | 0.000465   | 69.67x  | -98.56%        |
| 512x512   | float32 | 4        | 1.00    | soft_light    | scalar | 0.033570 | 0.003039   | 11.05x  | -90.95%        |
| 512x512   | float32 | 4        | 1.00    | soft_light    | sse42  | 0.033570 | 0.000818   | 41.02x  | -97.56%        |
| 512x512   | float32 | 4        | 1.00    | soft_light    | avx2   | 0.033570 | 0.000775   | 43.30x  | -97.69%        |
| 512x512   | float32 | 4        | 0.50    | lighten_only  | scalar | 0.025674 | 0.003749   | 6.85x   | -85.40%        |
| 512x512   | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.025674 | 0.000757   | 33.91x  | -97.05%        |
| 512x512   | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.025674 | 0.000781   | 32.87x  | -96.96%        |
| 512x512   | float32 | 4        | 0.00    | lighten_only  | scalar | 0.025554 | 0.000560   | 45.63x  | -97.81%        |
| 512x512   | float32 | 4        | 0.00    | lighten_only  | sse42  | 0.025554 | 0.000264   | 96.73x  | -98.97%        |
| 512x512   | float32 | 4        | 0.00    | lighten_only  | avx2   | 0.025554 | 0.000339   | 75.37x  | -98.67%        |
| 512x512   | float32 | 4        | 1.00    | lighten_only  | scalar | 0.025370 | 0.003504   | 7.24x   | -86.19%        |
| 512x512   | float32 | 4        | 1.00    | lighten_only  | sse42  | 0.025370 | 0.000756   | 33.58x  | -97.02%        |
| 512x512   | float32 | 4        | 1.00    | lighten_only  | avx2   | 0.025370 | 0.000768   | 33.03x  | -96.97%        |
| 512x512   | float32 | 4        | 0.50    | screen        | scalar | 0.026660 | 0.003025   | 8.81x   | -88.65%        |
| 512x512   | float32 | 4        | 0.50    | screen        | sse42  | 0.026660 | 0.000818   | 32.59x  | -96.93%        |
| 512x512   | float32 | 4        | 0.50    | screen        | avx2   | 0.026660 | 0.000828   | 32.20x  | -96.89%        |
| 512x512   | float32 | 4        | 0.00    | screen        | scalar | 0.026701 | 0.000545   | 49.01x  | -97.96%        |
| 512x512   | float32 | 4        | 0.00    | screen        | sse42  | 0.026701 | 0.000306   | 87.32x  | -98.85%        |
| 512x512   | float32 | 4        | 0.00    | screen        | avx2   | 0.026701 | 0.000270   | 98.98x  | -98.99%        |
| 512x512   | float32 | 4        | 1.00    | screen        | scalar | 0.026448 | 0.002945   | 8.98x   | -88.86%        |
| 512x512   | float32 | 4        | 1.00    | screen        | sse42  | 0.026448 | 0.000831   | 31.84x  | -96.86%        |
| 512x512   | float32 | 4        | 1.00    | screen        | avx2   | 0.026448 | 0.000935   | 28.29x  | -96.47%        |
| 512x512   | float32 | 4        | 0.50    | dodge         | scalar | 0.025941 | 0.003192   | 8.13x   | -87.70%        |
| 512x512   | float32 | 4        | 0.50    | dodge         | sse42  | 0.025941 | 0.000965   | 26.87x  | -96.28%        |
| 512x512   | float32 | 4        | 0.50    | dodge         | avx2   | 0.025941 | 0.000809   | 32.06x  | -96.88%        |
| 512x512   | float32 | 4        | 0.00    | dodge         | scalar | 0.026250 | 0.000536   | 48.99x  | -97.96%        |
| 512x512   | float32 | 4        | 0.00    | dodge         | sse42  | 0.026250 | 0.000344   | 76.22x  | -98.69%        |
| 512x512   | float32 | 4        | 0.00    | dodge         | avx2   | 0.026250 | 0.000278   | 94.57x  | -98.94%        |
| 512x512   | float32 | 4        | 1.00    | dodge         | scalar | 0.026583 | 0.003528   | 7.53x   | -86.73%        |
| 512x512   | float32 | 4        | 1.00    | dodge         | sse42  | 0.026583 | 0.000962   | 27.64x  | -96.38%        |
| 512x512   | float32 | 4        | 1.00    | dodge         | avx2   | 0.026583 | 0.000780   | 34.07x  | -97.07%        |
| 512x512   | float32 | 4        | 0.50    | addition      | scalar | 0.025528 | 0.005588   | 4.57x   | -78.11%        |
| 512x512   | float32 | 4        | 0.50    | addition      | sse42  | 0.025528 | 0.001059   | 24.11x  | -95.85%        |
| 512x512   | float32 | 4        | 0.50    | addition      | avx2   | 0.025528 | 0.000945   | 27.01x  | -96.30%        |
| 512x512   | float32 | 4        | 0.00    | addition      | scalar | 0.025426 | 0.000525   | 48.42x  | -97.93%        |
| 512x512   | float32 | 4        | 0.00    | addition      | sse42  | 0.025426 | 0.000282   | 90.32x  | -98.89%        |
| 512x512   | float32 | 4        | 0.00    | addition      | avx2   | 0.025426 | 0.000634   | 40.11x  | -97.51%        |
| 512x512   | float32 | 4        | 1.00    | addition      | scalar | 0.025302 | 0.007214   | 3.51x   | -71.49%        |
| 512x512   | float32 | 4        | 1.00    | addition      | sse42  | 0.025302 | 0.000871   | 29.04x  | -96.56%        |
| 512x512   | float32 | 4        | 1.00    | addition      | avx2   | 0.025302 | 0.000819   | 30.89x  | -96.76%        |
| 512x512   | float32 | 4        | 0.50    | darken_only   | scalar | 0.026224 | 0.003672   | 7.14x   | -86.00%        |
| 512x512   | float32 | 4        | 0.50    | darken_only   | sse42  | 0.026224 | 0.000743   | 35.30x  | -97.17%        |
| 512x512   | float32 | 4        | 0.50    | darken_only   | avx2   | 0.026224 | 0.000839   | 31.26x  | -96.80%        |
| 512x512   | float32 | 4        | 0.00    | darken_only   | scalar | 0.026766 | 0.000630   | 42.48x  | -97.65%        |
| 512x512   | float32 | 4        | 0.00    | darken_only   | sse42  | 0.026766 | 0.000294   | 91.16x  | -98.90%        |
| 512x512   | float32 | 4        | 0.00    | darken_only   | avx2   | 0.026766 | 0.000308   | 86.86x  | -98.85%        |
| 512x512   | float32 | 4        | 1.00    | darken_only   | scalar | 0.026007 | 0.003348   | 7.77x   | -87.13%        |
| 512x512   | float32 | 4        | 1.00    | darken_only   | sse42  | 0.026007 | 0.000909   | 28.61x  | -96.50%        |
| 512x512   | float32 | 4        | 1.00    | darken_only   | avx2   | 0.026007 | 0.000817   | 31.82x  | -96.86%        |
| 512x512   | float32 | 4        | 0.50    | multiply      | scalar | 0.027552 | 0.003122   | 8.83x   | -88.67%        |
| 512x512   | float32 | 4        | 0.50    | multiply      | sse42  | 0.027552 | 0.000844   | 32.63x  | -96.94%        |
| 512x512   | float32 | 4        | 0.50    | multiply      | avx2   | 0.027552 | 0.000856   | 32.20x  | -96.89%        |
| 512x512   | float32 | 4        | 0.00    | multiply      | scalar | 0.028558 | 0.000590   | 48.42x  | -97.93%        |
| 512x512   | float32 | 4        | 0.00    | multiply      | sse42  | 0.028558 | 0.000382   | 74.83x  | -98.66%        |
| 512x512   | float32 | 4        | 0.00    | multiply      | avx2   | 0.028558 | 0.000306   | 93.26x  | -98.93%        |
| 512x512   | float32 | 4        | 1.00    | multiply      | scalar | 0.032243 | 0.004246   | 7.59x   | -86.83%        |
| 512x512   | float32 | 4        | 1.00    | multiply      | sse42  | 0.032243 | 0.001103   | 29.23x  | -96.58%        |
| 512x512   | float32 | 4        | 1.00    | multiply      | avx2   | 0.032243 | 0.000941   | 34.28x  | -97.08%        |
| 512x512   | float32 | 4        | 0.50    | hard_light    | scalar | 0.036698 | 0.007803   | 4.70x   | -78.74%        |
| 512x512   | float32 | 4        | 0.50    | hard_light    | sse42  | 0.036698 | 0.000971   | 37.81x  | -97.36%        |
| 512x512   | float32 | 4        | 0.50    | hard_light    | avx2   | 0.036698 | 0.001150   | 31.91x  | -96.87%        |
| 512x512   | float32 | 4        | 0.00    | hard_light    | scalar | 0.036374 | 0.000606   | 60.01x  | -98.33%        |
| 512x512   | float32 | 4        | 0.00    | hard_light    | sse42  | 0.036374 | 0.000285   | 127.77x | -99.22%        |
| 512x512   | float32 | 4        | 0.00    | hard_light    | avx2   | 0.036374 | 0.000384   | 94.67x  | -98.94%        |
| 512x512   | float32 | 4        | 1.00    | hard_light    | scalar | 0.036135 | 0.007678   | 4.71x   | -78.75%        |
| 512x512   | float32 | 4        | 1.00    | hard_light    | sse42  | 0.036135 | 0.000925   | 39.06x  | -97.44%        |
| 512x512   | float32 | 4        | 1.00    | hard_light    | avx2   | 0.036135 | 0.000923   | 39.13x  | -97.44%        |
| 512x512   | float32 | 4        | 0.50    | difference    | scalar | 0.033423 | 0.003019   | 11.07x  | -90.97%        |
| 512x512   | float32 | 4        | 0.50    | difference    | sse42  | 0.033423 | 0.000960   | 34.80x  | -97.13%        |
| 512x512   | float32 | 4        | 0.50    | difference    | avx2   | 0.033423 | 0.000935   | 35.76x  | -97.20%        |
| 512x512   | float32 | 4        | 0.00    | difference    | scalar | 0.033351 | 0.000560   | 59.53x  | -98.32%        |
| 512x512   | float32 | 4        | 0.00    | difference    | sse42  | 0.033351 | 0.000312   | 107.05x | -99.07%        |
| 512x512   | float32 | 4        | 0.00    | difference    | avx2   | 0.033351 | 0.000321   | 103.85x | -99.04%        |
| 512x512   | float32 | 4        | 1.00    | difference    | scalar | 0.033759 | 0.002906   | 11.62x  | -91.39%        |
| 512x512   | float32 | 4        | 1.00    | difference    | sse42  | 0.033759 | 0.000860   | 39.26x  | -97.45%        |
| 512x512   | float32 | 4        | 1.00    | difference    | avx2   | 0.033759 | 0.000843   | 40.06x  | -97.50%        |
| 512x512   | float32 | 4        | 0.50    | subtract      | scalar | 0.025841 | 0.003811   | 6.78x   | -85.25%        |
| 512x512   | float32 | 4        | 0.50    | subtract      | sse42  | 0.025841 | 0.000886   | 29.16x  | -96.57%        |
| 512x512   | float32 | 4        | 0.50    | subtract      | avx2   | 0.025841 | 0.000819   | 31.54x  | -96.83%        |
| 512x512   | float32 | 4        | 0.00    | subtract      | scalar | 0.025174 | 0.000657   | 38.30x  | -97.39%        |
| 512x512   | float32 | 4        | 0.00    | subtract      | sse42  | 0.025174 | 0.000290   | 86.68x  | -98.85%        |
| 512x512   | float32 | 4        | 0.00    | subtract      | avx2   | 0.025174 | 0.000280   | 89.79x  | -98.89%        |
| 512x512   | float32 | 4        | 1.00    | subtract      | scalar | 0.025820 | 0.003715   | 6.95x   | -85.61%        |
| 512x512   | float32 | 4        | 1.00    | subtract      | sse42  | 0.025820 | 0.000824   | 31.33x  | -96.81%        |
| 512x512   | float32 | 4        | 1.00    | subtract      | avx2   | 0.025820 | 0.000788   | 32.76x  | -96.95%        |
| 512x512   | float32 | 4        | 0.50    | grain_extract | scalar | 0.026379 | 0.004427   | 5.96x   | -83.22%        |
| 512x512   | float32 | 4        | 0.50    | grain_extract | sse42  | 0.026379 | 0.000804   | 32.81x  | -96.95%        |
| 512x512   | float32 | 4        | 0.50    | grain_extract | avx2   | 0.026379 | 0.000796   | 33.14x  | -96.98%        |
| 512x512   | float32 | 4        | 0.00    | grain_extract | scalar | 0.025827 | 0.000635   | 40.64x  | -97.54%        |
| 512x512   | float32 | 4        | 0.00    | grain_extract | sse42  | 0.025827 | 0.000268   | 96.32x  | -98.96%        |
| 512x512   | float32 | 4        | 0.00    | grain_extract | avx2   | 0.025827 | 0.000297   | 86.98x  | -98.85%        |
| 512x512   | float32 | 4        | 1.00    | grain_extract | scalar | 0.025750 | 0.004435   | 5.81x   | -82.78%        |
| 512x512   | float32 | 4        | 1.00    | grain_extract | sse42  | 0.025750 | 0.000837   | 30.77x  | -96.75%        |
| 512x512   | float32 | 4        | 1.00    | grain_extract | avx2   | 0.025750 | 0.000826   | 31.16x  | -96.79%        |
| 512x512   | float32 | 4        | 0.50    | grain_merge   | scalar | 0.026609 | 0.004507   | 5.90x   | -83.06%        |
| 512x512   | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.026609 | 0.000845   | 31.48x  | -96.82%        |
| 512x512   | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.026609 | 0.001360   | 19.56x  | -94.89%        |
| 512x512   | float32 | 4        | 0.00    | grain_merge   | scalar | 0.025526 | 0.000654   | 39.02x  | -97.44%        |
| 512x512   | float32 | 4        | 0.00    | grain_merge   | sse42  | 0.025526 | 0.000286   | 89.29x  | -98.88%        |
| 512x512   | float32 | 4        | 0.00    | grain_merge   | avx2   | 0.025526 | 0.000275   | 92.74x  | -98.92%        |
| 512x512   | float32 | 4        | 1.00    | grain_merge   | scalar | 0.026052 | 0.004448   | 5.86x   | -82.93%        |
| 512x512   | float32 | 4        | 1.00    | grain_merge   | sse42  | 0.026052 | 0.000935   | 27.87x  | -96.41%        |
| 512x512   | float32 | 4        | 1.00    | grain_merge   | avx2   | 0.026052 | 0.000757   | 34.39x  | -97.09%        |
| 512x512   | float32 | 4        | 0.50    | divide        | scalar | 0.026574 | 0.003417   | 7.78x   | -87.14%        |
| 512x512   | float32 | 4        | 0.50    | divide        | sse42  | 0.026574 | 0.000837   | 31.76x  | -96.85%        |
| 512x512   | float32 | 4        | 0.50    | divide        | avx2   | 0.026574 | 0.000781   | 34.01x  | -97.06%        |
| 512x512   | float32 | 4        | 0.00    | divide        | scalar | 0.026413 | 0.000533   | 49.52x  | -97.98%        |
| 512x512   | float32 | 4        | 0.00    | divide        | sse42  | 0.026413 | 0.000294   | 89.73x  | -98.89%        |
| 512x512   | float32 | 4        | 0.00    | divide        | avx2   | 0.026413 | 0.000279   | 94.57x  | -98.94%        |
| 512x512   | float32 | 4        | 1.00    | divide        | scalar | 0.026254 | 0.003185   | 8.24x   | -87.87%        |
| 512x512   | float32 | 4        | 1.00    | divide        | sse42  | 0.026254 | 0.000857   | 30.65x  | -96.74%        |
| 512x512   | float32 | 4        | 1.00    | divide        | avx2   | 0.026254 | 0.000924   | 28.42x  | -96.48%        |
| 512x512   | float32 | 4        | 0.50    | overlay       | scalar | 0.033545 | 0.007163   | 4.68x   | -78.65%        |
| 512x512   | float32 | 4        | 0.50    | overlay       | sse42  | 0.033545 | 0.000877   | 38.24x  | -97.38%        |
| 512x512   | float32 | 4        | 0.50    | overlay       | avx2   | 0.033545 | 0.000784   | 42.77x  | -97.66%        |
| 512x512   | float32 | 4        | 0.00    | overlay       | scalar | 0.033006 | 0.000606   | 54.43x  | -98.16%        |
| 512x512   | float32 | 4        | 0.00    | overlay       | sse42  | 0.033006 | 0.000435   | 75.95x  | -98.68%        |
| 512x512   | float32 | 4        | 0.00    | overlay       | avx2   | 0.033006 | 0.000281   | 117.45x | -99.15%        |
| 512x512   | float32 | 4        | 1.00    | overlay       | scalar | 0.033288 | 0.007227   | 4.61x   | -78.29%        |
| 512x512   | float32 | 4        | 1.00    | overlay       | sse42  | 0.033288 | 0.000821   | 40.53x  | -97.53%        |
| 512x512   | float32 | 4        | 1.00    | overlay       | avx2   | 0.033288 | 0.000809   | 41.16x  | -97.57%        |
| 1024x1024 | uint8   | 3        | 0.50    | normal        | scalar | 0.100895 | 0.025941   | 3.89x   | -74.29%        |
| 1024x1024 | uint8   | 3        | 0.50    | normal        | sse42  | 0.100895 | 0.011313   | 8.92x   | -88.79%        |
| 1024x1024 | uint8   | 3        | 0.50    | normal        | avx2   | 0.100895 | 0.011015   | 9.16x   | -89.08%        |
| 1024x1024 | uint8   | 3        | 0.50    | soft_light    | scalar | 0.134943 | 0.028514   | 4.73x   | -78.87%        |
| 1024x1024 | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.134943 | 0.012639   | 10.68x  | -90.63%        |
| 1024x1024 | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.134943 | 0.011191   | 12.06x  | -91.71%        |
| 1024x1024 | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.106948 | 0.031183   | 3.43x   | -70.84%        |
| 1024x1024 | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.106948 | 0.012454   | 8.59x   | -88.35%        |
| 1024x1024 | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.106948 | 0.011166   | 9.58x   | -89.56%        |
| 1024x1024 | uint8   | 3        | 0.50    | screen        | scalar | 0.120596 | 0.028797   | 4.19x   | -76.12%        |
| 1024x1024 | uint8   | 3        | 0.50    | screen        | sse42  | 0.120596 | 0.012766   | 9.45x   | -89.41%        |
| 1024x1024 | uint8   | 3        | 0.50    | screen        | avx2   | 0.120596 | 0.011323   | 10.65x  | -90.61%        |
| 1024x1024 | uint8   | 3        | 0.50    | dodge         | scalar | 0.119609 | 0.029297   | 4.08x   | -75.51%        |
| 1024x1024 | uint8   | 3        | 0.50    | dodge         | sse42  | 0.119609 | 0.012742   | 9.39x   | -89.35%        |
| 1024x1024 | uint8   | 3        | 0.50    | dodge         | avx2   | 0.119609 | 0.011526   | 10.38x  | -90.36%        |
| 1024x1024 | uint8   | 3        | 0.50    | addition      | scalar | 0.114068 | 0.044153   | 2.58x   | -61.29%        |
| 1024x1024 | uint8   | 3        | 0.50    | addition      | sse42  | 0.114068 | 0.018078   | 6.31x   | -84.15%        |
| 1024x1024 | uint8   | 3        | 0.50    | addition      | avx2   | 0.114068 | 0.013282   | 8.59x   | -88.36%        |
| 1024x1024 | uint8   | 3        | 0.50    | darken_only   | scalar | 0.121125 | 0.032461   | 3.73x   | -73.20%        |
| 1024x1024 | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.121125 | 0.012626   | 9.59x   | -89.58%        |
| 1024x1024 | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.121125 | 0.011556   | 10.48x  | -90.46%        |
| 1024x1024 | uint8   | 3        | 0.50    | multiply      | scalar | 0.118224 | 0.029421   | 4.02x   | -75.11%        |
| 1024x1024 | uint8   | 3        | 0.50    | multiply      | sse42  | 0.118224 | 0.013164   | 8.98x   | -88.86%        |
| 1024x1024 | uint8   | 3        | 0.50    | multiply      | avx2   | 0.118224 | 0.012601   | 9.38x   | -89.34%        |
| 1024x1024 | uint8   | 3        | 0.50    | hard_light    | scalar | 0.171536 | 0.052825   | 3.25x   | -69.20%        |
| 1024x1024 | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.171536 | 0.013300   | 12.90x  | -92.25%        |
| 1024x1024 | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.171536 | 0.012897   | 13.30x  | -92.48%        |
| 1024x1024 | uint8   | 3        | 0.50    | difference    | scalar | 0.136830 | 0.027840   | 4.91x   | -79.65%        |
| 1024x1024 | uint8   | 3        | 0.50    | difference    | sse42  | 0.136830 | 0.012209   | 11.21x  | -91.08%        |
| 1024x1024 | uint8   | 3        | 0.50    | difference    | avx2   | 0.136830 | 0.011299   | 12.11x  | -91.74%        |
| 1024x1024 | uint8   | 3        | 0.50    | subtract      | scalar | 0.107973 | 0.026498   | 4.07x   | -75.46%        |
| 1024x1024 | uint8   | 3        | 0.50    | subtract      | sse42  | 0.107973 | 0.012590   | 8.58x   | -88.34%        |
| 1024x1024 | uint8   | 3        | 0.50    | subtract      | avx2   | 0.107973 | 0.011188   | 9.65x   | -89.64%        |
| 1024x1024 | uint8   | 3        | 0.50    | grain_extract | scalar | 0.107538 | 0.034598   | 3.11x   | -67.83%        |
| 1024x1024 | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.107538 | 0.012513   | 8.59x   | -88.36%        |
| 1024x1024 | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.107538 | 0.011168   | 9.63x   | -89.61%        |
| 1024x1024 | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.110132 | 0.036160   | 3.05x   | -67.17%        |
| 1024x1024 | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.110132 | 0.012568   | 8.76x   | -88.59%        |
| 1024x1024 | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.110132 | 0.011141   | 9.89x   | -89.88%        |
| 1024x1024 | uint8   | 3        | 0.50    | divide        | scalar | 0.120480 | 0.029904   | 4.03x   | -75.18%        |
| 1024x1024 | uint8   | 3        | 0.50    | divide        | sse42  | 0.120480 | 0.012432   | 9.69x   | -89.68%        |
| 1024x1024 | uint8   | 3        | 0.50    | divide        | avx2   | 0.120480 | 0.011421   | 10.55x  | -90.52%        |
| 1024x1024 | uint8   | 3        | 0.50    | overlay       | scalar | 0.144593 | 0.046925   | 3.08x   | -67.55%        |
| 1024x1024 | uint8   | 3        | 0.50    | overlay       | sse42  | 0.144593 | 0.012925   | 11.19x  | -91.06%        |
| 1024x1024 | uint8   | 3        | 0.50    | overlay       | avx2   | 0.144593 | 0.011130   | 12.99x  | -92.30%        |
| 1024x1024 | uint8   | 4        | 0.50    | normal        | scalar | 0.084291 | 0.021167   | 3.98x   | -74.89%        |
| 1024x1024 | uint8   | 4        | 0.50    | normal        | sse42  | 0.084291 | 0.002838   | 29.70x  | -96.63%        |
| 1024x1024 | uint8   | 4        | 0.50    | normal        | avx2   | 0.084291 | 0.002498   | 33.74x  | -97.04%        |
| 1024x1024 | uint8   | 4        | 0.50    | soft_light    | scalar | 0.110492 | 0.026297   | 4.20x   | -76.20%        |
| 1024x1024 | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.110492 | 0.003478   | 31.77x  | -96.85%        |
| 1024x1024 | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.110492 | 0.003288   | 33.61x  | -97.02%        |
| 1024x1024 | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.080266 | 0.027750   | 2.89x   | -65.43%        |
| 1024x1024 | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.080266 | 0.003250   | 24.70x  | -95.95%        |
| 1024x1024 | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.080266 | 0.002935   | 27.35x  | -96.34%        |
| 1024x1024 | uint8   | 4        | 0.50    | screen        | scalar | 0.086119 | 0.025689   | 3.35x   | -70.17%        |
| 1024x1024 | uint8   | 4        | 0.50    | screen        | sse42  | 0.086119 | 0.003449   | 24.97x  | -96.00%        |
| 1024x1024 | uint8   | 4        | 0.50    | screen        | avx2   | 0.086119 | 0.003054   | 28.20x  | -96.45%        |
| 1024x1024 | uint8   | 4        | 0.50    | dodge         | scalar | 0.085241 | 0.026425   | 3.23x   | -69.00%        |
| 1024x1024 | uint8   | 4        | 0.50    | dodge         | sse42  | 0.085241 | 0.004173   | 20.43x  | -95.10%        |
| 1024x1024 | uint8   | 4        | 0.50    | dodge         | avx2   | 0.085241 | 0.003374   | 25.26x  | -96.04%        |
| 1024x1024 | uint8   | 4        | 0.50    | addition      | scalar | 0.081852 | 0.031797   | 2.57x   | -61.15%        |
| 1024x1024 | uint8   | 4        | 0.50    | addition      | sse42  | 0.081852 | 0.004236   | 19.32x  | -94.82%        |
| 1024x1024 | uint8   | 4        | 0.50    | addition      | avx2   | 0.081852 | 0.003093   | 26.47x  | -96.22%        |
| 1024x1024 | uint8   | 4        | 0.50    | darken_only   | scalar | 0.082980 | 0.027932   | 2.97x   | -66.34%        |
| 1024x1024 | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.082980 | 0.003064   | 27.09x  | -96.31%        |
| 1024x1024 | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.082980 | 0.003032   | 27.37x  | -96.35%        |
| 1024x1024 | uint8   | 4        | 0.50    | multiply      | scalar | 0.084043 | 0.026081   | 3.22x   | -68.97%        |
| 1024x1024 | uint8   | 4        | 0.50    | multiply      | sse42  | 0.084043 | 0.003192   | 26.33x  | -96.20%        |
| 1024x1024 | uint8   | 4        | 0.50    | multiply      | avx2   | 0.084043 | 0.002975   | 28.25x  | -96.46%        |
| 1024x1024 | uint8   | 4        | 0.50    | hard_light    | scalar | 0.118821 | 0.042217   | 2.81x   | -64.47%        |
| 1024x1024 | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.118821 | 0.003755   | 31.64x  | -96.84%        |
| 1024x1024 | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.118821 | 0.003163   | 37.56x  | -97.34%        |
| 1024x1024 | uint8   | 4        | 0.50    | difference    | scalar | 0.112789 | 0.025101   | 4.49x   | -77.75%        |
| 1024x1024 | uint8   | 4        | 0.50    | difference    | sse42  | 0.112789 | 0.003139   | 35.93x  | -97.22%        |
| 1024x1024 | uint8   | 4        | 0.50    | difference    | avx2   | 0.112789 | 0.002986   | 37.77x  | -97.35%        |
| 1024x1024 | uint8   | 4        | 0.50    | subtract      | scalar | 0.082162 | 0.024406   | 3.37x   | -70.30%        |
| 1024x1024 | uint8   | 4        | 0.50    | subtract      | sse42  | 0.082162 | 0.004262   | 19.28x  | -94.81%        |
| 1024x1024 | uint8   | 4        | 0.50    | subtract      | avx2   | 0.082162 | 0.003470   | 23.68x  | -95.78%        |
| 1024x1024 | uint8   | 4        | 0.50    | grain_extract | scalar | 0.083095 | 0.030577   | 2.72x   | -63.20%        |
| 1024x1024 | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.083095 | 0.003453   | 24.06x  | -95.84%        |
| 1024x1024 | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.083095 | 0.003183   | 26.11x  | -96.17%        |
| 1024x1024 | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.083589 | 0.030572   | 2.73x   | -63.43%        |
| 1024x1024 | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.083589 | 0.003476   | 24.05x  | -95.84%        |
| 1024x1024 | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.083589 | 0.003187   | 26.23x  | -96.19%        |
| 1024x1024 | uint8   | 4        | 0.50    | divide        | scalar | 0.088999 | 0.025565   | 3.48x   | -71.27%        |
| 1024x1024 | uint8   | 4        | 0.50    | divide        | sse42  | 0.088999 | 0.003546   | 25.10x  | -96.02%        |
| 1024x1024 | uint8   | 4        | 0.50    | divide        | avx2   | 0.088999 | 0.003113   | 28.59x  | -96.50%        |
| 1024x1024 | uint8   | 4        | 0.50    | overlay       | scalar | 0.113675 | 0.041214   | 2.76x   | -63.74%        |
| 1024x1024 | uint8   | 4        | 0.50    | overlay       | sse42  | 0.113675 | 0.003567   | 31.87x  | -96.86%        |
| 1024x1024 | uint8   | 4        | 0.50    | overlay       | avx2   | 0.113675 | 0.003233   | 35.16x  | -97.16%        |
| 1024x1024 | float32 | 3        | 0.50    | normal        | scalar | 0.086445 | 0.008002   | 10.80x  | -90.74%        |
| 1024x1024 | float32 | 3        | 0.50    | normal        | sse42  | 0.086445 | 0.003550   | 24.35x  | -95.89%        |
| 1024x1024 | float32 | 3        | 0.50    | normal        | avx2   | 0.086445 | 0.002582   | 33.48x  | -97.01%        |
| 1024x1024 | float32 | 3        | 0.50    | soft_light    | scalar | 0.122820 | 0.009932   | 12.37x  | -91.91%        |
| 1024x1024 | float32 | 3        | 0.50    | soft_light    | sse42  | 0.122820 | 0.002300   | 53.40x  | -98.13%        |
| 1024x1024 | float32 | 3        | 0.50    | soft_light    | avx2   | 0.122820 | 0.002739   | 44.84x  | -97.77%        |
| 1024x1024 | float32 | 3        | 0.50    | lighten_only  | scalar | 0.094402 | 0.012426   | 7.60x   | -86.84%        |
| 1024x1024 | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.094402 | 0.002276   | 41.48x  | -97.59%        |
| 1024x1024 | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.094402 | 0.002384   | 39.60x  | -97.47%        |
| 1024x1024 | float32 | 3        | 0.50    | screen        | scalar | 0.100466 | 0.009745   | 10.31x  | -90.30%        |
| 1024x1024 | float32 | 3        | 0.50    | screen        | sse42  | 0.100466 | 0.002531   | 39.70x  | -97.48%        |
| 1024x1024 | float32 | 3        | 0.50    | screen        | avx2   | 0.100466 | 0.002021   | 49.71x  | -97.99%        |
| 1024x1024 | float32 | 3        | 0.50    | dodge         | scalar | 0.098717 | 0.010352   | 9.54x   | -89.51%        |
| 1024x1024 | float32 | 3        | 0.50    | dodge         | sse42  | 0.098717 | 0.002690   | 36.69x  | -97.27%        |
| 1024x1024 | float32 | 3        | 0.50    | dodge         | avx2   | 0.098717 | 0.001954   | 50.53x  | -98.02%        |
| 1024x1024 | float32 | 3        | 0.50    | addition      | scalar | 0.095256 | 0.025251   | 3.77x   | -73.49%        |
| 1024x1024 | float32 | 3        | 0.50    | addition      | sse42  | 0.095256 | 0.002694   | 35.36x  | -97.17%        |
| 1024x1024 | float32 | 3        | 0.50    | addition      | avx2   | 0.095256 | 0.002082   | 45.75x  | -97.81%        |
| 1024x1024 | float32 | 3        | 0.50    | darken_only   | scalar | 0.094916 | 0.011969   | 7.93x   | -87.39%        |
| 1024x1024 | float32 | 3        | 0.50    | darken_only   | sse42  | 0.094916 | 0.002470   | 38.43x  | -97.40%        |
| 1024x1024 | float32 | 3        | 0.50    | darken_only   | avx2   | 0.094916 | 0.002102   | 45.15x  | -97.78%        |
| 1024x1024 | float32 | 3        | 0.50    | multiply      | scalar | 0.098350 | 0.009078   | 10.83x  | -90.77%        |
| 1024x1024 | float32 | 3        | 0.50    | multiply      | sse42  | 0.098350 | 0.002392   | 41.11x  | -97.57%        |
| 1024x1024 | float32 | 3        | 0.50    | multiply      | avx2   | 0.098350 | 0.002190   | 44.90x  | -97.77%        |
| 1024x1024 | float32 | 3        | 0.50    | hard_light    | scalar | 0.133216 | 0.028277   | 4.71x   | -78.77%        |
| 1024x1024 | float32 | 3        | 0.50    | hard_light    | sse42  | 0.133216 | 0.002367   | 56.29x  | -98.22%        |
| 1024x1024 | float32 | 3        | 0.50    | hard_light    | avx2   | 0.133216 | 0.001965   | 67.81x  | -98.53%        |
| 1024x1024 | float32 | 3        | 0.50    | difference    | scalar | 0.126263 | 0.009110   | 13.86x  | -92.79%        |
| 1024x1024 | float32 | 3        | 0.50    | difference    | sse42  | 0.126263 | 0.002695   | 46.84x  | -97.87%        |
| 1024x1024 | float32 | 3        | 0.50    | difference    | avx2   | 0.126263 | 0.002288   | 55.17x  | -98.19%        |
| 1024x1024 | float32 | 3        | 0.50    | subtract      | scalar | 0.094905 | 0.011105   | 8.55x   | -88.30%        |
| 1024x1024 | float32 | 3        | 0.50    | subtract      | sse42  | 0.094905 | 0.002321   | 40.88x  | -97.55%        |
| 1024x1024 | float32 | 3        | 0.50    | subtract      | avx2   | 0.094905 | 0.002361   | 40.20x  | -97.51%        |
| 1024x1024 | float32 | 3        | 0.50    | grain_extract | scalar | 0.100053 | 0.016390   | 6.10x   | -83.62%        |
| 1024x1024 | float32 | 3        | 0.50    | grain_extract | sse42  | 0.100053 | 0.002224   | 45.00x  | -97.78%        |
| 1024x1024 | float32 | 3        | 0.50    | grain_extract | avx2   | 0.100053 | 0.002142   | 46.70x  | -97.86%        |
| 1024x1024 | float32 | 3        | 0.50    | grain_merge   | scalar | 0.098992 | 0.016400   | 6.04x   | -83.43%        |
| 1024x1024 | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.098992 | 0.002317   | 42.72x  | -97.66%        |
| 1024x1024 | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.098992 | 0.002212   | 44.75x  | -97.77%        |
| 1024x1024 | float32 | 3        | 0.50    | divide        | scalar | 0.100818 | 0.009811   | 10.28x  | -90.27%        |
| 1024x1024 | float32 | 3        | 0.50    | divide        | sse42  | 0.100818 | 0.003033   | 33.24x  | -96.99%        |
| 1024x1024 | float32 | 3        | 0.50    | divide        | avx2   | 0.100818 | 0.002016   | 50.02x  | -98.00%        |
| 1024x1024 | float32 | 3        | 0.50    | overlay       | scalar | 0.125831 | 0.026059   | 4.83x   | -79.29%        |
| 1024x1024 | float32 | 3        | 0.50    | overlay       | sse42  | 0.125831 | 0.002280   | 55.18x  | -98.19%        |
| 1024x1024 | float32 | 3        | 0.50    | overlay       | avx2   | 0.125831 | 0.002278   | 55.25x  | -98.19%        |
| 1024x1024 | float32 | 4        | 0.50    | normal        | scalar | 0.066683 | 0.009760   | 6.83x   | -85.36%        |
| 1024x1024 | float32 | 4        | 0.50    | normal        | sse42  | 0.066683 | 0.003929   | 16.97x  | -94.11%        |
| 1024x1024 | float32 | 4        | 0.50    | normal        | avx2   | 0.066683 | 0.003399   | 19.62x  | -94.90%        |
| 1024x1024 | float32 | 4        | 0.50    | soft_light    | scalar | 0.100580 | 0.011439   | 8.79x   | -88.63%        |
| 1024x1024 | float32 | 4        | 0.50    | soft_light    | sse42  | 0.100580 | 0.003212   | 31.32x  | -96.81%        |
| 1024x1024 | float32 | 4        | 0.50    | soft_light    | avx2   | 0.100580 | 0.003309   | 30.39x  | -96.71%        |
| 1024x1024 | float32 | 4        | 0.50    | lighten_only  | scalar | 0.074485 | 0.012317   | 6.05x   | -83.46%        |
| 1024x1024 | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.074485 | 0.003178   | 23.44x  | -95.73%        |
| 1024x1024 | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.074485 | 0.003360   | 22.17x  | -95.49%        |
| 1024x1024 | float32 | 4        | 0.50    | screen        | scalar | 0.078071 | 0.010670   | 7.32x   | -86.33%        |
| 1024x1024 | float32 | 4        | 0.50    | screen        | sse42  | 0.078071 | 0.003070   | 25.43x  | -96.07%        |
| 1024x1024 | float32 | 4        | 0.50    | screen        | avx2   | 0.078071 | 0.003272   | 23.86x  | -95.81%        |
| 1024x1024 | float32 | 4        | 0.50    | dodge         | scalar | 0.080010 | 0.011849   | 6.75x   | -85.19%        |
| 1024x1024 | float32 | 4        | 0.50    | dodge         | sse42  | 0.080010 | 0.004136   | 19.34x  | -94.83%        |
| 1024x1024 | float32 | 4        | 0.50    | dodge         | avx2   | 0.080010 | 0.003098   | 25.82x  | -96.13%        |
| 1024x1024 | float32 | 4        | 0.50    | addition      | scalar | 0.075135 | 0.021201   | 3.54x   | -71.78%        |
| 1024x1024 | float32 | 4        | 0.50    | addition      | sse42  | 0.075135 | 0.003884   | 19.35x  | -94.83%        |
| 1024x1024 | float32 | 4        | 0.50    | addition      | avx2   | 0.075135 | 0.003122   | 24.07x  | -95.84%        |
| 1024x1024 | float32 | 4        | 0.50    | darken_only   | scalar | 0.073648 | 0.011826   | 6.23x   | -83.94%        |
| 1024x1024 | float32 | 4        | 0.50    | darken_only   | sse42  | 0.073648 | 0.002921   | 25.21x  | -96.03%        |
| 1024x1024 | float32 | 4        | 0.50    | darken_only   | avx2   | 0.073648 | 0.003337   | 22.07x  | -95.47%        |
| 1024x1024 | float32 | 4        | 0.50    | multiply      | scalar | 0.075282 | 0.011353   | 6.63x   | -84.92%        |
| 1024x1024 | float32 | 4        | 0.50    | multiply      | sse42  | 0.075282 | 0.002972   | 25.33x  | -96.05%        |
| 1024x1024 | float32 | 4        | 0.50    | multiply      | avx2   | 0.075282 | 0.004168   | 18.06x  | -94.46%        |
| 1024x1024 | float32 | 4        | 0.50    | hard_light    | scalar | 0.110691 | 0.029514   | 3.75x   | -73.34%        |
| 1024x1024 | float32 | 4        | 0.50    | hard_light    | sse42  | 0.110691 | 0.004131   | 26.80x  | -96.27%        |
| 1024x1024 | float32 | 4        | 0.50    | hard_light    | avx2   | 0.110691 | 0.003289   | 33.66x  | -97.03%        |
| 1024x1024 | float32 | 4        | 0.50    | difference    | scalar | 0.105439 | 0.010449   | 10.09x  | -90.09%        |
| 1024x1024 | float32 | 4        | 0.50    | difference    | sse42  | 0.105439 | 0.003800   | 27.75x  | -96.40%        |
| 1024x1024 | float32 | 4        | 0.50    | difference    | avx2   | 0.105439 | 0.003026   | 34.85x  | -97.13%        |
| 1024x1024 | float32 | 4        | 0.50    | subtract      | scalar | 0.073468 | 0.014309   | 5.13x   | -80.52%        |
| 1024x1024 | float32 | 4        | 0.50    | subtract      | sse42  | 0.073468 | 0.004948   | 14.85x  | -93.26%        |
| 1024x1024 | float32 | 4        | 0.50    | subtract      | avx2   | 0.073468 | 0.003341   | 21.99x  | -95.45%        |
| 1024x1024 | float32 | 4        | 0.50    | grain_extract | scalar | 0.077176 | 0.016960   | 4.55x   | -78.02%        |
| 1024x1024 | float32 | 4        | 0.50    | grain_extract | sse42  | 0.077176 | 0.003385   | 22.80x  | -95.61%        |
| 1024x1024 | float32 | 4        | 0.50    | grain_extract | avx2   | 0.077176 | 0.003099   | 24.90x  | -95.98%        |
| 1024x1024 | float32 | 4        | 0.50    | grain_merge   | scalar | 0.079242 | 0.016749   | 4.73x   | -78.86%        |
| 1024x1024 | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.079242 | 0.003013   | 26.30x  | -96.20%        |
| 1024x1024 | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.079242 | 0.003299   | 24.02x  | -95.84%        |
| 1024x1024 | float32 | 4        | 0.50    | divide        | scalar | 0.078134 | 0.011467   | 6.81x   | -85.32%        |
| 1024x1024 | float32 | 4        | 0.50    | divide        | sse42  | 0.078134 | 0.003922   | 19.92x  | -94.98%        |
| 1024x1024 | float32 | 4        | 0.50    | divide        | avx2   | 0.078134 | 0.003309   | 23.61x  | -95.76%        |
| 1024x1024 | float32 | 4        | 0.50    | overlay       | scalar | 0.105344 | 0.027706   | 3.80x   | -73.70%        |
| 1024x1024 | float32 | 4        | 0.50    | overlay       | sse42  | 0.105344 | 0.003309   | 31.84x  | -96.86%        |
| 1024x1024 | float32 | 4        | 0.50    | overlay       | avx2   | 0.105344 | 0.003394   | 31.03x  | -96.78%        |
| 2048x2048 | uint8   | 3        | 0.50    | normal        | scalar | 0.406478 | 0.103970   | 3.91x   | -74.42%        |
| 2048x2048 | uint8   | 3        | 0.50    | normal        | sse42  | 0.406478 | 0.043897   | 9.26x   | -89.20%        |
| 2048x2048 | uint8   | 3        | 0.50    | normal        | avx2   | 0.406478 | 0.044544   | 9.13x   | -89.04%        |
| 2048x2048 | uint8   | 3        | 0.50    | soft_light    | scalar | 0.513669 | 0.117205   | 4.38x   | -77.18%        |
| 2048x2048 | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.513669 | 0.051394   | 9.99x   | -89.99%        |
| 2048x2048 | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.513669 | 0.046029   | 11.16x  | -91.04%        |
| 2048x2048 | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.397173 | 0.127509   | 3.11x   | -67.90%        |
| 2048x2048 | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.397173 | 0.050040   | 7.94x   | -87.40%        |
| 2048x2048 | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.397173 | 0.045307   | 8.77x   | -88.59%        |
| 2048x2048 | uint8   | 3        | 0.50    | screen        | scalar | 0.418428 | 0.115151   | 3.63x   | -72.48%        |
| 2048x2048 | uint8   | 3        | 0.50    | screen        | sse42  | 0.418428 | 0.050612   | 8.27x   | -87.90%        |
| 2048x2048 | uint8   | 3        | 0.50    | screen        | avx2   | 0.418428 | 0.046220   | 9.05x   | -88.95%        |
| 2048x2048 | uint8   | 3        | 0.50    | dodge         | scalar | 0.425751 | 0.122530   | 3.47x   | -71.22%        |
| 2048x2048 | uint8   | 3        | 0.50    | dodge         | sse42  | 0.425751 | 0.053781   | 7.92x   | -87.37%        |
| 2048x2048 | uint8   | 3        | 0.50    | dodge         | avx2   | 0.425751 | 0.047707   | 8.92x   | -88.79%        |
| 2048x2048 | uint8   | 3        | 0.50    | addition      | scalar | 0.406609 | 0.164555   | 2.47x   | -59.53%        |
| 2048x2048 | uint8   | 3        | 0.50    | addition      | sse42  | 0.406609 | 0.051311   | 7.92x   | -87.38%        |
| 2048x2048 | uint8   | 3        | 0.50    | addition      | avx2   | 0.406609 | 0.045488   | 8.94x   | -88.81%        |
| 2048x2048 | uint8   | 3        | 0.50    | darken_only   | scalar | 0.435759 | 0.135909   | 3.21x   | -68.81%        |
| 2048x2048 | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.435759 | 0.052316   | 8.33x   | -87.99%        |
| 2048x2048 | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.435759 | 0.047003   | 9.27x   | -89.21%        |
| 2048x2048 | uint8   | 3        | 0.50    | multiply      | scalar | 0.405098 | 0.116875   | 3.47x   | -71.15%        |
| 2048x2048 | uint8   | 3        | 0.50    | multiply      | sse42  | 0.405098 | 0.050465   | 8.03x   | -87.54%        |
| 2048x2048 | uint8   | 3        | 0.50    | multiply      | avx2   | 0.405098 | 0.046200   | 8.77x   | -88.60%        |
| 2048x2048 | uint8   | 3        | 0.50    | hard_light    | scalar | 0.573302 | 0.198384   | 2.89x   | -65.40%        |
| 2048x2048 | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.573302 | 0.054644   | 10.49x  | -90.47%        |
| 2048x2048 | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.573302 | 0.047799   | 11.99x  | -91.66%        |
| 2048x2048 | uint8   | 3        | 0.50    | difference    | scalar | 0.520069 | 0.115214   | 4.51x   | -77.85%        |
| 2048x2048 | uint8   | 3        | 0.50    | difference    | sse42  | 0.520069 | 0.050835   | 10.23x  | -90.23%        |
| 2048x2048 | uint8   | 3        | 0.50    | difference    | avx2   | 0.520069 | 0.046059   | 11.29x  | -91.14%        |
| 2048x2048 | uint8   | 3        | 0.50    | subtract      | scalar | 0.408854 | 0.109398   | 3.74x   | -73.24%        |
| 2048x2048 | uint8   | 3        | 0.50    | subtract      | sse42  | 0.408854 | 0.051950   | 7.87x   | -87.29%        |
| 2048x2048 | uint8   | 3        | 0.50    | subtract      | avx2   | 0.408854 | 0.047337   | 8.64x   | -88.42%        |
| 2048x2048 | uint8   | 3        | 0.50    | grain_extract | scalar | 0.420742 | 0.141463   | 2.97x   | -66.38%        |
| 2048x2048 | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.420742 | 0.051435   | 8.18x   | -87.78%        |
| 2048x2048 | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.420742 | 0.046147   | 9.12x   | -89.03%        |
| 2048x2048 | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.420308 | 0.141966   | 2.96x   | -66.22%        |
| 2048x2048 | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.420308 | 0.051463   | 8.17x   | -87.76%        |
| 2048x2048 | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.420308 | 0.046231   | 9.09x   | -89.00%        |
| 2048x2048 | uint8   | 3        | 0.50    | divide        | scalar | 0.425666 | 0.119293   | 3.57x   | -71.97%        |
| 2048x2048 | uint8   | 3        | 0.50    | divide        | sse42  | 0.425666 | 0.052179   | 8.16x   | -87.74%        |
| 2048x2048 | uint8   | 3        | 0.50    | divide        | avx2   | 0.425666 | 0.046970   | 9.06x   | -88.97%        |
| 2048x2048 | uint8   | 3        | 0.50    | overlay       | scalar | 0.537216 | 0.192175   | 2.80x   | -64.23%        |
| 2048x2048 | uint8   | 3        | 0.50    | overlay       | sse42  | 0.537216 | 0.051778   | 10.38x  | -90.36%        |
| 2048x2048 | uint8   | 3        | 0.50    | overlay       | avx2   | 0.537216 | 0.047305   | 11.36x  | -91.19%        |
| 2048x2048 | uint8   | 4        | 0.50    | normal        | scalar | 0.297583 | 0.086940   | 3.42x   | -70.78%        |
| 2048x2048 | uint8   | 4        | 0.50    | normal        | sse42  | 0.297583 | 0.011501   | 25.87x  | -96.14%        |
| 2048x2048 | uint8   | 4        | 0.50    | normal        | avx2   | 0.297583 | 0.010174   | 29.25x  | -96.58%        |
| 2048x2048 | uint8   | 4        | 0.50    | soft_light    | scalar | 0.412936 | 0.107606   | 3.84x   | -73.94%        |
| 2048x2048 | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.412936 | 0.014190   | 29.10x  | -96.56%        |
| 2048x2048 | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.412936 | 0.013515   | 30.55x  | -96.73%        |
| 2048x2048 | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.289737 | 0.112950   | 2.57x   | -61.02%        |
| 2048x2048 | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.289737 | 0.012784   | 22.66x  | -95.59%        |
| 2048x2048 | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.289737 | 0.012036   | 24.07x  | -95.85%        |
| 2048x2048 | uint8   | 4        | 0.50    | screen        | scalar | 0.312628 | 0.102053   | 3.06x   | -67.36%        |
| 2048x2048 | uint8   | 4        | 0.50    | screen        | sse42  | 0.312628 | 0.013528   | 23.11x  | -95.67%        |
| 2048x2048 | uint8   | 4        | 0.50    | screen        | avx2   | 0.312628 | 0.012819   | 24.39x  | -95.90%        |
| 2048x2048 | uint8   | 4        | 0.50    | dodge         | scalar | 0.318921 | 0.106110   | 3.01x   | -66.73%        |
| 2048x2048 | uint8   | 4        | 0.50    | dodge         | sse42  | 0.318921 | 0.014930   | 21.36x  | -95.32%        |
| 2048x2048 | uint8   | 4        | 0.50    | dodge         | avx2   | 0.318921 | 0.013018   | 24.50x  | -95.92%        |
| 2048x2048 | uint8   | 4        | 0.50    | addition      | scalar | 0.302044 | 0.129164   | 2.34x   | -57.24%        |
| 2048x2048 | uint8   | 4        | 0.50    | addition      | sse42  | 0.302044 | 0.016663   | 18.13x  | -94.48%        |
| 2048x2048 | uint8   | 4        | 0.50    | addition      | avx2   | 0.302044 | 0.012979   | 23.27x  | -95.70%        |
| 2048x2048 | uint8   | 4        | 0.50    | darken_only   | scalar | 0.285542 | 0.112388   | 2.54x   | -60.64%        |
| 2048x2048 | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.285542 | 0.012517   | 22.81x  | -95.62%        |
| 2048x2048 | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.285542 | 0.011979   | 23.84x  | -95.80%        |
| 2048x2048 | uint8   | 4        | 0.50    | multiply      | scalar | 0.300874 | 0.103462   | 2.91x   | -65.61%        |
| 2048x2048 | uint8   | 4        | 0.50    | multiply      | sse42  | 0.300874 | 0.012915   | 23.30x  | -95.71%        |
| 2048x2048 | uint8   | 4        | 0.50    | multiply      | avx2   | 0.300874 | 0.012093   | 24.88x  | -95.98%        |
| 2048x2048 | uint8   | 4        | 0.50    | hard_light    | scalar | 0.473728 | 0.168717   | 2.81x   | -64.39%        |
| 2048x2048 | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.473728 | 0.015362   | 30.84x  | -96.76%        |
| 2048x2048 | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.473728 | 0.012798   | 37.02x  | -97.30%        |
| 2048x2048 | uint8   | 4        | 0.50    | difference    | scalar | 0.410639 | 0.101625   | 4.04x   | -75.25%        |
| 2048x2048 | uint8   | 4        | 0.50    | difference    | sse42  | 0.410639 | 0.012701   | 32.33x  | -96.91%        |
| 2048x2048 | uint8   | 4        | 0.50    | difference    | avx2   | 0.410639 | 0.011930   | 34.42x  | -97.09%        |
| 2048x2048 | uint8   | 4        | 0.50    | subtract      | scalar | 0.301721 | 0.097767   | 3.09x   | -67.60%        |
| 2048x2048 | uint8   | 4        | 0.50    | subtract      | sse42  | 0.301721 | 0.016888   | 17.87x  | -94.40%        |
| 2048x2048 | uint8   | 4        | 0.50    | subtract      | avx2   | 0.301721 | 0.014122   | 21.37x  | -95.32%        |
| 2048x2048 | uint8   | 4        | 0.50    | grain_extract | scalar | 0.308857 | 0.123466   | 2.50x   | -60.02%        |
| 2048x2048 | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.308857 | 0.013726   | 22.50x  | -95.56%        |
| 2048x2048 | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.308857 | 0.012771   | 24.18x  | -95.87%        |
| 2048x2048 | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.317239 | 0.122207   | 2.60x   | -61.48%        |
| 2048x2048 | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.317239 | 0.015527   | 20.43x  | -95.11%        |
| 2048x2048 | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.317239 | 0.012741   | 24.90x  | -95.98%        |
| 2048x2048 | uint8   | 4        | 0.50    | divide        | scalar | 0.322031 | 0.103194   | 3.12x   | -67.96%        |
| 2048x2048 | uint8   | 4        | 0.50    | divide        | sse42  | 0.322031 | 0.013990   | 23.02x  | -95.66%        |
| 2048x2048 | uint8   | 4        | 0.50    | divide        | avx2   | 0.322031 | 0.012525   | 25.71x  | -96.11%        |
| 2048x2048 | uint8   | 4        | 0.50    | overlay       | scalar | 0.427598 | 0.164553   | 2.60x   | -61.52%        |
| 2048x2048 | uint8   | 4        | 0.50    | overlay       | sse42  | 0.427598 | 0.014558   | 29.37x  | -96.60%        |
| 2048x2048 | uint8   | 4        | 0.50    | overlay       | avx2   | 0.427598 | 0.012801   | 33.40x  | -97.01%        |
| 2048x2048 | float32 | 3        | 0.50    | normal        | scalar | 0.348276 | 0.037925   | 9.18x   | -89.11%        |
| 2048x2048 | float32 | 3        | 0.50    | normal        | sse42  | 0.348276 | 0.020049   | 17.37x  | -94.24%        |
| 2048x2048 | float32 | 3        | 0.50    | normal        | avx2   | 0.348276 | 0.016562   | 21.03x  | -95.24%        |
| 2048x2048 | float32 | 3        | 0.50    | soft_light    | scalar | 0.470530 | 0.047570   | 9.89x   | -89.89%        |
| 2048x2048 | float32 | 3        | 0.50    | soft_light    | sse42  | 0.470530 | 0.015246   | 30.86x  | -96.76%        |
| 2048x2048 | float32 | 3        | 0.50    | soft_light    | avx2   | 0.470530 | 0.013125   | 35.85x  | -97.21%        |
| 2048x2048 | float32 | 3        | 0.50    | lighten_only  | scalar | 0.344228 | 0.053729   | 6.41x   | -84.39%        |
| 2048x2048 | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.344228 | 0.015324   | 22.46x  | -95.55%        |
| 2048x2048 | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.344228 | 0.012691   | 27.12x  | -96.31%        |
| 2048x2048 | float32 | 3        | 0.50    | screen        | scalar | 0.368997 | 0.044315   | 8.33x   | -87.99%        |
| 2048x2048 | float32 | 3        | 0.50    | screen        | sse42  | 0.368997 | 0.015146   | 24.36x  | -95.90%        |
| 2048x2048 | float32 | 3        | 0.50    | screen        | avx2   | 0.368997 | 0.014015   | 26.33x  | -96.20%        |
| 2048x2048 | float32 | 3        | 0.50    | dodge         | scalar | 0.372967 | 0.047910   | 7.78x   | -87.15%        |
| 2048x2048 | float32 | 3        | 0.50    | dodge         | sse42  | 0.372967 | 0.015461   | 24.12x  | -95.85%        |
| 2048x2048 | float32 | 3        | 0.50    | dodge         | avx2   | 0.372967 | 0.013196   | 28.26x  | -96.46%        |
| 2048x2048 | float32 | 3        | 0.50    | addition      | scalar | 0.360312 | 0.107842   | 3.34x   | -70.07%        |
| 2048x2048 | float32 | 3        | 0.50    | addition      | sse42  | 0.360312 | 0.014394   | 25.03x  | -96.01%        |
| 2048x2048 | float32 | 3        | 0.50    | addition      | avx2   | 0.360312 | 0.013128   | 27.45x  | -96.36%        |
| 2048x2048 | float32 | 3        | 0.50    | darken_only   | scalar | 0.344377 | 0.054963   | 6.27x   | -84.04%        |
| 2048x2048 | float32 | 3        | 0.50    | darken_only   | sse42  | 0.344377 | 0.014703   | 23.42x  | -95.73%        |
| 2048x2048 | float32 | 3        | 0.50    | darken_only   | avx2   | 0.344377 | 0.013112   | 26.26x  | -96.19%        |
| 2048x2048 | float32 | 3        | 0.50    | multiply      | scalar | 0.357180 | 0.043165   | 8.27x   | -87.91%        |
| 2048x2048 | float32 | 3        | 0.50    | multiply      | sse42  | 0.357180 | 0.014494   | 24.64x  | -95.94%        |
| 2048x2048 | float32 | 3        | 0.50    | multiply      | avx2   | 0.357180 | 0.014012   | 25.49x  | -96.08%        |
| 2048x2048 | float32 | 3        | 0.50    | hard_light    | scalar | 0.521354 | 0.118265   | 4.41x   | -77.32%        |
| 2048x2048 | float32 | 3        | 0.50    | hard_light    | sse42  | 0.521354 | 0.015081   | 34.57x  | -97.11%        |
| 2048x2048 | float32 | 3        | 0.50    | hard_light    | avx2   | 0.521354 | 0.013849   | 37.65x  | -97.34%        |
| 2048x2048 | float32 | 3        | 0.50    | difference    | scalar | 0.462161 | 0.042822   | 10.79x  | -90.73%        |
| 2048x2048 | float32 | 3        | 0.50    | difference    | sse42  | 0.462161 | 0.015633   | 29.56x  | -96.62%        |
| 2048x2048 | float32 | 3        | 0.50    | difference    | avx2   | 0.462161 | 0.013165   | 35.11x  | -97.15%        |
| 2048x2048 | float32 | 3        | 0.50    | subtract      | scalar | 0.349363 | 0.049832   | 7.01x   | -85.74%        |
| 2048x2048 | float32 | 3        | 0.50    | subtract      | sse42  | 0.349363 | 0.014744   | 23.69x  | -95.78%        |
| 2048x2048 | float32 | 3        | 0.50    | subtract      | avx2   | 0.349363 | 0.013557   | 25.77x  | -96.12%        |
| 2048x2048 | float32 | 3        | 0.50    | grain_extract | scalar | 0.363105 | 0.070047   | 5.18x   | -80.71%        |
| 2048x2048 | float32 | 3        | 0.50    | grain_extract | sse42  | 0.363105 | 0.015537   | 23.37x  | -95.72%        |
| 2048x2048 | float32 | 3        | 0.50    | grain_extract | avx2   | 0.363105 | 0.012680   | 28.64x  | -96.51%        |
| 2048x2048 | float32 | 3        | 0.50    | grain_merge   | scalar | 0.363368 | 0.071229   | 5.10x   | -80.40%        |
| 2048x2048 | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.363368 | 0.014281   | 25.44x  | -96.07%        |
| 2048x2048 | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.363368 | 0.013089   | 27.76x  | -96.40%        |
| 2048x2048 | float32 | 3        | 0.50    | divide        | scalar | 0.369521 | 0.045268   | 8.16x   | -87.75%        |
| 2048x2048 | float32 | 3        | 0.50    | divide        | sse42  | 0.369521 | 0.015003   | 24.63x  | -95.94%        |
| 2048x2048 | float32 | 3        | 0.50    | divide        | avx2   | 0.369521 | 0.013443   | 27.49x  | -96.36%        |
| 2048x2048 | float32 | 3        | 0.50    | overlay       | scalar | 0.481747 | 0.112332   | 4.29x   | -76.68%        |
| 2048x2048 | float32 | 3        | 0.50    | overlay       | sse42  | 0.481747 | 0.015215   | 31.66x  | -96.84%        |
| 2048x2048 | float32 | 3        | 0.50    | overlay       | avx2   | 0.481747 | 0.013394   | 35.97x  | -97.22%        |
| 2048x2048 | float32 | 4        | 0.50    | normal        | scalar | 0.275003 | 0.046923   | 5.86x   | -82.94%        |
| 2048x2048 | float32 | 4        | 0.50    | normal        | sse42  | 0.275003 | 0.019295   | 14.25x  | -92.98%        |
| 2048x2048 | float32 | 4        | 0.50    | normal        | avx2   | 0.275003 | 0.022401   | 12.28x  | -91.85%        |
| 2048x2048 | float32 | 4        | 0.50    | soft_light    | scalar | 0.389409 | 0.055554   | 7.01x   | -85.73%        |
| 2048x2048 | float32 | 4        | 0.50    | soft_light    | sse42  | 0.389409 | 0.020611   | 18.89x  | -94.71%        |
| 2048x2048 | float32 | 4        | 0.50    | soft_light    | avx2   | 0.389409 | 0.021134   | 18.43x  | -94.57%        |
| 2048x2048 | float32 | 4        | 0.50    | lighten_only  | scalar | 0.272940 | 0.060844   | 4.49x   | -77.71%        |
| 2048x2048 | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.272940 | 0.021280   | 12.83x  | -92.20%        |
| 2048x2048 | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.272940 | 0.021537   | 12.67x  | -92.11%        |
| 2048x2048 | float32 | 4        | 0.50    | screen        | scalar | 0.295749 | 0.054038   | 5.47x   | -81.73%        |
| 2048x2048 | float32 | 4        | 0.50    | screen        | sse42  | 0.295749 | 0.022065   | 13.40x  | -92.54%        |
| 2048x2048 | float32 | 4        | 0.50    | screen        | avx2   | 0.295749 | 0.021104   | 14.01x  | -92.86%        |
| 2048x2048 | float32 | 4        | 0.50    | dodge         | scalar | 0.326736 | 0.061882   | 5.28x   | -81.06%        |
| 2048x2048 | float32 | 4        | 0.50    | dodge         | sse42  | 0.326736 | 0.025908   | 12.61x  | -92.07%        |
| 2048x2048 | float32 | 4        | 0.50    | dodge         | avx2   | 0.326736 | 0.019418   | 16.83x  | -94.06%        |
| 2048x2048 | float32 | 4        | 0.50    | addition      | scalar | 0.306088 | 0.095263   | 3.21x   | -68.88%        |
| 2048x2048 | float32 | 4        | 0.50    | addition      | sse42  | 0.306088 | 0.020833   | 14.69x  | -93.19%        |
| 2048x2048 | float32 | 4        | 0.50    | addition      | avx2   | 0.306088 | 0.021602   | 14.17x  | -92.94%        |
| 2048x2048 | float32 | 4        | 0.50    | darken_only   | scalar | 0.261519 | 0.058822   | 4.45x   | -77.51%        |
| 2048x2048 | float32 | 4        | 0.50    | darken_only   | sse42  | 0.261519 | 0.019674   | 13.29x  | -92.48%        |
| 2048x2048 | float32 | 4        | 0.50    | darken_only   | avx2   | 0.261519 | 0.019148   | 13.66x  | -92.68%        |
| 2048x2048 | float32 | 4        | 0.50    | multiply      | scalar | 0.263232 | 0.050738   | 5.19x   | -80.72%        |
| 2048x2048 | float32 | 4        | 0.50    | multiply      | sse42  | 0.263232 | 0.019529   | 13.48x  | -92.58%        |
| 2048x2048 | float32 | 4        | 0.50    | multiply      | avx2   | 0.263232 | 0.018913   | 13.92x  | -92.82%        |
| 2048x2048 | float32 | 4        | 0.50    | hard_light    | scalar | 0.426266 | 0.129999   | 3.28x   | -69.50%        |
| 2048x2048 | float32 | 4        | 0.50    | hard_light    | sse42  | 0.426266 | 0.021966   | 19.41x  | -94.85%        |
| 2048x2048 | float32 | 4        | 0.50    | hard_light    | avx2   | 0.426266 | 0.020397   | 20.90x  | -95.21%        |
| 2048x2048 | float32 | 4        | 0.50    | difference    | scalar | 0.371404 | 0.051607   | 7.20x   | -86.10%        |
| 2048x2048 | float32 | 4        | 0.50    | difference    | sse42  | 0.371404 | 0.019238   | 19.31x  | -94.82%        |
| 2048x2048 | float32 | 4        | 0.50    | difference    | avx2   | 0.371404 | 0.019218   | 19.33x  | -94.83%        |
| 2048x2048 | float32 | 4        | 0.50    | subtract      | scalar | 0.284699 | 0.068146   | 4.18x   | -76.06%        |
| 2048x2048 | float32 | 4        | 0.50    | subtract      | sse42  | 0.284699 | 0.025086   | 11.35x  | -91.19%        |
| 2048x2048 | float32 | 4        | 0.50    | subtract      | avx2   | 0.284699 | 0.019837   | 14.35x  | -93.03%        |
| 2048x2048 | float32 | 4        | 0.50    | grain_extract | scalar | 0.275522 | 0.078133   | 3.53x   | -71.64%        |
| 2048x2048 | float32 | 4        | 0.50    | grain_extract | sse42  | 0.275522 | 0.021660   | 12.72x  | -92.14%        |
| 2048x2048 | float32 | 4        | 0.50    | grain_extract | avx2   | 0.275522 | 0.019148   | 14.39x  | -93.05%        |
| 2048x2048 | float32 | 4        | 0.50    | grain_merge   | scalar | 0.300500 | 0.079173   | 3.80x   | -73.65%        |
| 2048x2048 | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.300500 | 0.021176   | 14.19x  | -92.95%        |
| 2048x2048 | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.300500 | 0.019447   | 15.45x  | -93.53%        |
| 2048x2048 | float32 | 4        | 0.50    | divide        | scalar | 0.316571 | 0.054351   | 5.82x   | -82.83%        |
| 2048x2048 | float32 | 4        | 0.50    | divide        | sse42  | 0.316571 | 0.021725   | 14.57x  | -93.14%        |
| 2048x2048 | float32 | 4        | 0.50    | divide        | avx2   | 0.316571 | 0.025873   | 12.24x  | -91.83%        |
| 2048x2048 | float32 | 4        | 0.50    | overlay       | scalar | 0.404299 | 0.121638   | 3.32x   | -69.91%        |
| 2048x2048 | float32 | 4        | 0.50    | overlay       | sse42  | 0.404299 | 0.021309   | 18.97x  | -94.73%        |
| 2048x2048 | float32 | 4        | 0.50    | overlay       | avx2   | 0.404299 | 0.018974   | 21.31x  | -95.31%        |
| 1280x720  | uint8   | 3        | 0.50    | normal        | scalar | 0.081749 | 0.023018   | 3.55x   | -71.84%        |
| 1280x720  | uint8   | 3        | 0.50    | normal        | sse42  | 0.081749 | 0.010117   | 8.08x   | -87.62%        |
| 1280x720  | uint8   | 3        | 0.50    | normal        | avx2   | 0.081749 | 0.009926   | 8.24x   | -87.86%        |
| 1280x720  | uint8   | 3        | 0.50    | soft_light    | scalar | 0.110683 | 0.025252   | 4.38x   | -77.19%        |
| 1280x720  | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.110683 | 0.011234   | 9.85x   | -89.85%        |
| 1280x720  | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.110683 | 0.010497   | 10.54x  | -90.52%        |
| 1280x720  | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.085672 | 0.027879   | 3.07x   | -67.46%        |
| 1280x720  | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.085672 | 0.011036   | 7.76x   | -87.12%        |
| 1280x720  | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.085672 | 0.009823   | 8.72x   | -88.53%        |
| 1280x720  | uint8   | 3        | 0.50    | screen        | scalar | 0.089182 | 0.025188   | 3.54x   | -71.76%        |
| 1280x720  | uint8   | 3        | 0.50    | screen        | sse42  | 0.089182 | 0.010950   | 8.14x   | -87.72%        |
| 1280x720  | uint8   | 3        | 0.50    | screen        | avx2   | 0.089182 | 0.009958   | 8.96x   | -88.83%        |
| 1280x720  | uint8   | 3        | 0.50    | dodge         | scalar | 0.092472 | 0.025659   | 3.60x   | -72.25%        |
| 1280x720  | uint8   | 3        | 0.50    | dodge         | sse42  | 0.092472 | 0.011637   | 7.95x   | -87.42%        |
| 1280x720  | uint8   | 3        | 0.50    | dodge         | avx2   | 0.092472 | 0.010319   | 8.96x   | -88.84%        |
| 1280x720  | uint8   | 3        | 0.50    | addition      | scalar | 0.089698 | 0.035584   | 2.52x   | -60.33%        |
| 1280x720  | uint8   | 3        | 0.50    | addition      | sse42  | 0.089698 | 0.011950   | 7.51x   | -86.68%        |
| 1280x720  | uint8   | 3        | 0.50    | addition      | avx2   | 0.089698 | 0.009804   | 9.15x   | -89.07%        |
| 1280x720  | uint8   | 3        | 0.50    | darken_only   | scalar | 0.086171 | 0.027656   | 3.12x   | -67.91%        |
| 1280x720  | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.086171 | 0.010974   | 7.85x   | -87.26%        |
| 1280x720  | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.086171 | 0.009713   | 8.87x   | -88.73%        |
| 1280x720  | uint8   | 3        | 0.50    | multiply      | scalar | 0.086433 | 0.025351   | 3.41x   | -70.67%        |
| 1280x720  | uint8   | 3        | 0.50    | multiply      | sse42  | 0.086433 | 0.010940   | 7.90x   | -87.34%        |
| 1280x720  | uint8   | 3        | 0.50    | multiply      | avx2   | 0.086433 | 0.009801   | 8.82x   | -88.66%        |
| 1280x720  | uint8   | 3        | 0.50    | hard_light    | scalar | 0.117744 | 0.042659   | 2.76x   | -63.77%        |
| 1280x720  | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.117744 | 0.011337   | 10.39x  | -90.37%        |
| 1280x720  | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.117744 | 0.010118   | 11.64x  | -91.41%        |
| 1280x720  | uint8   | 3        | 0.50    | difference    | scalar | 0.113459 | 0.024709   | 4.59x   | -78.22%        |
| 1280x720  | uint8   | 3        | 0.50    | difference    | sse42  | 0.113459 | 0.011064   | 10.25x  | -90.25%        |
| 1280x720  | uint8   | 3        | 0.50    | difference    | avx2   | 0.113459 | 0.009944   | 11.41x  | -91.24%        |
| 1280x720  | uint8   | 3        | 0.50    | subtract      | scalar | 0.086341 | 0.024151   | 3.58x   | -72.03%        |
| 1280x720  | uint8   | 3        | 0.50    | subtract      | sse42  | 0.086341 | 0.011478   | 7.52x   | -86.71%        |
| 1280x720  | uint8   | 3        | 0.50    | subtract      | avx2   | 0.086341 | 0.010172   | 8.49x   | -88.22%        |
| 1280x720  | uint8   | 3        | 0.50    | grain_extract | scalar | 0.094821 | 0.030084   | 3.15x   | -68.27%        |
| 1280x720  | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.094821 | 0.011110   | 8.53x   | -88.28%        |
| 1280x720  | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.094821 | 0.010734   | 8.83x   | -88.68%        |
| 1280x720  | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.096317 | 0.031312   | 3.08x   | -67.49%        |
| 1280x720  | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.096317 | 0.011382   | 8.46x   | -88.18%        |
| 1280x720  | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.096317 | 0.009941   | 9.69x   | -89.68%        |
| 1280x720  | uint8   | 3        | 0.50    | divide        | scalar | 0.088319 | 0.026086   | 3.39x   | -70.46%        |
| 1280x720  | uint8   | 3        | 0.50    | divide        | sse42  | 0.088319 | 0.011234   | 7.86x   | -87.28%        |
| 1280x720  | uint8   | 3        | 0.50    | divide        | avx2   | 0.088319 | 0.010112   | 8.73x   | -88.55%        |
| 1280x720  | uint8   | 3        | 0.50    | overlay       | scalar | 0.113392 | 0.042609   | 2.66x   | -62.42%        |
| 1280x720  | uint8   | 3        | 0.50    | overlay       | sse42  | 0.113392 | 0.011410   | 9.94x   | -89.94%        |
| 1280x720  | uint8   | 3        | 0.50    | overlay       | avx2   | 0.113392 | 0.010066   | 11.26x  | -91.12%        |
| 1280x720  | uint8   | 4        | 0.50    | normal        | scalar | 0.062221 | 0.018320   | 3.40x   | -70.56%        |
| 1280x720  | uint8   | 4        | 0.50    | normal        | sse42  | 0.062221 | 0.002638   | 23.59x  | -95.76%        |
| 1280x720  | uint8   | 4        | 0.50    | normal        | avx2   | 0.062221 | 0.002229   | 27.91x  | -96.42%        |
| 1280x720  | uint8   | 4        | 0.50    | soft_light    | scalar | 0.095106 | 0.023730   | 4.01x   | -75.05%        |
| 1280x720  | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.095106 | 0.003153   | 30.17x  | -96.69%        |
| 1280x720  | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.095106 | 0.002939   | 32.36x  | -96.91%        |
| 1280x720  | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.085514 | 0.024037   | 3.56x   | -71.89%        |
| 1280x720  | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.085514 | 0.002844   | 30.07x  | -96.67%        |
| 1280x720  | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.085514 | 0.002838   | 30.13x  | -96.68%        |
| 1280x720  | uint8   | 4        | 0.50    | screen        | scalar | 0.076636 | 0.022177   | 3.46x   | -71.06%        |
| 1280x720  | uint8   | 4        | 0.50    | screen        | sse42  | 0.076636 | 0.002983   | 25.69x  | -96.11%        |
| 1280x720  | uint8   | 4        | 0.50    | screen        | avx2   | 0.076636 | 0.002747   | 27.90x  | -96.42%        |
| 1280x720  | uint8   | 4        | 0.50    | dodge         | scalar | 0.075983 | 0.023108   | 3.29x   | -69.59%        |
| 1280x720  | uint8   | 4        | 0.50    | dodge         | sse42  | 0.075983 | 0.003315   | 22.92x  | -95.64%        |
| 1280x720  | uint8   | 4        | 0.50    | dodge         | avx2   | 0.075983 | 0.002878   | 26.40x  | -96.21%        |
| 1280x720  | uint8   | 4        | 0.50    | addition      | scalar | 0.071151 | 0.028536   | 2.49x   | -59.89%        |
| 1280x720  | uint8   | 4        | 0.50    | addition      | sse42  | 0.071151 | 0.003697   | 19.24x  | -94.80%        |
| 1280x720  | uint8   | 4        | 0.50    | addition      | avx2   | 0.071151 | 0.002741   | 25.96x  | -96.15%        |
| 1280x720  | uint8   | 4        | 0.50    | darken_only   | scalar | 0.072488 | 0.024116   | 3.01x   | -66.73%        |
| 1280x720  | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.072488 | 0.002973   | 24.38x  | -95.90%        |
| 1280x720  | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.072488 | 0.002637   | 27.49x  | -96.36%        |
| 1280x720  | uint8   | 4        | 0.50    | multiply      | scalar | 0.073264 | 0.022598   | 3.24x   | -69.16%        |
| 1280x720  | uint8   | 4        | 0.50    | multiply      | sse42  | 0.073264 | 0.002925   | 25.05x  | -96.01%        |
| 1280x720  | uint8   | 4        | 0.50    | multiply      | avx2   | 0.073264 | 0.002682   | 27.31x  | -96.34%        |
| 1280x720  | uint8   | 4        | 0.50    | hard_light    | scalar | 0.105310 | 0.037034   | 2.84x   | -64.83%        |
| 1280x720  | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.105310 | 0.003358   | 31.36x  | -96.81%        |
| 1280x720  | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.105310 | 0.002804   | 37.56x  | -97.34%        |
| 1280x720  | uint8   | 4        | 0.50    | difference    | scalar | 0.100246 | 0.022411   | 4.47x   | -77.64%        |
| 1280x720  | uint8   | 4        | 0.50    | difference    | sse42  | 0.100246 | 0.002800   | 35.80x  | -97.21%        |
| 1280x720  | uint8   | 4        | 0.50    | difference    | avx2   | 0.100246 | 0.002619   | 38.28x  | -97.39%        |
| 1280x720  | uint8   | 4        | 0.50    | subtract      | scalar | 0.071455 | 0.021163   | 3.38x   | -70.38%        |
| 1280x720  | uint8   | 4        | 0.50    | subtract      | sse42  | 0.071455 | 0.003734   | 19.14x  | -94.77%        |
| 1280x720  | uint8   | 4        | 0.50    | subtract      | avx2   | 0.071455 | 0.003102   | 23.03x  | -95.66%        |
| 1280x720  | uint8   | 4        | 0.50    | grain_extract | scalar | 0.074815 | 0.026718   | 2.80x   | -64.29%        |
| 1280x720  | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.074815 | 0.003028   | 24.71x  | -95.95%        |
| 1280x720  | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.074815 | 0.003000   | 24.93x  | -95.99%        |
| 1280x720  | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.079102 | 0.027056   | 2.92x   | -65.80%        |
| 1280x720  | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.079102 | 0.003009   | 26.29x  | -96.20%        |
| 1280x720  | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.079102 | 0.002805   | 28.20x  | -96.45%        |
| 1280x720  | uint8   | 4        | 0.50    | divide        | scalar | 0.076184 | 0.023375   | 3.26x   | -69.32%        |
| 1280x720  | uint8   | 4        | 0.50    | divide        | sse42  | 0.076184 | 0.003078   | 24.75x  | -95.96%        |
| 1280x720  | uint8   | 4        | 0.50    | divide        | avx2   | 0.076184 | 0.002775   | 27.45x  | -96.36%        |
| 1280x720  | uint8   | 4        | 0.50    | overlay       | scalar | 0.101832 | 0.036190   | 2.81x   | -64.46%        |
| 1280x720  | uint8   | 4        | 0.50    | overlay       | sse42  | 0.101832 | 0.003189   | 31.94x  | -96.87%        |
| 1280x720  | uint8   | 4        | 0.50    | overlay       | avx2   | 0.101832 | 0.002876   | 35.41x  | -97.18%        |
| 1280x720  | float32 | 3        | 0.50    | normal        | scalar | 0.070117 | 0.007062   | 9.93x   | -89.93%        |
| 1280x720  | float32 | 3        | 0.50    | normal        | sse42  | 0.070117 | 0.003130   | 22.40x  | -95.54%        |
| 1280x720  | float32 | 3        | 0.50    | normal        | avx2   | 0.070117 | 0.002121   | 33.06x  | -96.98%        |
| 1280x720  | float32 | 3        | 0.50    | soft_light    | scalar | 0.101852 | 0.009224   | 11.04x  | -90.94%        |
| 1280x720  | float32 | 3        | 0.50    | soft_light    | sse42  | 0.101852 | 0.001915   | 53.19x  | -98.12%        |
| 1280x720  | float32 | 3        | 0.50    | soft_light    | avx2   | 0.101852 | 0.001653   | 61.60x  | -98.38%        |
| 1280x720  | float32 | 3        | 0.50    | lighten_only  | scalar | 0.078152 | 0.010617   | 7.36x   | -86.41%        |
| 1280x720  | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.078152 | 0.001840   | 42.46x  | -97.65%        |
| 1280x720  | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.078152 | 0.001643   | 47.56x  | -97.90%        |
| 1280x720  | float32 | 3        | 0.50    | screen        | scalar | 0.082007 | 0.008041   | 10.20x  | -90.19%        |
| 1280x720  | float32 | 3        | 0.50    | screen        | sse42  | 0.082007 | 0.001868   | 43.89x  | -97.72%        |
| 1280x720  | float32 | 3        | 0.50    | screen        | avx2   | 0.082007 | 0.001593   | 51.49x  | -98.06%        |
| 1280x720  | float32 | 3        | 0.50    | dodge         | scalar | 0.081379 | 0.008973   | 9.07x   | -88.97%        |
| 1280x720  | float32 | 3        | 0.50    | dodge         | sse42  | 0.081379 | 0.002000   | 40.68x  | -97.54%        |
| 1280x720  | float32 | 3        | 0.50    | dodge         | avx2   | 0.081379 | 0.001675   | 48.57x  | -97.94%        |
| 1280x720  | float32 | 3        | 0.50    | addition      | scalar | 0.077101 | 0.022169   | 3.48x   | -71.25%        |
| 1280x720  | float32 | 3        | 0.50    | addition      | sse42  | 0.077101 | 0.001776   | 43.42x  | -97.70%        |
| 1280x720  | float32 | 3        | 0.50    | addition      | avx2   | 0.077101 | 0.001613   | 47.81x  | -97.91%        |
| 1280x720  | float32 | 3        | 0.50    | darken_only   | scalar | 0.078049 | 0.010903   | 7.16x   | -86.03%        |
| 1280x720  | float32 | 3        | 0.50    | darken_only   | sse42  | 0.078049 | 0.001909   | 40.88x  | -97.55%        |
| 1280x720  | float32 | 3        | 0.50    | darken_only   | avx2   | 0.078049 | 0.001716   | 45.49x  | -97.80%        |
| 1280x720  | float32 | 3        | 0.50    | multiply      | scalar | 0.079955 | 0.007971   | 10.03x  | -90.03%        |
| 1280x720  | float32 | 3        | 0.50    | multiply      | sse42  | 0.079955 | 0.001774   | 45.08x  | -97.78%        |
| 1280x720  | float32 | 3        | 0.50    | multiply      | avx2   | 0.079955 | 0.001703   | 46.94x  | -97.87%        |
| 1280x720  | float32 | 3        | 0.50    | hard_light    | scalar | 0.110026 | 0.025044   | 4.39x   | -77.24%        |
| 1280x720  | float32 | 3        | 0.50    | hard_light    | sse42  | 0.110026 | 0.001936   | 56.84x  | -98.24%        |
| 1280x720  | float32 | 3        | 0.50    | hard_light    | avx2   | 0.110026 | 0.001641   | 67.05x  | -98.51%        |
| 1280x720  | float32 | 3        | 0.50    | difference    | scalar | 0.104242 | 0.008069   | 12.92x  | -92.26%        |
| 1280x720  | float32 | 3        | 0.50    | difference    | sse42  | 0.104242 | 0.002728   | 38.21x  | -97.38%        |
| 1280x720  | float32 | 3        | 0.50    | difference    | avx2   | 0.104242 | 0.001603   | 65.02x  | -98.46%        |
| 1280x720  | float32 | 3        | 0.50    | subtract      | scalar | 0.076547 | 0.009633   | 7.95x   | -87.42%        |
| 1280x720  | float32 | 3        | 0.50    | subtract      | sse42  | 0.076547 | 0.001902   | 40.25x  | -97.52%        |
| 1280x720  | float32 | 3        | 0.50    | subtract      | avx2   | 0.076547 | 0.001637   | 46.75x  | -97.86%        |
| 1280x720  | float32 | 3        | 0.50    | grain_extract | scalar | 0.079671 | 0.014299   | 5.57x   | -82.05%        |
| 1280x720  | float32 | 3        | 0.50    | grain_extract | sse42  | 0.079671 | 0.001794   | 44.41x  | -97.75%        |
| 1280x720  | float32 | 3        | 0.50    | grain_extract | avx2   | 0.079671 | 0.001630   | 48.87x  | -97.95%        |
| 1280x720  | float32 | 3        | 0.50    | grain_merge   | scalar | 0.080732 | 0.014337   | 5.63x   | -82.24%        |
| 1280x720  | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.080732 | 0.001810   | 44.61x  | -97.76%        |
| 1280x720  | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.080732 | 0.001637   | 49.31x  | -97.97%        |
| 1280x720  | float32 | 3        | 0.50    | divide        | scalar | 0.081531 | 0.009671   | 8.43x   | -88.14%        |
| 1280x720  | float32 | 3        | 0.50    | divide        | sse42  | 0.081531 | 0.001801   | 45.28x  | -97.79%        |
| 1280x720  | float32 | 3        | 0.50    | divide        | avx2   | 0.081531 | 0.001630   | 50.01x  | -98.00%        |
| 1280x720  | float32 | 3        | 0.50    | overlay       | scalar | 0.103032 | 0.023357   | 4.41x   | -77.33%        |
| 1280x720  | float32 | 3        | 0.50    | overlay       | sse42  | 0.103032 | 0.001866   | 55.22x  | -98.19%        |
| 1280x720  | float32 | 3        | 0.50    | overlay       | avx2   | 0.103032 | 0.001647   | 62.57x  | -98.40%        |
| 1280x720  | float32 | 4        | 0.50    | normal        | scalar | 0.055332 | 0.008609   | 6.43x   | -84.44%        |
| 1280x720  | float32 | 4        | 0.50    | normal        | sse42  | 0.055332 | 0.002330   | 23.74x  | -95.79%        |
| 1280x720  | float32 | 4        | 0.50    | normal        | avx2   | 0.055332 | 0.002391   | 23.14x  | -95.68%        |
| 1280x720  | float32 | 4        | 0.50    | soft_light    | scalar | 0.088041 | 0.010483   | 8.40x   | -88.09%        |
| 1280x720  | float32 | 4        | 0.50    | soft_light    | sse42  | 0.088041 | 0.002626   | 33.53x  | -97.02%        |
| 1280x720  | float32 | 4        | 0.50    | soft_light    | avx2   | 0.088041 | 0.002693   | 32.70x  | -96.94%        |
| 1280x720  | float32 | 4        | 0.50    | lighten_only  | scalar | 0.064774 | 0.010831   | 5.98x   | -83.28%        |
| 1280x720  | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.064774 | 0.002878   | 22.50x  | -95.56%        |
| 1280x720  | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.064774 | 0.002701   | 23.98x  | -95.83%        |
| 1280x720  | float32 | 4        | 0.50    | screen        | scalar | 0.067492 | 0.009619   | 7.02x   | -85.75%        |
| 1280x720  | float32 | 4        | 0.50    | screen        | sse42  | 0.067492 | 0.002603   | 25.93x  | -96.14%        |
| 1280x720  | float32 | 4        | 0.50    | screen        | avx2   | 0.067492 | 0.003516   | 19.19x  | -94.79%        |
| 1280x720  | float32 | 4        | 0.50    | dodge         | scalar | 0.067318 | 0.010514   | 6.40x   | -84.38%        |
| 1280x720  | float32 | 4        | 0.50    | dodge         | sse42  | 0.067318 | 0.003172   | 21.22x  | -95.29%        |
| 1280x720  | float32 | 4        | 0.50    | dodge         | avx2   | 0.067318 | 0.002725   | 24.71x  | -95.95%        |
| 1280x720  | float32 | 4        | 0.50    | addition      | scalar | 0.063759 | 0.018878   | 3.38x   | -70.39%        |
| 1280x720  | float32 | 4        | 0.50    | addition      | sse42  | 0.063759 | 0.002760   | 23.10x  | -95.67%        |
| 1280x720  | float32 | 4        | 0.50    | addition      | avx2   | 0.063759 | 0.003050   | 20.90x  | -95.22%        |
| 1280x720  | float32 | 4        | 0.50    | darken_only   | scalar | 0.065813 | 0.010681   | 6.16x   | -83.77%        |
| 1280x720  | float32 | 4        | 0.50    | darken_only   | sse42  | 0.065813 | 0.002563   | 25.68x  | -96.11%        |
| 1280x720  | float32 | 4        | 0.50    | darken_only   | avx2   | 0.065813 | 0.002688   | 24.49x  | -95.92%        |
| 1280x720  | float32 | 4        | 0.50    | multiply      | scalar | 0.066634 | 0.009121   | 7.31x   | -86.31%        |
| 1280x720  | float32 | 4        | 0.50    | multiply      | sse42  | 0.066634 | 0.002539   | 26.25x  | -96.19%        |
| 1280x720  | float32 | 4        | 0.50    | multiply      | avx2   | 0.066634 | 0.002739   | 24.33x  | -95.89%        |
| 1280x720  | float32 | 4        | 0.50    | hard_light    | scalar | 0.098418 | 0.026090   | 3.77x   | -73.49%        |
| 1280x720  | float32 | 4        | 0.50    | hard_light    | sse42  | 0.098418 | 0.003223   | 30.53x  | -96.73%        |
| 1280x720  | float32 | 4        | 0.50    | hard_light    | avx2   | 0.098418 | 0.002779   | 35.41x  | -97.18%        |
| 1280x720  | float32 | 4        | 0.50    | difference    | scalar | 0.090461 | 0.009346   | 9.68x   | -89.67%        |
| 1280x720  | float32 | 4        | 0.50    | difference    | sse42  | 0.090461 | 0.002681   | 33.74x  | -97.04%        |
| 1280x720  | float32 | 4        | 0.50    | difference    | avx2   | 0.090461 | 0.002683   | 33.71x  | -97.03%        |
| 1280x720  | float32 | 4        | 0.50    | subtract      | scalar | 0.065153 | 0.011918   | 5.47x   | -81.71%        |
| 1280x720  | float32 | 4        | 0.50    | subtract      | sse42  | 0.065153 | 0.003074   | 21.19x  | -95.28%        |
| 1280x720  | float32 | 4        | 0.50    | subtract      | avx2   | 0.065153 | 0.002823   | 23.08x  | -95.67%        |
| 1280x720  | float32 | 4        | 0.50    | grain_extract | scalar | 0.067756 | 0.015173   | 4.47x   | -77.61%        |
| 1280x720  | float32 | 4        | 0.50    | grain_extract | sse42  | 0.067756 | 0.002587   | 26.19x  | -96.18%        |
| 1280x720  | float32 | 4        | 0.50    | grain_extract | avx2   | 0.067756 | 0.002659   | 25.48x  | -96.08%        |
| 1280x720  | float32 | 4        | 0.50    | grain_merge   | scalar | 0.066921 | 0.014936   | 4.48x   | -77.68%        |
| 1280x720  | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.066921 | 0.002681   | 24.96x  | -95.99%        |
| 1280x720  | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.066921 | 0.002689   | 24.88x  | -95.98%        |
| 1280x720  | float32 | 4        | 0.50    | divide        | scalar | 0.069462 | 0.010317   | 6.73x   | -85.15%        |
| 1280x720  | float32 | 4        | 0.50    | divide        | sse42  | 0.069462 | 0.002851   | 24.37x  | -95.90%        |
| 1280x720  | float32 | 4        | 0.50    | divide        | avx2   | 0.069462 | 0.002745   | 25.31x  | -96.05%        |
| 1280x720  | float32 | 4        | 0.50    | overlay       | scalar | 0.089872 | 0.024774   | 3.63x   | -72.43%        |
| 1280x720  | float32 | 4        | 0.50    | overlay       | sse42  | 0.089872 | 0.002790   | 32.21x  | -96.90%        |
| 1280x720  | float32 | 4        | 0.50    | overlay       | avx2   | 0.089872 | 0.002692   | 33.39x  | -97.00%        |
| 1920x1080 | uint8   | 3        | 0.50    | normal        | scalar | 0.185375 | 0.052121   | 3.56x   | -71.88%        |
| 1920x1080 | uint8   | 3        | 0.50    | normal        | sse42  | 0.185375 | 0.022162   | 8.36x   | -88.04%        |
| 1920x1080 | uint8   | 3        | 0.50    | normal        | avx2   | 0.185375 | 0.022335   | 8.30x   | -87.95%        |
| 1920x1080 | uint8   | 3        | 0.50    | soft_light    | scalar | 0.246976 | 0.058323   | 4.23x   | -76.39%        |
| 1920x1080 | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.246976 | 0.025080   | 9.85x   | -89.85%        |
| 1920x1080 | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.246976 | 0.023044   | 10.72x  | -90.67%        |
| 1920x1080 | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.186900 | 0.061607   | 3.03x   | -67.04%        |
| 1920x1080 | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.186900 | 0.024269   | 7.70x   | -87.01%        |
| 1920x1080 | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.186900 | 0.021814   | 8.57x   | -88.33%        |
| 1920x1080 | uint8   | 3        | 0.50    | screen        | scalar | 0.197031 | 0.055736   | 3.54x   | -71.71%        |
| 1920x1080 | uint8   | 3        | 0.50    | screen        | sse42  | 0.197031 | 0.025182   | 7.82x   | -87.22%        |
| 1920x1080 | uint8   | 3        | 0.50    | screen        | avx2   | 0.197031 | 0.022477   | 8.77x   | -88.59%        |
| 1920x1080 | uint8   | 3        | 0.50    | dodge         | scalar | 0.195721 | 0.058600   | 3.34x   | -70.06%        |
| 1920x1080 | uint8   | 3        | 0.50    | dodge         | sse42  | 0.195721 | 0.025627   | 7.64x   | -86.91%        |
| 1920x1080 | uint8   | 3        | 0.50    | dodge         | avx2   | 0.195721 | 0.023378   | 8.37x   | -88.06%        |
| 1920x1080 | uint8   | 3        | 0.50    | addition      | scalar | 0.190076 | 0.080464   | 2.36x   | -57.67%        |
| 1920x1080 | uint8   | 3        | 0.50    | addition      | sse42  | 0.190076 | 0.025217   | 7.54x   | -86.73%        |
| 1920x1080 | uint8   | 3        | 0.50    | addition      | avx2   | 0.190076 | 0.022685   | 8.38x   | -88.07%        |
| 1920x1080 | uint8   | 3        | 0.50    | darken_only   | scalar | 0.185397 | 0.062177   | 2.98x   | -66.46%        |
| 1920x1080 | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.185397 | 0.024600   | 7.54x   | -86.73%        |
| 1920x1080 | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.185397 | 0.021861   | 8.48x   | -88.21%        |
| 1920x1080 | uint8   | 3        | 0.50    | multiply      | scalar | 0.188996 | 0.056640   | 3.34x   | -70.03%        |
| 1920x1080 | uint8   | 3        | 0.50    | multiply      | sse42  | 0.188996 | 0.024337   | 7.77x   | -87.12%        |
| 1920x1080 | uint8   | 3        | 0.50    | multiply      | avx2   | 0.188996 | 0.022479   | 8.41x   | -88.11%        |
| 1920x1080 | uint8   | 3        | 0.50    | hard_light    | scalar | 0.258153 | 0.095832   | 2.69x   | -62.88%        |
| 1920x1080 | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.258153 | 0.025622   | 10.08x  | -90.07%        |
| 1920x1080 | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.258153 | 0.023036   | 11.21x  | -91.08%        |
| 1920x1080 | uint8   | 3        | 0.50    | difference    | scalar | 0.247381 | 0.055728   | 4.44x   | -77.47%        |
| 1920x1080 | uint8   | 3        | 0.50    | difference    | sse42  | 0.247381 | 0.024738   | 10.00x  | -90.00%        |
| 1920x1080 | uint8   | 3        | 0.50    | difference    | avx2   | 0.247381 | 0.022044   | 11.22x  | -91.09%        |
| 1920x1080 | uint8   | 3        | 0.50    | subtract      | scalar | 0.186911 | 0.052765   | 3.54x   | -71.77%        |
| 1920x1080 | uint8   | 3        | 0.50    | subtract      | sse42  | 0.186911 | 0.025294   | 7.39x   | -86.47%        |
| 1920x1080 | uint8   | 3        | 0.50    | subtract      | avx2   | 0.186911 | 0.022236   | 8.41x   | -88.10%        |
| 1920x1080 | uint8   | 3        | 0.50    | grain_extract | scalar | 0.197760 | 0.069503   | 2.85x   | -64.85%        |
| 1920x1080 | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.197760 | 0.025291   | 7.82x   | -87.21%        |
| 1920x1080 | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.197760 | 0.022644   | 8.73x   | -88.55%        |
| 1920x1080 | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.198514 | 0.068419   | 2.90x   | -65.53%        |
| 1920x1080 | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.198514 | 0.025112   | 7.91x   | -87.35%        |
| 1920x1080 | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.198514 | 0.022385   | 8.87x   | -88.72%        |
| 1920x1080 | uint8   | 3        | 0.50    | divide        | scalar | 0.213788 | 0.057721   | 3.70x   | -73.00%        |
| 1920x1080 | uint8   | 3        | 0.50    | divide        | sse42  | 0.213788 | 0.025609   | 8.35x   | -88.02%        |
| 1920x1080 | uint8   | 3        | 0.50    | divide        | avx2   | 0.213788 | 0.022741   | 9.40x   | -89.36%        |
| 1920x1080 | uint8   | 3        | 0.50    | overlay       | scalar | 0.262105 | 0.095705   | 2.74x   | -63.49%        |
| 1920x1080 | uint8   | 3        | 0.50    | overlay       | sse42  | 0.262105 | 0.025802   | 10.16x  | -90.16%        |
| 1920x1080 | uint8   | 3        | 0.50    | overlay       | avx2   | 0.262105 | 0.022688   | 11.55x  | -91.34%        |
| 1920x1080 | uint8   | 4        | 0.50    | normal        | scalar | 0.132488 | 0.042472   | 3.12x   | -67.94%        |
| 1920x1080 | uint8   | 4        | 0.50    | normal        | sse42  | 0.132488 | 0.005537   | 23.93x  | -95.82%        |
| 1920x1080 | uint8   | 4        | 0.50    | normal        | avx2   | 0.132488 | 0.005003   | 26.48x  | -96.22%        |
| 1920x1080 | uint8   | 4        | 0.50    | soft_light    | scalar | 0.192625 | 0.052181   | 3.69x   | -72.91%        |
| 1920x1080 | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.192625 | 0.007002   | 27.51x  | -96.37%        |
| 1920x1080 | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.192625 | 0.006544   | 29.44x  | -96.60%        |
| 1920x1080 | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.142657 | 0.054860   | 2.60x   | -61.54%        |
| 1920x1080 | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.142657 | 0.006249   | 22.83x  | -95.62%        |
| 1920x1080 | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.142657 | 0.005904   | 24.16x  | -95.86%        |
| 1920x1080 | uint8   | 4        | 0.50    | screen        | scalar | 0.147905 | 0.049957   | 2.96x   | -66.22%        |
| 1920x1080 | uint8   | 4        | 0.50    | screen        | sse42  | 0.147905 | 0.007420   | 19.93x  | -94.98%        |
| 1920x1080 | uint8   | 4        | 0.50    | screen        | avx2   | 0.147905 | 0.006121   | 24.16x  | -95.86%        |
| 1920x1080 | uint8   | 4        | 0.50    | dodge         | scalar | 0.145716 | 0.053201   | 2.74x   | -63.49%        |
| 1920x1080 | uint8   | 4        | 0.50    | dodge         | sse42  | 0.145716 | 0.007498   | 19.44x  | -94.85%        |
| 1920x1080 | uint8   | 4        | 0.50    | dodge         | avx2   | 0.145716 | 0.006468   | 22.53x  | -95.56%        |
| 1920x1080 | uint8   | 4        | 0.50    | addition      | scalar | 0.141815 | 0.063691   | 2.23x   | -55.09%        |
| 1920x1080 | uint8   | 4        | 0.50    | addition      | sse42  | 0.141815 | 0.008302   | 17.08x  | -94.15%        |
| 1920x1080 | uint8   | 4        | 0.50    | addition      | avx2   | 0.141815 | 0.006314   | 22.46x  | -95.55%        |
| 1920x1080 | uint8   | 4        | 0.50    | darken_only   | scalar | 0.147275 | 0.055617   | 2.65x   | -62.24%        |
| 1920x1080 | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.147275 | 0.006888   | 21.38x  | -95.32%        |
| 1920x1080 | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.147275 | 0.005922   | 24.87x  | -95.98%        |
| 1920x1080 | uint8   | 4        | 0.50    | multiply      | scalar | 0.146219 | 0.052851   | 2.77x   | -63.86%        |
| 1920x1080 | uint8   | 4        | 0.50    | multiply      | sse42  | 0.146219 | 0.006557   | 22.30x  | -95.52%        |
| 1920x1080 | uint8   | 4        | 0.50    | multiply      | avx2   | 0.146219 | 0.005984   | 24.44x  | -95.91%        |
| 1920x1080 | uint8   | 4        | 0.50    | hard_light    | scalar | 0.231543 | 0.086742   | 2.67x   | -62.54%        |
| 1920x1080 | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.231543 | 0.007752   | 29.87x  | -96.65%        |
| 1920x1080 | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.231543 | 0.006642   | 34.86x  | -97.13%        |
| 1920x1080 | uint8   | 4        | 0.50    | difference    | scalar | 0.228316 | 0.051118   | 4.47x   | -77.61%        |
| 1920x1080 | uint8   | 4        | 0.50    | difference    | sse42  | 0.228316 | 0.006769   | 33.73x  | -97.04%        |
| 1920x1080 | uint8   | 4        | 0.50    | difference    | avx2   | 0.228316 | 0.006141   | 37.18x  | -97.31%        |
| 1920x1080 | uint8   | 4        | 0.50    | subtract      | scalar | 0.149552 | 0.050239   | 2.98x   | -66.41%        |
| 1920x1080 | uint8   | 4        | 0.50    | subtract      | sse42  | 0.149552 | 0.008720   | 17.15x  | -94.17%        |
| 1920x1080 | uint8   | 4        | 0.50    | subtract      | avx2   | 0.149552 | 0.007082   | 21.12x  | -95.26%        |
| 1920x1080 | uint8   | 4        | 0.50    | grain_extract | scalar | 0.152362 | 0.061865   | 2.46x   | -59.40%        |
| 1920x1080 | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.152362 | 0.006702   | 22.73x  | -95.60%        |
| 1920x1080 | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.152362 | 0.006553   | 23.25x  | -95.70%        |
| 1920x1080 | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.159708 | 0.062019   | 2.58x   | -61.17%        |
| 1920x1080 | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.159708 | 0.006816   | 23.43x  | -95.73%        |
| 1920x1080 | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.159708 | 0.006420   | 24.88x  | -95.98%        |
| 1920x1080 | uint8   | 4        | 0.50    | divide        | scalar | 0.156856 | 0.052913   | 2.96x   | -66.27%        |
| 1920x1080 | uint8   | 4        | 0.50    | divide        | sse42  | 0.156856 | 0.006954   | 22.56x  | -95.57%        |
| 1920x1080 | uint8   | 4        | 0.50    | divide        | avx2   | 0.156856 | 0.006324   | 24.80x  | -95.97%        |
| 1920x1080 | uint8   | 4        | 0.50    | overlay       | scalar | 0.209992 | 0.083388   | 2.52x   | -60.29%        |
| 1920x1080 | uint8   | 4        | 0.50    | overlay       | sse42  | 0.209992 | 0.007453   | 28.18x  | -96.45%        |
| 1920x1080 | uint8   | 4        | 0.50    | overlay       | avx2   | 0.209992 | 0.006384   | 32.90x  | -96.96%        |
| 1920x1080 | float32 | 3        | 0.50    | normal        | scalar | 0.164828 | 0.016933   | 9.73x   | -89.73%        |
| 1920x1080 | float32 | 3        | 0.50    | normal        | sse42  | 0.164828 | 0.007194   | 22.91x  | -95.64%        |
| 1920x1080 | float32 | 3        | 0.50    | normal        | avx2   | 0.164828 | 0.005042   | 32.69x  | -96.94%        |
| 1920x1080 | float32 | 3        | 0.50    | soft_light    | scalar | 0.230488 | 0.021101   | 10.92x  | -90.85%        |
| 1920x1080 | float32 | 3        | 0.50    | soft_light    | sse42  | 0.230488 | 0.004176   | 55.19x  | -98.19%        |
| 1920x1080 | float32 | 3        | 0.50    | soft_light    | avx2   | 0.230488 | 0.004779   | 48.23x  | -97.93%        |
| 1920x1080 | float32 | 3        | 0.50    | lighten_only  | scalar | 0.174212 | 0.025226   | 6.91x   | -85.52%        |
| 1920x1080 | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.174212 | 0.005321   | 32.74x  | -96.95%        |
| 1920x1080 | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.174212 | 0.004084   | 42.65x  | -97.66%        |
| 1920x1080 | float32 | 3        | 0.50    | screen        | scalar | 0.184185 | 0.019504   | 9.44x   | -89.41%        |
| 1920x1080 | float32 | 3        | 0.50    | screen        | sse42  | 0.184185 | 0.004113   | 44.78x  | -97.77%        |
| 1920x1080 | float32 | 3        | 0.50    | screen        | avx2   | 0.184185 | 0.004616   | 39.90x  | -97.49%        |
| 1920x1080 | float32 | 3        | 0.50    | dodge         | scalar | 0.182265 | 0.021561   | 8.45x   | -88.17%        |
| 1920x1080 | float32 | 3        | 0.50    | dodge         | sse42  | 0.182265 | 0.005161   | 35.32x  | -97.17%        |
| 1920x1080 | float32 | 3        | 0.50    | dodge         | avx2   | 0.182265 | 0.004359   | 41.81x  | -97.61%        |
| 1920x1080 | float32 | 3        | 0.50    | addition      | scalar | 0.176570 | 0.052783   | 3.35x   | -70.11%        |
| 1920x1080 | float32 | 3        | 0.50    | addition      | sse42  | 0.176570 | 0.004346   | 40.63x  | -97.54%        |
| 1920x1080 | float32 | 3        | 0.50    | addition      | avx2   | 0.176570 | 0.004119   | 42.86x  | -97.67%        |
| 1920x1080 | float32 | 3        | 0.50    | darken_only   | scalar | 0.176652 | 0.025431   | 6.95x   | -85.60%        |
| 1920x1080 | float32 | 3        | 0.50    | darken_only   | sse42  | 0.176652 | 0.004577   | 38.59x  | -97.41%        |
| 1920x1080 | float32 | 3        | 0.50    | darken_only   | avx2   | 0.176652 | 0.003982   | 44.37x  | -97.75%        |
| 1920x1080 | float32 | 3        | 0.50    | multiply      | scalar | 0.178377 | 0.019211   | 9.29x   | -89.23%        |
| 1920x1080 | float32 | 3        | 0.50    | multiply      | sse42  | 0.178377 | 0.004778   | 37.33x  | -97.32%        |
| 1920x1080 | float32 | 3        | 0.50    | multiply      | avx2   | 0.178377 | 0.003817   | 46.73x  | -97.86%        |
| 1920x1080 | float32 | 3        | 0.50    | hard_light    | scalar | 0.251294 | 0.057888   | 4.34x   | -76.96%        |
| 1920x1080 | float32 | 3        | 0.50    | hard_light    | sse42  | 0.251294 | 0.004980   | 50.46x  | -98.02%        |
| 1920x1080 | float32 | 3        | 0.50    | hard_light    | avx2   | 0.251294 | 0.004643   | 54.12x  | -98.15%        |
| 1920x1080 | float32 | 3        | 0.50    | difference    | scalar | 0.237953 | 0.018933   | 12.57x  | -92.04%        |
| 1920x1080 | float32 | 3        | 0.50    | difference    | sse42  | 0.237953 | 0.005931   | 40.12x  | -97.51%        |
| 1920x1080 | float32 | 3        | 0.50    | difference    | avx2   | 0.237953 | 0.003925   | 60.62x  | -98.35%        |
| 1920x1080 | float32 | 3        | 0.50    | subtract      | scalar | 0.175865 | 0.023377   | 7.52x   | -86.71%        |
| 1920x1080 | float32 | 3        | 0.50    | subtract      | sse42  | 0.175865 | 0.004507   | 39.02x  | -97.44%        |
| 1920x1080 | float32 | 3        | 0.50    | subtract      | avx2   | 0.175865 | 0.004646   | 37.85x  | -97.36%        |
| 1920x1080 | float32 | 3        | 0.50    | grain_extract | scalar | 0.186051 | 0.033910   | 5.49x   | -81.77%        |
| 1920x1080 | float32 | 3        | 0.50    | grain_extract | sse42  | 0.186051 | 0.004234   | 43.95x  | -97.72%        |
| 1920x1080 | float32 | 3        | 0.50    | grain_extract | avx2   | 0.186051 | 0.004112   | 45.24x  | -97.79%        |
| 1920x1080 | float32 | 3        | 0.50    | grain_merge   | scalar | 0.182632 | 0.034267   | 5.33x   | -81.24%        |
| 1920x1080 | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.182632 | 0.004492   | 40.66x  | -97.54%        |
| 1920x1080 | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.182632 | 0.004544   | 40.19x  | -97.51%        |
| 1920x1080 | float32 | 3        | 0.50    | divide        | scalar | 0.186617 | 0.021208   | 8.80x   | -88.64%        |
| 1920x1080 | float32 | 3        | 0.50    | divide        | sse42  | 0.186617 | 0.005208   | 35.83x  | -97.21%        |
| 1920x1080 | float32 | 3        | 0.50    | divide        | avx2   | 0.186617 | 0.004919   | 37.94x  | -97.36%        |
| 1920x1080 | float32 | 3        | 0.50    | overlay       | scalar | 0.236200 | 0.054011   | 4.37x   | -77.13%        |
| 1920x1080 | float32 | 3        | 0.50    | overlay       | sse42  | 0.236200 | 0.005226   | 45.20x  | -97.79%        |
| 1920x1080 | float32 | 3        | 0.50    | overlay       | avx2   | 0.236200 | 0.003998   | 59.09x  | -98.31%        |
| 1920x1080 | float32 | 4        | 0.50    | normal        | scalar | 0.128519 | 0.019533   | 6.58x   | -84.80%        |
| 1920x1080 | float32 | 4        | 0.50    | normal        | sse42  | 0.128519 | 0.005565   | 23.09x  | -95.67%        |
| 1920x1080 | float32 | 4        | 0.50    | normal        | avx2   | 0.128519 | 0.007860   | 16.35x  | -93.88%        |
| 1920x1080 | float32 | 4        | 0.50    | soft_light    | scalar | 0.191540 | 0.023221   | 8.25x   | -87.88%        |
| 1920x1080 | float32 | 4        | 0.50    | soft_light    | sse42  | 0.191540 | 0.006130   | 31.25x  | -96.80%        |
| 1920x1080 | float32 | 4        | 0.50    | soft_light    | avx2   | 0.191540 | 0.006727   | 28.47x  | -96.49%        |
| 1920x1080 | float32 | 4        | 0.50    | lighten_only  | scalar | 0.138648 | 0.025965   | 5.34x   | -81.27%        |
| 1920x1080 | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.138648 | 0.007216   | 19.21x  | -94.80%        |
| 1920x1080 | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.138648 | 0.006877   | 20.16x  | -95.04%        |
| 1920x1080 | float32 | 4        | 0.50    | screen        | scalar | 0.148151 | 0.021868   | 6.77x   | -85.24%        |
| 1920x1080 | float32 | 4        | 0.50    | screen        | sse42  | 0.148151 | 0.006959   | 21.29x  | -95.30%        |
| 1920x1080 | float32 | 4        | 0.50    | screen        | avx2   | 0.148151 | 0.006276   | 23.60x  | -95.76%        |
| 1920x1080 | float32 | 4        | 0.50    | dodge         | scalar | 0.144278 | 0.024254   | 5.95x   | -83.19%        |
| 1920x1080 | float32 | 4        | 0.50    | dodge         | sse42  | 0.144278 | 0.007663   | 18.83x  | -94.69%        |
| 1920x1080 | float32 | 4        | 0.50    | dodge         | avx2   | 0.144278 | 0.007045   | 20.48x  | -95.12%        |
| 1920x1080 | float32 | 4        | 0.50    | addition      | scalar | 0.139188 | 0.043062   | 3.23x   | -69.06%        |
| 1920x1080 | float32 | 4        | 0.50    | addition      | sse42  | 0.139188 | 0.006240   | 22.31x  | -95.52%        |
| 1920x1080 | float32 | 4        | 0.50    | addition      | avx2   | 0.139188 | 0.006513   | 21.37x  | -95.32%        |
| 1920x1080 | float32 | 4        | 0.50    | darken_only   | scalar | 0.135000 | 0.024650   | 5.48x   | -81.74%        |
| 1920x1080 | float32 | 4        | 0.50    | darken_only   | sse42  | 0.135000 | 0.005605   | 24.09x  | -95.85%        |
| 1920x1080 | float32 | 4        | 0.50    | darken_only   | avx2   | 0.135000 | 0.006425   | 21.01x  | -95.24%        |
| 1920x1080 | float32 | 4        | 0.50    | multiply      | scalar | 0.147870 | 0.020941   | 7.06x   | -85.84%        |
| 1920x1080 | float32 | 4        | 0.50    | multiply      | sse42  | 0.147870 | 0.006031   | 24.52x  | -95.92%        |
| 1920x1080 | float32 | 4        | 0.50    | multiply      | avx2   | 0.147870 | 0.006398   | 23.11x  | -95.67%        |
| 1920x1080 | float32 | 4        | 0.50    | hard_light    | scalar | 0.210893 | 0.060078   | 3.51x   | -71.51%        |
| 1920x1080 | float32 | 4        | 0.50    | hard_light    | sse42  | 0.210893 | 0.007770   | 27.14x  | -96.32%        |
| 1920x1080 | float32 | 4        | 0.50    | hard_light    | avx2   | 0.210893 | 0.006397   | 32.97x  | -96.97%        |
| 1920x1080 | float32 | 4        | 0.50    | difference    | scalar | 0.196254 | 0.022775   | 8.62x   | -88.40%        |
| 1920x1080 | float32 | 4        | 0.50    | difference    | sse42  | 0.196254 | 0.006159   | 31.86x  | -96.86%        |
| 1920x1080 | float32 | 4        | 0.50    | difference    | avx2   | 0.196254 | 0.007175   | 27.35x  | -96.34%        |
| 1920x1080 | float32 | 4        | 0.50    | subtract      | scalar | 0.139162 | 0.028804   | 4.83x   | -79.30%        |
| 1920x1080 | float32 | 4        | 0.50    | subtract      | sse42  | 0.139162 | 0.008036   | 17.32x  | -94.23%        |
| 1920x1080 | float32 | 4        | 0.50    | subtract      | avx2   | 0.139162 | 0.007287   | 19.10x  | -94.76%        |
| 1920x1080 | float32 | 4        | 0.50    | grain_extract | scalar | 0.144872 | 0.034232   | 4.23x   | -76.37%        |
| 1920x1080 | float32 | 4        | 0.50    | grain_extract | sse42  | 0.144872 | 0.007397   | 19.58x  | -94.89%        |
| 1920x1080 | float32 | 4        | 0.50    | grain_extract | avx2   | 0.144872 | 0.006497   | 22.30x  | -95.52%        |
| 1920x1080 | float32 | 4        | 0.50    | grain_merge   | scalar | 0.142181 | 0.034365   | 4.14x   | -75.83%        |
| 1920x1080 | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.142181 | 0.006962   | 20.42x  | -95.10%        |
| 1920x1080 | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.142181 | 0.006251   | 22.75x  | -95.60%        |
| 1920x1080 | float32 | 4        | 0.50    | divide        | scalar | 0.149912 | 0.023971   | 6.25x   | -84.01%        |
| 1920x1080 | float32 | 4        | 0.50    | divide        | sse42  | 0.149912 | 0.006665   | 22.49x  | -95.55%        |
| 1920x1080 | float32 | 4        | 0.50    | divide        | avx2   | 0.149912 | 0.006759   | 22.18x  | -95.49%        |
| 1920x1080 | float32 | 4        | 0.50    | overlay       | scalar | 0.200325 | 0.055940   | 3.58x   | -72.08%        |
| 1920x1080 | float32 | 4        | 0.50    | overlay       | sse42  | 0.200325 | 0.006786   | 29.52x  | -96.61%        |
| 1920x1080 | float32 | 4        | 0.50    | overlay       | avx2   | 0.200325 | 0.006779   | 29.55x  | -96.62%        |
| 2560x1440 | uint8   | 3        | 0.50    | normal        | scalar | 0.343236 | 0.098087   | 3.50x   | -71.42%        |
| 2560x1440 | uint8   | 3        | 0.50    | normal        | sse42  | 0.343236 | 0.041579   | 8.26x   | -87.89%        |
| 2560x1440 | uint8   | 3        | 0.50    | normal        | avx2   | 0.343236 | 0.039950   | 8.59x   | -88.36%        |
| 2560x1440 | uint8   | 3        | 0.50    | soft_light    | scalar | 0.456343 | 0.104033   | 4.39x   | -77.20%        |
| 2560x1440 | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.456343 | 0.045289   | 10.08x  | -90.08%        |
| 2560x1440 | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.456343 | 0.041028   | 11.12x  | -91.01%        |
| 2560x1440 | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.344369 | 0.113318   | 3.04x   | -67.09%        |
| 2560x1440 | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.344369 | 0.044205   | 7.79x   | -87.16%        |
| 2560x1440 | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.344369 | 0.039960   | 8.62x   | -88.40%        |
| 2560x1440 | uint8   | 3        | 0.50    | screen        | scalar | 0.364073 | 0.100220   | 3.63x   | -72.47%        |
| 2560x1440 | uint8   | 3        | 0.50    | screen        | sse42  | 0.364073 | 0.044632   | 8.16x   | -87.74%        |
| 2560x1440 | uint8   | 3        | 0.50    | screen        | avx2   | 0.364073 | 0.040955   | 8.89x   | -88.75%        |
| 2560x1440 | uint8   | 3        | 0.50    | dodge         | scalar | 0.361195 | 0.104585   | 3.45x   | -71.04%        |
| 2560x1440 | uint8   | 3        | 0.50    | dodge         | sse42  | 0.361195 | 0.046080   | 7.84x   | -87.24%        |
| 2560x1440 | uint8   | 3        | 0.50    | dodge         | avx2   | 0.361195 | 0.041671   | 8.67x   | -88.46%        |
| 2560x1440 | uint8   | 3        | 0.50    | addition      | scalar | 0.355930 | 0.144540   | 2.46x   | -59.39%        |
| 2560x1440 | uint8   | 3        | 0.50    | addition      | sse42  | 0.355930 | 0.045381   | 7.84x   | -87.25%        |
| 2560x1440 | uint8   | 3        | 0.50    | addition      | avx2   | 0.355930 | 0.040120   | 8.87x   | -88.73%        |
| 2560x1440 | uint8   | 3        | 0.50    | darken_only   | scalar | 0.339031 | 0.110987   | 3.05x   | -67.26%        |
| 2560x1440 | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.339031 | 0.043887   | 7.73x   | -87.06%        |
| 2560x1440 | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.339031 | 0.039730   | 8.53x   | -88.28%        |
| 2560x1440 | uint8   | 3        | 0.50    | multiply      | scalar | 0.350284 | 0.100938   | 3.47x   | -71.18%        |
| 2560x1440 | uint8   | 3        | 0.50    | multiply      | sse42  | 0.350284 | 0.044014   | 7.96x   | -87.43%        |
| 2560x1440 | uint8   | 3        | 0.50    | multiply      | avx2   | 0.350284 | 0.040017   | 8.75x   | -88.58%        |
| 2560x1440 | uint8   | 3        | 0.50    | hard_light    | scalar | 0.501235 | 0.174533   | 2.87x   | -65.18%        |
| 2560x1440 | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.501235 | 0.044975   | 11.14x  | -91.03%        |
| 2560x1440 | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.501235 | 0.040312   | 12.43x  | -91.96%        |
| 2560x1440 | uint8   | 3        | 0.50    | difference    | scalar | 0.443352 | 0.098006   | 4.52x   | -77.89%        |
| 2560x1440 | uint8   | 3        | 0.50    | difference    | sse42  | 0.443352 | 0.043462   | 10.20x  | -90.20%        |
| 2560x1440 | uint8   | 3        | 0.50    | difference    | avx2   | 0.443352 | 0.038452   | 11.53x  | -91.33%        |
| 2560x1440 | uint8   | 3        | 0.50    | subtract      | scalar | 0.348790 | 0.092573   | 3.77x   | -73.46%        |
| 2560x1440 | uint8   | 3        | 0.50    | subtract      | sse42  | 0.348790 | 0.044271   | 7.88x   | -87.31%        |
| 2560x1440 | uint8   | 3        | 0.50    | subtract      | avx2   | 0.348790 | 0.039104   | 8.92x   | -88.79%        |
| 2560x1440 | uint8   | 3        | 0.50    | grain_extract | scalar | 0.364707 | 0.119619   | 3.05x   | -67.20%        |
| 2560x1440 | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.364707 | 0.044134   | 8.26x   | -87.90%        |
| 2560x1440 | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.364707 | 0.039316   | 9.28x   | -89.22%        |
| 2560x1440 | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.358289 | 0.120605   | 2.97x   | -66.34%        |
| 2560x1440 | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.358289 | 0.043868   | 8.17x   | -87.76%        |
| 2560x1440 | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.358289 | 0.039048   | 9.18x   | -89.10%        |
| 2560x1440 | uint8   | 3        | 0.50    | divide        | scalar | 0.361810 | 0.101067   | 3.58x   | -72.07%        |
| 2560x1440 | uint8   | 3        | 0.50    | divide        | sse42  | 0.361810 | 0.044261   | 8.17x   | -87.77%        |
| 2560x1440 | uint8   | 3        | 0.50    | divide        | avx2   | 0.361810 | 0.039476   | 9.17x   | -89.09%        |
| 2560x1440 | uint8   | 3        | 0.50    | overlay       | scalar | 0.464105 | 0.164439   | 2.82x   | -64.57%        |
| 2560x1440 | uint8   | 3        | 0.50    | overlay       | sse42  | 0.464105 | 0.044379   | 10.46x  | -90.44%        |
| 2560x1440 | uint8   | 3        | 0.50    | overlay       | avx2   | 0.464105 | 0.039517   | 11.74x  | -91.49%        |
| 2560x1440 | uint8   | 4        | 0.50    | normal        | scalar | 0.253481 | 0.073066   | 3.47x   | -71.18%        |
| 2560x1440 | uint8   | 4        | 0.50    | normal        | sse42  | 0.253481 | 0.009708   | 26.11x  | -96.17%        |
| 2560x1440 | uint8   | 4        | 0.50    | normal        | avx2   | 0.253481 | 0.008704   | 29.12x  | -96.57%        |
| 2560x1440 | uint8   | 4        | 0.50    | soft_light    | scalar | 0.363082 | 0.098616   | 3.68x   | -72.84%        |
| 2560x1440 | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.363082 | 0.012527   | 28.98x  | -96.55%        |
| 2560x1440 | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.363082 | 0.011549   | 31.44x  | -96.82%        |
| 2560x1440 | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.257406 | 0.095331   | 2.70x   | -62.96%        |
| 2560x1440 | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.257406 | 0.010909   | 23.60x  | -95.76%        |
| 2560x1440 | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.257406 | 0.010822   | 23.79x  | -95.80%        |
| 2560x1440 | uint8   | 4        | 0.50    | screen        | scalar | 0.278465 | 0.095527   | 2.92x   | -65.70%        |
| 2560x1440 | uint8   | 4        | 0.50    | screen        | sse42  | 0.278465 | 0.012199   | 22.83x  | -95.62%        |
| 2560x1440 | uint8   | 4        | 0.50    | screen        | avx2   | 0.278465 | 0.011072   | 25.15x  | -96.02%        |
| 2560x1440 | uint8   | 4        | 0.50    | dodge         | scalar | 0.276290 | 0.093088   | 2.97x   | -66.31%        |
| 2560x1440 | uint8   | 4        | 0.50    | dodge         | sse42  | 0.276290 | 0.013161   | 20.99x  | -95.24%        |
| 2560x1440 | uint8   | 4        | 0.50    | dodge         | avx2   | 0.276290 | 0.011426   | 24.18x  | -95.86%        |
| 2560x1440 | uint8   | 4        | 0.50    | addition      | scalar | 0.262305 | 0.111189   | 2.36x   | -57.61%        |
| 2560x1440 | uint8   | 4        | 0.50    | addition      | sse42  | 0.262305 | 0.015073   | 17.40x  | -94.25%        |
| 2560x1440 | uint8   | 4        | 0.50    | addition      | avx2   | 0.262305 | 0.010836   | 24.21x  | -95.87%        |
| 2560x1440 | uint8   | 4        | 0.50    | darken_only   | scalar | 0.257438 | 0.096009   | 2.68x   | -62.71%        |
| 2560x1440 | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.257438 | 0.010714   | 24.03x  | -95.84%        |
| 2560x1440 | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.257438 | 0.010411   | 24.73x  | -95.96%        |
| 2560x1440 | uint8   | 4        | 0.50    | multiply      | scalar | 0.261583 | 0.089398   | 2.93x   | -65.82%        |
| 2560x1440 | uint8   | 4        | 0.50    | multiply      | sse42  | 0.261583 | 0.011263   | 23.22x  | -95.69%        |
| 2560x1440 | uint8   | 4        | 0.50    | multiply      | avx2   | 0.261583 | 0.010595   | 24.69x  | -95.95%        |
| 2560x1440 | uint8   | 4        | 0.50    | hard_light    | scalar | 0.408433 | 0.146448   | 2.79x   | -64.14%        |
| 2560x1440 | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.408433 | 0.013045   | 31.31x  | -96.81%        |
| 2560x1440 | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.408433 | 0.011240   | 36.34x  | -97.25%        |
| 2560x1440 | uint8   | 4        | 0.50    | difference    | scalar | 0.361108 | 0.089076   | 4.05x   | -75.33%        |
| 2560x1440 | uint8   | 4        | 0.50    | difference    | sse42  | 0.361108 | 0.011236   | 32.14x  | -96.89%        |
| 2560x1440 | uint8   | 4        | 0.50    | difference    | avx2   | 0.361108 | 0.010255   | 35.21x  | -97.16%        |
| 2560x1440 | uint8   | 4        | 0.50    | subtract      | scalar | 0.261856 | 0.084919   | 3.08x   | -67.57%        |
| 2560x1440 | uint8   | 4        | 0.50    | subtract      | sse42  | 0.261856 | 0.014866   | 17.61x  | -94.32%        |
| 2560x1440 | uint8   | 4        | 0.50    | subtract      | avx2   | 0.261856 | 0.012148   | 21.55x  | -95.36%        |
| 2560x1440 | uint8   | 4        | 0.50    | grain_extract | scalar | 0.268812 | 0.107840   | 2.49x   | -59.88%        |
| 2560x1440 | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.268812 | 0.011788   | 22.80x  | -95.61%        |
| 2560x1440 | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.268812 | 0.011066   | 24.29x  | -95.88%        |
| 2560x1440 | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.292357 | 0.108381   | 2.70x   | -62.93%        |
| 2560x1440 | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.292357 | 0.011972   | 24.42x  | -95.91%        |
| 2560x1440 | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.292357 | 0.012056   | 24.25x  | -95.88%        |
| 2560x1440 | uint8   | 4        | 0.50    | divide        | scalar | 0.310948 | 0.092598   | 3.36x   | -70.22%        |
| 2560x1440 | uint8   | 4        | 0.50    | divide        | sse42  | 0.310948 | 0.013506   | 23.02x  | -95.66%        |
| 2560x1440 | uint8   | 4        | 0.50    | divide        | avx2   | 0.310948 | 0.011435   | 27.19x  | -96.32%        |
| 2560x1440 | uint8   | 4        | 0.50    | overlay       | scalar | 0.419176 | 0.146632   | 2.86x   | -65.02%        |
| 2560x1440 | uint8   | 4        | 0.50    | overlay       | sse42  | 0.419176 | 0.012834   | 32.66x  | -96.94%        |
| 2560x1440 | uint8   | 4        | 0.50    | overlay       | avx2   | 0.419176 | 0.011240   | 37.29x  | -97.32%        |
| 2560x1440 | float32 | 3        | 0.50    | normal        | scalar | 0.338486 | 0.028191   | 12.01x  | -91.67%        |
| 2560x1440 | float32 | 3        | 0.50    | normal        | sse42  | 0.338486 | 0.012905   | 26.23x  | -96.19%        |
| 2560x1440 | float32 | 3        | 0.50    | normal        | avx2   | 0.338486 | 0.008902   | 38.02x  | -97.37%        |
| 2560x1440 | float32 | 3        | 0.50    | soft_light    | scalar | 0.438030 | 0.035138   | 12.47x  | -91.98%        |
| 2560x1440 | float32 | 3        | 0.50    | soft_light    | sse42  | 0.438030 | 0.008706   | 50.31x  | -98.01%        |
| 2560x1440 | float32 | 3        | 0.50    | soft_light    | avx2   | 0.438030 | 0.007621   | 57.48x  | -98.26%        |
| 2560x1440 | float32 | 3        | 0.50    | lighten_only  | scalar | 0.324665 | 0.044749   | 7.26x   | -86.22%        |
| 2560x1440 | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.324665 | 0.010158   | 31.96x  | -96.87%        |
| 2560x1440 | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.324665 | 0.010061   | 32.27x  | -96.90%        |
| 2560x1440 | float32 | 3        | 0.50    | screen        | scalar | 0.363268 | 0.036996   | 9.82x   | -89.82%        |
| 2560x1440 | float32 | 3        | 0.50    | screen        | sse42  | 0.363268 | 0.009731   | 37.33x  | -97.32%        |
| 2560x1440 | float32 | 3        | 0.50    | screen        | avx2   | 0.363268 | 0.009093   | 39.95x  | -97.50%        |
| 2560x1440 | float32 | 3        | 0.50    | dodge         | scalar | 0.362960 | 0.038982   | 9.31x   | -89.26%        |
| 2560x1440 | float32 | 3        | 0.50    | dodge         | sse42  | 0.362960 | 0.010291   | 35.27x  | -97.16%        |
| 2560x1440 | float32 | 3        | 0.50    | dodge         | avx2   | 0.362960 | 0.008926   | 40.66x  | -97.54%        |
| 2560x1440 | float32 | 3        | 0.50    | addition      | scalar | 0.340792 | 0.090675   | 3.76x   | -73.39%        |
| 2560x1440 | float32 | 3        | 0.50    | addition      | sse42  | 0.340792 | 0.009096   | 37.47x  | -97.33%        |
| 2560x1440 | float32 | 3        | 0.50    | addition      | avx2   | 0.340792 | 0.007053   | 48.32x  | -97.93%        |
| 2560x1440 | float32 | 3        | 0.50    | darken_only   | scalar | 0.338686 | 0.046898   | 7.22x   | -86.15%        |
| 2560x1440 | float32 | 3        | 0.50    | darken_only   | sse42  | 0.338686 | 0.009533   | 35.53x  | -97.19%        |
| 2560x1440 | float32 | 3        | 0.50    | darken_only   | avx2   | 0.338686 | 0.007015   | 48.28x  | -97.93%        |
| 2560x1440 | float32 | 3        | 0.50    | multiply      | scalar | 0.349767 | 0.033112   | 10.56x  | -90.53%        |
| 2560x1440 | float32 | 3        | 0.50    | multiply      | sse42  | 0.349767 | 0.009531   | 36.70x  | -97.28%        |
| 2560x1440 | float32 | 3        | 0.50    | multiply      | avx2   | 0.349767 | 0.008639   | 40.49x  | -97.53%        |
| 2560x1440 | float32 | 3        | 0.50    | hard_light    | scalar | 0.498493 | 0.102288   | 4.87x   | -79.48%        |
| 2560x1440 | float32 | 3        | 0.50    | hard_light    | sse42  | 0.498493 | 0.011994   | 41.56x  | -97.59%        |
| 2560x1440 | float32 | 3        | 0.50    | hard_light    | avx2   | 0.498493 | 0.009884   | 50.44x  | -98.02%        |
| 2560x1440 | float32 | 3        | 0.50    | difference    | scalar | 0.462928 | 0.035680   | 12.97x  | -92.29%        |
| 2560x1440 | float32 | 3        | 0.50    | difference    | sse42  | 0.462928 | 0.013020   | 35.56x  | -97.19%        |
| 2560x1440 | float32 | 3        | 0.50    | difference    | avx2   | 0.462928 | 0.009890   | 46.81x  | -97.86%        |
| 2560x1440 | float32 | 3        | 0.50    | subtract      | scalar | 0.357105 | 0.041864   | 8.53x   | -88.28%        |
| 2560x1440 | float32 | 3        | 0.50    | subtract      | sse42  | 0.357105 | 0.008979   | 39.77x  | -97.49%        |
| 2560x1440 | float32 | 3        | 0.50    | subtract      | avx2   | 0.357105 | 0.008246   | 43.30x  | -97.69%        |
| 2560x1440 | float32 | 3        | 0.50    | grain_extract | scalar | 0.382440 | 0.066065   | 5.79x   | -82.73%        |
| 2560x1440 | float32 | 3        | 0.50    | grain_extract | sse42  | 0.382440 | 0.008302   | 46.07x  | -97.83%        |
| 2560x1440 | float32 | 3        | 0.50    | grain_extract | avx2   | 0.382440 | 0.007056   | 54.20x  | -98.16%        |
| 2560x1440 | float32 | 3        | 0.50    | grain_merge   | scalar | 0.326640 | 0.056918   | 5.74x   | -82.57%        |
| 2560x1440 | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.326640 | 0.008358   | 39.08x  | -97.44%        |
| 2560x1440 | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.326640 | 0.007835   | 41.69x  | -97.60%        |
| 2560x1440 | float32 | 3        | 0.50    | divide        | scalar | 0.327691 | 0.034782   | 9.42x   | -89.39%        |
| 2560x1440 | float32 | 3        | 0.50    | divide        | sse42  | 0.327691 | 0.009200   | 35.62x  | -97.19%        |
| 2560x1440 | float32 | 3        | 0.50    | divide        | avx2   | 0.327691 | 0.007407   | 44.24x  | -97.74%        |
| 2560x1440 | float32 | 3        | 0.50    | overlay       | scalar | 0.430782 | 0.094074   | 4.58x   | -78.16%        |
| 2560x1440 | float32 | 3        | 0.50    | overlay       | sse42  | 0.430782 | 0.010517   | 40.96x  | -97.56%        |
| 2560x1440 | float32 | 3        | 0.50    | overlay       | avx2   | 0.430782 | 0.007787   | 55.32x  | -98.19%        |
| 2560x1440 | float32 | 4        | 0.50    | normal        | scalar | 0.247482 | 0.041107   | 6.02x   | -83.39%        |
| 2560x1440 | float32 | 4        | 0.50    | normal        | sse42  | 0.247482 | 0.017781   | 13.92x  | -92.82%        |
| 2560x1440 | float32 | 4        | 0.50    | normal        | avx2   | 0.247482 | 0.019084   | 12.97x  | -92.29%        |
| 2560x1440 | float32 | 4        | 0.50    | soft_light    | scalar | 0.345799 | 0.046313   | 7.47x   | -86.61%        |
| 2560x1440 | float32 | 4        | 0.50    | soft_light    | sse42  | 0.345799 | 0.018327   | 18.87x  | -94.70%        |
| 2560x1440 | float32 | 4        | 0.50    | soft_light    | avx2   | 0.345799 | 0.017976   | 19.24x  | -94.80%        |
| 2560x1440 | float32 | 4        | 0.50    | lighten_only  | scalar | 0.244585 | 0.049858   | 4.91x   | -79.62%        |
| 2560x1440 | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.244585 | 0.017679   | 13.84x  | -92.77%        |
| 2560x1440 | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.244585 | 0.017121   | 14.29x  | -93.00%        |
| 2560x1440 | float32 | 4        | 0.50    | screen        | scalar | 0.270831 | 0.044904   | 6.03x   | -83.42%        |
| 2560x1440 | float32 | 4        | 0.50    | screen        | sse42  | 0.270831 | 0.017186   | 15.76x  | -93.65%        |
| 2560x1440 | float32 | 4        | 0.50    | screen        | avx2   | 0.270831 | 0.018435   | 14.69x  | -93.19%        |
| 2560x1440 | float32 | 4        | 0.50    | dodge         | scalar | 0.265395 | 0.049086   | 5.41x   | -81.50%        |
| 2560x1440 | float32 | 4        | 0.50    | dodge         | sse42  | 0.265395 | 0.019750   | 13.44x  | -92.56%        |
| 2560x1440 | float32 | 4        | 0.50    | dodge         | avx2   | 0.265395 | 0.018606   | 14.26x  | -92.99%        |
| 2560x1440 | float32 | 4        | 0.50    | addition      | scalar | 0.251135 | 0.081796   | 3.07x   | -67.43%        |
| 2560x1440 | float32 | 4        | 0.50    | addition      | sse42  | 0.251135 | 0.018239   | 13.77x  | -92.74%        |
| 2560x1440 | float32 | 4        | 0.50    | addition      | avx2   | 0.251135 | 0.018350   | 13.69x  | -92.69%        |
| 2560x1440 | float32 | 4        | 0.50    | darken_only   | scalar | 0.243990 | 0.049418   | 4.94x   | -79.75%        |
| 2560x1440 | float32 | 4        | 0.50    | darken_only   | sse42  | 0.243990 | 0.017941   | 13.60x  | -92.65%        |
| 2560x1440 | float32 | 4        | 0.50    | darken_only   | avx2   | 0.243990 | 0.018988   | 12.85x  | -92.22%        |
| 2560x1440 | float32 | 4        | 0.50    | multiply      | scalar | 0.252425 | 0.044184   | 5.71x   | -82.50%        |
| 2560x1440 | float32 | 4        | 0.50    | multiply      | sse42  | 0.252425 | 0.019216   | 13.14x  | -92.39%        |
| 2560x1440 | float32 | 4        | 0.50    | multiply      | avx2   | 0.252425 | 0.017669   | 14.29x  | -93.00%        |
| 2560x1440 | float32 | 4        | 0.50    | hard_light    | scalar | 0.407589 | 0.110593   | 3.69x   | -72.87%        |
| 2560x1440 | float32 | 4        | 0.50    | hard_light    | sse42  | 0.407589 | 0.019272   | 21.15x  | -95.27%        |
| 2560x1440 | float32 | 4        | 0.50    | hard_light    | avx2   | 0.407589 | 0.017489   | 23.31x  | -95.71%        |
| 2560x1440 | float32 | 4        | 0.50    | difference    | scalar | 0.346795 | 0.043899   | 7.90x   | -87.34%        |
| 2560x1440 | float32 | 4        | 0.50    | difference    | sse42  | 0.346795 | 0.016854   | 20.58x  | -95.14%        |
| 2560x1440 | float32 | 4        | 0.50    | difference    | avx2   | 0.346795 | 0.019188   | 18.07x  | -94.47%        |
| 2560x1440 | float32 | 4        | 0.50    | subtract      | scalar | 0.248777 | 0.054361   | 4.58x   | -78.15%        |
| 2560x1440 | float32 | 4        | 0.50    | subtract      | sse42  | 0.248777 | 0.019518   | 12.75x  | -92.15%        |
| 2560x1440 | float32 | 4        | 0.50    | subtract      | avx2   | 0.248777 | 0.017977   | 13.84x  | -92.77%        |
| 2560x1440 | float32 | 4        | 0.50    | grain_extract | scalar | 0.264149 | 0.066979   | 3.94x   | -74.64%        |
| 2560x1440 | float32 | 4        | 0.50    | grain_extract | sse42  | 0.264149 | 0.019971   | 13.23x  | -92.44%        |
| 2560x1440 | float32 | 4        | 0.50    | grain_extract | avx2   | 0.264149 | 0.017537   | 15.06x  | -93.36%        |
| 2560x1440 | float32 | 4        | 0.50    | grain_merge   | scalar | 0.264921 | 0.066238   | 4.00x   | -75.00%        |
| 2560x1440 | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.264921 | 0.018152   | 14.59x  | -93.15%        |
| 2560x1440 | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.264921 | 0.018749   | 14.13x  | -92.92%        |
| 2560x1440 | float32 | 4        | 0.50    | divide        | scalar | 0.270001 | 0.047039   | 5.74x   | -82.58%        |
| 2560x1440 | float32 | 4        | 0.50    | divide        | sse42  | 0.270001 | 0.017908   | 15.08x  | -93.37%        |
| 2560x1440 | float32 | 4        | 0.50    | divide        | avx2   | 0.270001 | 0.017958   | 15.03x  | -93.35%        |
| 2560x1440 | float32 | 4        | 0.50    | overlay       | scalar | 0.365537 | 0.104447   | 3.50x   | -71.43%        |
| 2560x1440 | float32 | 4        | 0.50    | overlay       | sse42  | 0.365537 | 0.019037   | 19.20x  | -94.79%        |
| 2560x1440 | float32 | 4        | 0.50    | overlay       | avx2   | 0.365537 | 0.017640   | 20.72x  | -95.17%        |
| 3840x2160 | uint8   | 3        | 0.50    | normal        | scalar | 0.773593 | 0.205890   | 3.76x   | -73.39%        |
| 3840x2160 | uint8   | 3        | 0.50    | normal        | sse42  | 0.773593 | 0.088640   | 8.73x   | -88.54%        |
| 3840x2160 | uint8   | 3        | 0.50    | normal        | avx2   | 0.773593 | 0.088402   | 8.75x   | -88.57%        |
| 3840x2160 | uint8   | 3        | 0.50    | soft_light    | scalar | 1.009634 | 0.230943   | 4.37x   | -77.13%        |
| 3840x2160 | uint8   | 3        | 0.50    | soft_light    | sse42  | 1.009634 | 0.101244   | 9.97x   | -89.97%        |
| 3840x2160 | uint8   | 3        | 0.50    | soft_light    | avx2   | 1.009634 | 0.089546   | 11.27x  | -91.13%        |
| 3840x2160 | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.774024 | 0.260008   | 2.98x   | -66.41%        |
| 3840x2160 | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.774024 | 0.103296   | 7.49x   | -86.65%        |
| 3840x2160 | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.774024 | 0.090656   | 8.54x   | -88.29%        |
| 3840x2160 | uint8   | 3        | 0.50    | screen        | scalar | 0.823938 | 0.232282   | 3.55x   | -71.81%        |
| 3840x2160 | uint8   | 3        | 0.50    | screen        | sse42  | 0.823938 | 0.102505   | 8.04x   | -87.56%        |
| 3840x2160 | uint8   | 3        | 0.50    | screen        | avx2   | 0.823938 | 0.094159   | 8.75x   | -88.57%        |
| 3840x2160 | uint8   | 3        | 0.50    | dodge         | scalar | 0.869022 | 0.249055   | 3.49x   | -71.34%        |
| 3840x2160 | uint8   | 3        | 0.50    | dodge         | sse42  | 0.869022 | 0.104118   | 8.35x   | -88.02%        |
| 3840x2160 | uint8   | 3        | 0.50    | dodge         | avx2   | 0.869022 | 0.098629   | 8.81x   | -88.65%        |
| 3840x2160 | uint8   | 3        | 0.50    | addition      | scalar | 0.848156 | 0.338085   | 2.51x   | -60.14%        |
| 3840x2160 | uint8   | 3        | 0.50    | addition      | sse42  | 0.848156 | 0.104296   | 8.13x   | -87.70%        |
| 3840x2160 | uint8   | 3        | 0.50    | addition      | avx2   | 0.848156 | 0.090680   | 9.35x   | -89.31%        |
| 3840x2160 | uint8   | 3        | 0.50    | darken_only   | scalar | 0.792091 | 0.260904   | 3.04x   | -67.06%        |
| 3840x2160 | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.792091 | 0.099989   | 7.92x   | -87.38%        |
| 3840x2160 | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.792091 | 0.096438   | 8.21x   | -87.82%        |
| 3840x2160 | uint8   | 3        | 0.50    | multiply      | scalar | 0.824378 | 0.253209   | 3.26x   | -69.28%        |
| 3840x2160 | uint8   | 3        | 0.50    | multiply      | sse42  | 0.824378 | 0.101705   | 8.11x   | -87.66%        |
| 3840x2160 | uint8   | 3        | 0.50    | multiply      | avx2   | 0.824378 | 0.090064   | 9.15x   | -89.07%        |
| 3840x2160 | uint8   | 3        | 0.50    | hard_light    | scalar | 1.107955 | 0.391686   | 2.83x   | -64.65%        |
| 3840x2160 | uint8   | 3        | 0.50    | hard_light    | sse42  | 1.107955 | 0.105551   | 10.50x  | -90.47%        |
| 3840x2160 | uint8   | 3        | 0.50    | hard_light    | avx2   | 1.107955 | 0.093644   | 11.83x  | -91.55%        |
| 3840x2160 | uint8   | 3        | 0.50    | difference    | scalar | 1.015558 | 0.229006   | 4.43x   | -77.45%        |
| 3840x2160 | uint8   | 3        | 0.50    | difference    | sse42  | 1.015558 | 0.098043   | 10.36x  | -90.35%        |
| 3840x2160 | uint8   | 3        | 0.50    | difference    | avx2   | 1.015558 | 0.090995   | 11.16x  | -91.04%        |
| 3840x2160 | uint8   | 3        | 0.50    | subtract      | scalar | 0.778203 | 0.218147   | 3.57x   | -71.97%        |
| 3840x2160 | uint8   | 3        | 0.50    | subtract      | sse42  | 0.778203 | 0.101428   | 7.67x   | -86.97%        |
| 3840x2160 | uint8   | 3        | 0.50    | subtract      | avx2   | 0.778203 | 0.090712   | 8.58x   | -88.34%        |
| 3840x2160 | uint8   | 3        | 0.50    | grain_extract | scalar | 0.801544 | 0.282283   | 2.84x   | -64.78%        |
| 3840x2160 | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.801544 | 0.101279   | 7.91x   | -87.36%        |
| 3840x2160 | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.801544 | 0.091428   | 8.77x   | -88.59%        |
| 3840x2160 | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.806443 | 0.279216   | 2.89x   | -65.38%        |
| 3840x2160 | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.806443 | 0.102134   | 7.90x   | -87.34%        |
| 3840x2160 | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.806443 | 0.091160   | 8.85x   | -88.70%        |
| 3840x2160 | uint8   | 3        | 0.50    | divide        | scalar | 0.816482 | 0.236195   | 3.46x   | -71.07%        |
| 3840x2160 | uint8   | 3        | 0.50    | divide        | sse42  | 0.816482 | 0.103729   | 7.87x   | -87.30%        |
| 3840x2160 | uint8   | 3        | 0.50    | divide        | avx2   | 0.816482 | 0.092667   | 8.81x   | -88.65%        |
| 3840x2160 | uint8   | 3        | 0.50    | overlay       | scalar | 1.039519 | 0.380434   | 2.73x   | -63.40%        |
| 3840x2160 | uint8   | 3        | 0.50    | overlay       | sse42  | 1.039519 | 0.103298   | 10.06x  | -90.06%        |
| 3840x2160 | uint8   | 3        | 0.50    | overlay       | avx2   | 1.039519 | 0.092779   | 11.20x  | -91.07%        |
| 3840x2160 | uint8   | 4        | 0.50    | normal        | scalar | 0.585558 | 0.170780   | 3.43x   | -70.83%        |
| 3840x2160 | uint8   | 4        | 0.50    | normal        | sse42  | 0.585558 | 0.022255   | 26.31x  | -96.20%        |
| 3840x2160 | uint8   | 4        | 0.50    | normal        | avx2   | 0.585558 | 0.020129   | 29.09x  | -96.56%        |
| 3840x2160 | uint8   | 4        | 0.50    | soft_light    | scalar | 0.813257 | 0.214934   | 3.78x   | -73.57%        |
| 3840x2160 | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.813257 | 0.028308   | 28.73x  | -96.52%        |
| 3840x2160 | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.813257 | 0.026189   | 31.05x  | -96.78%        |
| 3840x2160 | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.569797 | 0.221973   | 2.57x   | -61.04%        |
| 3840x2160 | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.569797 | 0.025378   | 22.45x  | -95.55%        |
| 3840x2160 | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.569797 | 0.023892   | 23.85x  | -95.81%        |
| 3840x2160 | uint8   | 4        | 0.50    | screen        | scalar | 0.617556 | 0.205694   | 3.00x   | -66.69%        |
| 3840x2160 | uint8   | 4        | 0.50    | screen        | sse42  | 0.617556 | 0.027771   | 22.24x  | -95.50%        |
| 3840x2160 | uint8   | 4        | 0.50    | screen        | avx2   | 0.617556 | 0.024450   | 25.26x  | -96.04%        |
| 3840x2160 | uint8   | 4        | 0.50    | dodge         | scalar | 0.620187 | 0.218166   | 2.84x   | -64.82%        |
| 3840x2160 | uint8   | 4        | 0.50    | dodge         | sse42  | 0.620187 | 0.030355   | 20.43x  | -95.11%        |
| 3840x2160 | uint8   | 4        | 0.50    | dodge         | avx2   | 0.620187 | 0.026091   | 23.77x  | -95.79%        |
| 3840x2160 | uint8   | 4        | 0.50    | addition      | scalar | 0.593498 | 0.257994   | 2.30x   | -56.53%        |
| 3840x2160 | uint8   | 4        | 0.50    | addition      | sse42  | 0.593498 | 0.033556   | 17.69x  | -94.35%        |
| 3840x2160 | uint8   | 4        | 0.50    | addition      | avx2   | 0.593498 | 0.025008   | 23.73x  | -95.79%        |
| 3840x2160 | uint8   | 4        | 0.50    | darken_only   | scalar | 0.572909 | 0.223689   | 2.56x   | -60.96%        |
| 3840x2160 | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.572909 | 0.025210   | 22.73x  | -95.60%        |
| 3840x2160 | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.572909 | 0.023908   | 23.96x  | -95.83%        |
| 3840x2160 | uint8   | 4        | 0.50    | multiply      | scalar | 0.592459 | 0.207613   | 2.85x   | -64.96%        |
| 3840x2160 | uint8   | 4        | 0.50    | multiply      | sse42  | 0.592459 | 0.025654   | 23.09x  | -95.67%        |
| 3840x2160 | uint8   | 4        | 0.50    | multiply      | avx2   | 0.592459 | 0.024034   | 24.65x  | -95.94%        |
| 3840x2160 | uint8   | 4        | 0.50    | hard_light    | scalar | 0.910808 | 0.339167   | 2.69x   | -62.76%        |
| 3840x2160 | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.910808 | 0.030233   | 30.13x  | -96.68%        |
| 3840x2160 | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.910808 | 0.025407   | 35.85x  | -97.21%        |
| 3840x2160 | uint8   | 4        | 0.50    | difference    | scalar | 0.808489 | 0.205266   | 3.94x   | -74.61%        |
| 3840x2160 | uint8   | 4        | 0.50    | difference    | sse42  | 0.808489 | 0.025578   | 31.61x  | -96.84%        |
| 3840x2160 | uint8   | 4        | 0.50    | difference    | avx2   | 0.808489 | 0.023691   | 34.13x  | -97.07%        |
| 3840x2160 | uint8   | 4        | 0.50    | subtract      | scalar | 0.590641 | 0.197611   | 2.99x   | -66.54%        |
| 3840x2160 | uint8   | 4        | 0.50    | subtract      | sse42  | 0.590641 | 0.034033   | 17.36x  | -94.24%        |
| 3840x2160 | uint8   | 4        | 0.50    | subtract      | avx2   | 0.590641 | 0.028115   | 21.01x  | -95.24%        |
| 3840x2160 | uint8   | 4        | 0.50    | grain_extract | scalar | 0.612587 | 0.250089   | 2.45x   | -59.17%        |
| 3840x2160 | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.612587 | 0.027108   | 22.60x  | -95.57%        |
| 3840x2160 | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.612587 | 0.025422   | 24.10x  | -95.85%        |
| 3840x2160 | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.623113 | 0.249314   | 2.50x   | -59.99%        |
| 3840x2160 | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.623113 | 0.027288   | 22.84x  | -95.62%        |
| 3840x2160 | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.623113 | 0.025636   | 24.31x  | -95.89%        |
| 3840x2160 | uint8   | 4        | 0.50    | divide        | scalar | 0.629194 | 0.216029   | 2.91x   | -65.67%        |
| 3840x2160 | uint8   | 4        | 0.50    | divide        | sse42  | 0.629194 | 0.028878   | 21.79x  | -95.41%        |
| 3840x2160 | uint8   | 4        | 0.50    | divide        | avx2   | 0.629194 | 0.025190   | 24.98x  | -96.00%        |
| 3840x2160 | uint8   | 4        | 0.50    | overlay       | scalar | 0.860772 | 0.335322   | 2.57x   | -61.04%        |
| 3840x2160 | uint8   | 4        | 0.50    | overlay       | sse42  | 0.860772 | 0.029736   | 28.95x  | -96.55%        |
| 3840x2160 | uint8   | 4        | 0.50    | overlay       | avx2   | 0.860772 | 0.025892   | 33.24x  | -96.99%        |
| 3840x2160 | float32 | 3        | 0.50    | normal        | scalar | 0.687004 | 0.073983   | 9.29x   | -89.23%        |
| 3840x2160 | float32 | 3        | 0.50    | normal        | sse42  | 0.687004 | 0.040131   | 17.12x  | -94.16%        |
| 3840x2160 | float32 | 3        | 0.50    | normal        | avx2   | 0.687004 | 0.030225   | 22.73x  | -95.60%        |
| 3840x2160 | float32 | 3        | 0.50    | soft_light    | scalar | 0.894822 | 0.092638   | 9.66x   | -89.65%        |
| 3840x2160 | float32 | 3        | 0.50    | soft_light    | sse42  | 0.894822 | 0.027946   | 32.02x  | -96.88%        |
| 3840x2160 | float32 | 3        | 0.50    | soft_light    | avx2   | 0.894822 | 0.025470   | 35.13x  | -97.15%        |
| 3840x2160 | float32 | 3        | 0.50    | lighten_only  | scalar | 0.676375 | 0.110970   | 6.10x   | -83.59%        |
| 3840x2160 | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.676375 | 0.027699   | 24.42x  | -95.90%        |
| 3840x2160 | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.676375 | 0.025796   | 26.22x  | -96.19%        |
| 3840x2160 | float32 | 3        | 0.50    | screen        | scalar | 0.706561 | 0.083843   | 8.43x   | -88.13%        |
| 3840x2160 | float32 | 3        | 0.50    | screen        | sse42  | 0.706561 | 0.027634   | 25.57x  | -96.09%        |
| 3840x2160 | float32 | 3        | 0.50    | screen        | avx2   | 0.706561 | 0.024567   | 28.76x  | -96.52%        |
| 3840x2160 | float32 | 3        | 0.50    | dodge         | scalar | 0.708613 | 0.094783   | 7.48x   | -86.62%        |
| 3840x2160 | float32 | 3        | 0.50    | dodge         | sse42  | 0.708613 | 0.028263   | 25.07x  | -96.01%        |
| 3840x2160 | float32 | 3        | 0.50    | dodge         | avx2   | 0.708613 | 0.025843   | 27.42x  | -96.35%        |
| 3840x2160 | float32 | 3        | 0.50    | addition      | scalar | 0.679294 | 0.210632   | 3.23x   | -68.99%        |
| 3840x2160 | float32 | 3        | 0.50    | addition      | sse42  | 0.679294 | 0.028120   | 24.16x  | -95.86%        |
| 3840x2160 | float32 | 3        | 0.50    | addition      | avx2   | 0.679294 | 0.027036   | 25.13x  | -96.02%        |
| 3840x2160 | float32 | 3        | 0.50    | darken_only   | scalar | 0.669706 | 0.108856   | 6.15x   | -83.75%        |
| 3840x2160 | float32 | 3        | 0.50    | darken_only   | sse42  | 0.669706 | 0.029700   | 22.55x  | -95.57%        |
| 3840x2160 | float32 | 3        | 0.50    | darken_only   | avx2   | 0.669706 | 0.025388   | 26.38x  | -96.21%        |
| 3840x2160 | float32 | 3        | 0.50    | multiply      | scalar | 0.676629 | 0.083621   | 8.09x   | -87.64%        |
| 3840x2160 | float32 | 3        | 0.50    | multiply      | sse42  | 0.676629 | 0.028408   | 23.82x  | -95.80%        |
| 3840x2160 | float32 | 3        | 0.50    | multiply      | avx2   | 0.676629 | 0.024854   | 27.22x  | -96.33%        |
| 3840x2160 | float32 | 3        | 0.50    | hard_light    | scalar | 1.014871 | 0.237860   | 4.27x   | -76.56%        |
| 3840x2160 | float32 | 3        | 0.50    | hard_light    | sse42  | 1.014871 | 0.030567   | 33.20x  | -96.99%        |
| 3840x2160 | float32 | 3        | 0.50    | hard_light    | avx2   | 1.014871 | 0.025942   | 39.12x  | -97.44%        |
| 3840x2160 | float32 | 3        | 0.50    | difference    | scalar | 0.900546 | 0.083749   | 10.75x  | -90.70%        |
| 3840x2160 | float32 | 3        | 0.50    | difference    | sse42  | 0.900546 | 0.034210   | 26.32x  | -96.20%        |
| 3840x2160 | float32 | 3        | 0.50    | difference    | avx2   | 0.900546 | 0.024905   | 36.16x  | -97.23%        |
| 3840x2160 | float32 | 3        | 0.50    | subtract      | scalar | 0.682183 | 0.099288   | 6.87x   | -85.45%        |
| 3840x2160 | float32 | 3        | 0.50    | subtract      | sse42  | 0.682183 | 0.027095   | 25.18x  | -96.03%        |
| 3840x2160 | float32 | 3        | 0.50    | subtract      | avx2   | 0.682183 | 0.025988   | 26.25x  | -96.19%        |
| 3840x2160 | float32 | 3        | 0.50    | grain_extract | scalar | 0.697547 | 0.141906   | 4.92x   | -79.66%        |
| 3840x2160 | float32 | 3        | 0.50    | grain_extract | sse42  | 0.697547 | 0.027581   | 25.29x  | -96.05%        |
| 3840x2160 | float32 | 3        | 0.50    | grain_extract | avx2   | 0.697547 | 0.025667   | 27.18x  | -96.32%        |
| 3840x2160 | float32 | 3        | 0.50    | grain_merge   | scalar | 0.710500 | 0.137343   | 5.17x   | -80.67%        |
| 3840x2160 | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.710500 | 0.028490   | 24.94x  | -95.99%        |
| 3840x2160 | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.710500 | 0.025844   | 27.49x  | -96.36%        |
| 3840x2160 | float32 | 3        | 0.50    | divide        | scalar | 0.708737 | 0.088583   | 8.00x   | -87.50%        |
| 3840x2160 | float32 | 3        | 0.50    | divide        | sse42  | 0.708737 | 0.027977   | 25.33x  | -96.05%        |
| 3840x2160 | float32 | 3        | 0.50    | divide        | avx2   | 0.708737 | 0.025274   | 28.04x  | -96.43%        |
| 3840x2160 | float32 | 3        | 0.50    | overlay       | scalar | 0.922272 | 0.216147   | 4.27x   | -76.56%        |
| 3840x2160 | float32 | 3        | 0.50    | overlay       | sse42  | 0.922272 | 0.028167   | 32.74x  | -96.95%        |
| 3840x2160 | float32 | 3        | 0.50    | overlay       | avx2   | 0.922272 | 0.026553   | 34.73x  | -97.12%        |
| 3840x2160 | float32 | 4        | 0.50    | normal        | scalar | 0.555923 | 0.090383   | 6.15x   | -83.74%        |
| 3840x2160 | float32 | 4        | 0.50    | normal        | sse42  | 0.555923 | 0.035104   | 15.84x  | -93.69%        |
| 3840x2160 | float32 | 4        | 0.50    | normal        | avx2   | 0.555923 | 0.036845   | 15.09x  | -93.37%        |
| 3840x2160 | float32 | 4        | 0.50    | soft_light    | scalar | 0.743759 | 0.103147   | 7.21x   | -86.13%        |
| 3840x2160 | float32 | 4        | 0.50    | soft_light    | sse42  | 0.743759 | 0.038183   | 19.48x  | -94.87%        |
| 3840x2160 | float32 | 4        | 0.50    | soft_light    | avx2   | 0.743759 | 0.038329   | 19.40x  | -94.85%        |
| 3840x2160 | float32 | 4        | 0.50    | lighten_only  | scalar | 0.518030 | 0.109622   | 4.73x   | -78.84%        |
| 3840x2160 | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.518030 | 0.040253   | 12.87x  | -92.23%        |
| 3840x2160 | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.518030 | 0.038290   | 13.53x  | -92.61%        |
| 3840x2160 | float32 | 4        | 0.50    | screen        | scalar | 0.559820 | 0.098862   | 5.66x   | -82.34%        |
| 3840x2160 | float32 | 4        | 0.50    | screen        | sse42  | 0.559820 | 0.037861   | 14.79x  | -93.24%        |
| 3840x2160 | float32 | 4        | 0.50    | screen        | avx2   | 0.559820 | 0.039967   | 14.01x  | -92.86%        |
| 3840x2160 | float32 | 4        | 0.50    | dodge         | scalar | 0.602509 | 0.111041   | 5.43x   | -81.57%        |
| 3840x2160 | float32 | 4        | 0.50    | dodge         | sse42  | 0.602509 | 0.042052   | 14.33x  | -93.02%        |
| 3840x2160 | float32 | 4        | 0.50    | dodge         | avx2   | 0.602509 | 0.038257   | 15.75x  | -93.65%        |
| 3840x2160 | float32 | 4        | 0.50    | addition      | scalar | 0.566766 | 0.190183   | 2.98x   | -66.44%        |
| 3840x2160 | float32 | 4        | 0.50    | addition      | sse42  | 0.566766 | 0.045532   | 12.45x  | -91.97%        |
| 3840x2160 | float32 | 4        | 0.50    | addition      | avx2   | 0.566766 | 0.046675   | 12.14x  | -91.76%        |
| 3840x2160 | float32 | 4        | 0.50    | darken_only   | scalar | 0.623307 | 0.120117   | 5.19x   | -80.73%        |
| 3840x2160 | float32 | 4        | 0.50    | darken_only   | sse42  | 0.623307 | 0.044849   | 13.90x  | -92.80%        |
| 3840x2160 | float32 | 4        | 0.50    | darken_only   | avx2   | 0.623307 | 0.040911   | 15.24x  | -93.44%        |
| 3840x2160 | float32 | 4        | 0.50    | multiply      | scalar | 0.559049 | 0.097246   | 5.75x   | -82.61%        |
| 3840x2160 | float32 | 4        | 0.50    | multiply      | sse42  | 0.559049 | 0.037818   | 14.78x  | -93.24%        |
| 3840x2160 | float32 | 4        | 0.50    | multiply      | avx2   | 0.559049 | 0.037052   | 15.09x  | -93.37%        |
| 3840x2160 | float32 | 4        | 0.50    | hard_light    | scalar | 0.913539 | 0.253635   | 3.60x   | -72.24%        |
| 3840x2160 | float32 | 4        | 0.50    | hard_light    | sse42  | 0.913539 | 0.043829   | 20.84x  | -95.20%        |
| 3840x2160 | float32 | 4        | 0.50    | hard_light    | avx2   | 0.913539 | 0.037598   | 24.30x  | -95.88%        |
| 3840x2160 | float32 | 4        | 0.50    | difference    | scalar | 0.800629 | 0.098506   | 8.13x   | -87.70%        |
| 3840x2160 | float32 | 4        | 0.50    | difference    | sse42  | 0.800629 | 0.037104   | 21.58x  | -95.37%        |
| 3840x2160 | float32 | 4        | 0.50    | difference    | avx2   | 0.800629 | 0.041898   | 19.11x  | -94.77%        |
| 3840x2160 | float32 | 4        | 0.50    | subtract      | scalar | 0.567607 | 0.122466   | 4.63x   | -78.42%        |
| 3840x2160 | float32 | 4        | 0.50    | subtract      | sse42  | 0.567607 | 0.041824   | 13.57x  | -92.63%        |
| 3840x2160 | float32 | 4        | 0.50    | subtract      | avx2   | 0.567607 | 0.041247   | 13.76x  | -92.73%        |
| 3840x2160 | float32 | 4        | 0.50    | grain_extract | scalar | 0.563396 | 0.150405   | 3.75x   | -73.30%        |
| 3840x2160 | float32 | 4        | 0.50    | grain_extract | sse42  | 0.563396 | 0.040760   | 13.82x  | -92.77%        |
| 3840x2160 | float32 | 4        | 0.50    | grain_extract | avx2   | 0.563396 | 0.037819   | 14.90x  | -93.29%        |
| 3840x2160 | float32 | 4        | 0.50    | grain_merge   | scalar | 0.590499 | 0.149181   | 3.96x   | -74.74%        |
| 3840x2160 | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.590499 | 0.038353   | 15.40x  | -93.50%        |
| 3840x2160 | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.590499 | 0.038342   | 15.40x  | -93.51%        |
| 3840x2160 | float32 | 4        | 0.50    | divide        | scalar | 0.567884 | 0.114575   | 4.96x   | -79.82%        |
| 3840x2160 | float32 | 4        | 0.50    | divide        | sse42  | 0.567884 | 0.038477   | 14.76x  | -93.22%        |
| 3840x2160 | float32 | 4        | 0.50    | divide        | avx2   | 0.567884 | 0.039534   | 14.36x  | -93.04%        |
| 3840x2160 | float32 | 4        | 0.50    | overlay       | scalar | 0.850264 | 0.241843   | 3.52x   | -71.56%        |
| 3840x2160 | float32 | 4        | 0.50    | overlay       | sse42  | 0.850264 | 0.041587   | 20.45x  | -95.11%        |
| 3840x2160 | float32 | 4        | 0.50    | overlay       | avx2   | 0.850264 | 0.039730   | 21.40x  | -95.33%        |
</details>
<!-- PERF_RESULTS_END -->
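
The last two columns of each row can be reproduced from the two timing columns. The sketch below assumes the first timing is the reference `blend_modes` run and the second is this library's kernel; that reading is inferred from the data (the first timing is identical across the `scalar`, `sse42`, and `avx2` rows of a given mode), and the helper name `derived_columns` is hypothetical, not part of the package.

```python
def derived_columns(reference_time, kernel_time):
    """Return (speedup, percent_change) for one benchmark row.

    Assumption: reference_time is the baseline timing and kernel_time is
    the timing of the scalar/SSE4.2/AVX2 kernel, in the same unit.
    """
    speedup = reference_time / kernel_time
    percent_change = (kernel_time / reference_time - 1.0) * 100.0
    return speedup, percent_change


# Values taken from the 3840x2160 float32 RGBA screen/sse42 row above.
speedup, percent_change = derived_columns(0.559820, 0.037861)
print(f"{speedup:.2f}x  {percent_change:.2f}%")  # 14.79x  -93.24%
```

A negative percentage therefore means the kernel took less time than the reference, and a larger speedup always pairs with a percentage closer to -100%.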
