Metadata-Version: 2.4
Name: simd-blend-modes
Version: 1.0.2
Summary: SIMD-accelerated blend modes
Author: Samuel Howard
License: MIT
Project-URL: Homepage, https://github.com/samhaswon/simd_blend_modes
Project-URL: Bug Tracker, https://github.com/samhaswon/simd_blend_modes/issues
Keywords: image,processing
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: C
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Dynamic: license-file

# SIMD Blend Modes

This project reimplements the blend modes from [`blend_modes`](https://github.com/flrs/blend_modes) with C kernels and SIMD
(SSE4.2/AVX2) acceleration. It accepts uint8 and float32 NumPy arrays with values in the range 0..255
and returns an array whose dtype and channel count match the background image. Missing alpha channels
are treated as fully opaque (255). Opacity defaults to 1.0.

This is intended to be a mostly drop-in replacement, but with a more permissive
API that lets you go faster if you don't need float32 arrays or the information of an
alpha channel for some layers.

## Build and Install

### General

```bash
pip install simd-blend-modes
```

### Development

```bash
pip install -r requirements-dev.txt
pip install -e .
```

## Usage

```python
import numpy as np
import simd_blend_modes as sbm

background = np.zeros((512, 512, 4), dtype=np.uint8)
foreground = np.zeros((512, 512, 4), dtype=np.uint8)

out = sbm.screen(background, foreground, 0.5)
```

Inputs:

- Dtypes: `np.uint8` or `np.float32` only.
- Value range: 0..255 for both dtypes.
  - float32 inputs are expected to hold uint8-scale values (as if cast from uint8), not values normalized to 0..1.
- Shapes: `H x W x C` with `C` = 3 (RGB) or 4 (RGBA).
- Output: dtype and channel count match the background image.
- Alpha: if a source is RGB (3 channels), alpha is treated as 255 (fully opaque).
- Opacity: the third argument is optional; defaults to `1.0`.
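
The input rules above can be sketched with plain NumPy. This is a minimal illustration, not part of the package: `to_sbm_float32` and `with_opaque_alpha` are hypothetical helper names, and the sketch assumes that any floating-point image you start with is normalized to 0..1.

```python
import numpy as np


def to_sbm_float32(image: np.ndarray) -> np.ndarray:
    """Return a float32 copy of `image` with values in 0..255.

    Hypothetical helper, not part of simd_blend_modes. Assumes float
    inputs are normalized to 0..1 and integer inputs are already 0..255.
    """
    if np.issubdtype(image.dtype, np.floating):
        scaled = image.astype(np.float32) * np.float32(255.0)
    else:
        scaled = image.astype(np.float32)
    return np.clip(scaled, 0.0, 255.0)


def with_opaque_alpha(rgb: np.ndarray) -> np.ndarray:
    """Append an alpha channel of 255 to an H x W x 3 array.

    Only illustrates what "missing alpha is treated as 255" means;
    the library accepts RGB inputs directly.
    """
    alpha = np.full(rgb.shape[:2] + (1,), 255, dtype=rgb.dtype)
    return np.concatenate([rgb, alpha], axis=2)


normalized = np.random.default_rng(0).random((4, 4, 3))  # float64 in 0..1
foreground = to_sbm_float32(normalized)                  # float32 in 0..255
rgba = with_opaque_alpha(foreground)                     # H x W x 4
```

An array prepared this way can be passed as either argument of a blend call such as `sbm.screen(background, foreground, 0.5)`.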

Supported blend modes:

- [`normal`](https://en.wikipedia.org/wiki/Blend_modes#Normal_blend_mode)
- [`soft_light`](https://en.wikipedia.org/wiki/Blend_modes#Soft_Light)
- [`lighten_only`](https://en.wikipedia.org/wiki/Blend_modes#Lighten_Only)
- [`screen`](https://en.wikipedia.org/wiki/Blend_modes#Screen)
- [`dodge`](https://en.wikipedia.org/wiki/Blend_modes#Dodge_and_burn)
- [`addition`](https://en.wikipedia.org/wiki/Blend_modes#Addition)
- [`darken_only`](https://en.wikipedia.org/wiki/Blend_modes#Darken_Only)
- [`multiply`](https://en.wikipedia.org/wiki/Blend_modes#Multiply)
- [`hard_light`](https://en.wikipedia.org/wiki/Blend_modes#Hard_Light)
- [`difference`](https://en.wikipedia.org/wiki/Blend_modes#Difference)
- [`subtract`](https://en.wikipedia.org/wiki/Blend_modes#Subtract)
- `grain_extract` (known from GIMP)
- `grain_merge` (known from GIMP)
- [`divide`](https://en.wikipedia.org/wiki/Blend_modes#Divide)
- [`overlay`](https://en.wikipedia.org/wiki/Blend_modes#Overlay)

You can force a kernel by passing a string (or `KernelKind` value) as the fourth argument:

```python
out = sbm.screen(background, foreground, 0.5, "avx2")
```

## Tests

Correctness and performance:

```bash
python3 -m unittest discover tests/
```

Performance only:

```bash
python3 -m unittest tests.test_performance
```

The performance test prints a markdown table of per-kernel speedups vs the NumPy reference
for common square sizes and screen resolutions.
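
For reading the table, the two derived columns appear to follow the usual definitions. The sketch below is inferred from the published rows, not taken from `tests/test_performance.py`, so the function names are hypothetical and the test script may compute the values differently.

```python
def speedup(ref_seconds: float, kernel_seconds: float) -> float:
    """How many times faster the kernel is than the reference."""
    return ref_seconds / kernel_seconds


def percent_change(ref_seconds: float, kernel_seconds: float) -> float:
    """Relative change in runtime; negative means the kernel is faster."""
    return (kernel_seconds - ref_seconds) / ref_seconds * 100.0


# Timings from the "normal / scalar" row of the summary table.
ref, kernel = 0.152080, 0.032742
print(f"{speedup(ref, kernel):.2f}x")         # 4.64x
print(f"{percent_change(ref, kernel):.2f}%")  # -78.47%
```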

## ARM

ARM isn't properly supported, as I do not have a new enough ARM CPU to test on
and do not wish to use a cloud VM for it. So, if you want ARM SIMD support, open a PR.
It should build and still be faster than the reference, but there's no SIMD support there (yet).

ARM builds run in scalar-only mode (x86 SIMD is compile-time gated). To test ARM under Docker,
enable emulation and then build with the ARM platform. 

If you don't already have buildx/binfmt configured, run:

```bash
docker run --privileged --rm tonistiigi/binfmt --install arm64
```

Then build or run the ARM container:

```bash
docker compose up --build
```

This is incredibly slow. I wouldn't actually do this, but it's here.

## Notes

- SIMD kernels are selected at runtime: AVX2 → SSE4.2 → scalar.
- ARM builds are supported in scalar-only mode; x86 SIMD is compile-time gated. CI does not emit
  ARM artifacts.
- Reference tests adapted from the original project live in `tests/reference_blend_modes_tests.py`
  and are skipped unless the `blend_modes` package and test assets are available.
- The SIMD paths currently assume contiguous arrays (the input validation enforces this).
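
Since contiguous arrays are required, slices and other strided views need to be copied before blending. A minimal sketch using standard NumPy calls; how the library reports a non-contiguous input is not shown here.

```python
import numpy as np

image = np.zeros((512, 512, 4), dtype=np.uint8)

# Taking every other column yields a strided view, not a contiguous buffer.
view = image[:, ::2, :]
print(view.flags["C_CONTIGUOUS"])  # False

# np.ascontiguousarray copies the view into a fresh C-ordered buffer.
fixed = np.ascontiguousarray(view)
print(fixed.flags["C_CONTIGUOUS"])  # True
```

Calling `np.ascontiguousarray` on an array that is already contiguous does not duplicate its data, so it is cheap to apply defensively.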

## Performance 

<!--
The performance test prints large tables. If your terminal buffer is limited, you can write the
output into this README instead by setting `WRITE_RESULTS_TO_README = True` in
`tests/test_performance.py`. When enabled, it replaces the block between the markers below.
-->

<!-- PERF_RESULTS_START -->
| Mode          | Kernel | Ref (s)  | Kernel (s) | Speedup | Percent Change |
| ------------- | ------ | -------- | ---------- | ------- | -------------- |
| normal        | scalar | 0.152080 | 0.032742   | 4.64x   | -78.47%        |
| normal        | sse42  | 0.152080 | 0.010798   | 14.08x  | -92.90%        |
| normal        | avx2   | 0.152080 | 0.010636   | 14.30x  | -93.01%        |
| soft_light    | scalar | 0.209721 | 0.038664   | 5.42x   | -81.56%        |
| soft_light    | sse42  | 0.209721 | 0.013059   | 16.06x  | -93.77%        |
| soft_light    | avx2   | 0.209721 | 0.011835   | 17.72x  | -94.36%        |
| lighten_only  | scalar | 0.153868 | 0.041726   | 3.69x   | -72.88%        |
| lighten_only  | sse42  | 0.153868 | 0.011720   | 13.13x  | -92.38%        |
| lighten_only  | avx2   | 0.153868 | 0.011296   | 13.62x  | -92.66%        |
| screen        | scalar | 0.162643 | 0.036807   | 4.42x   | -77.37%        |
| screen        | sse42  | 0.162643 | 0.012259   | 13.27x  | -92.46%        |
| screen        | avx2   | 0.162643 | 0.011528   | 14.11x  | -92.91%        |
| dodge         | scalar | 0.163841 | 0.039055   | 4.20x   | -76.16%        |
| dodge         | sse42  | 0.163841 | 0.013628   | 12.02x  | -91.68%        |
| dodge         | avx2   | 0.163841 | 0.011869   | 13.80x  | -92.76%        |
| addition      | scalar | 0.157343 | 0.059510   | 2.64x   | -62.18%        |
| addition      | sse42  | 0.157343 | 0.012699   | 12.39x  | -91.93%        |
| addition      | avx2   | 0.157343 | 0.011721   | 13.42x  | -92.55%        |
| darken_only   | scalar | 0.153869 | 0.041986   | 3.66x   | -72.71%        |
| darken_only   | sse42  | 0.153869 | 0.011764   | 13.08x  | -92.35%        |
| darken_only   | avx2   | 0.153869 | 0.011305   | 13.61x  | -92.65%        |
| multiply      | scalar | 0.157435 | 0.036593   | 4.30x   | -76.76%        |
| multiply      | sse42  | 0.157435 | 0.011845   | 13.29x  | -92.48%        |
| multiply      | avx2   | 0.157435 | 0.011343   | 13.88x  | -92.80%        |
| hard_light    | scalar | 0.231979 | 0.073631   | 3.15x   | -68.26%        |
| hard_light    | sse42  | 0.231979 | 0.013737   | 16.89x  | -94.08%        |
| hard_light    | avx2   | 0.231979 | 0.011871   | 19.54x  | -94.88%        |
| difference    | scalar | 0.213577 | 0.036500   | 5.85x   | -82.91%        |
| difference    | sse42  | 0.213577 | 0.011911   | 17.93x  | -94.42%        |
| difference    | avx2   | 0.213577 | 0.011371   | 18.78x  | -94.68%        |
| subtract      | scalar | 0.156726 | 0.037817   | 4.14x   | -75.87%        |
| subtract      | sse42  | 0.156726 | 0.013245   | 11.83x  | -91.55%        |
| subtract      | avx2   | 0.156726 | 0.011774   | 13.31x  | -92.49%        |
| grain_extract | scalar | 0.161499 | 0.048936   | 3.30x   | -69.70%        |
| grain_extract | sse42  | 0.161499 | 0.012698   | 12.72x  | -92.14%        |
| grain_extract | avx2   | 0.161499 | 0.011656   | 13.86x  | -92.78%        |
| grain_merge   | scalar | 0.161065 | 0.048878   | 3.30x   | -69.65%        |
| grain_merge   | sse42  | 0.161065 | 0.012660   | 12.72x  | -92.14%        |
| grain_merge   | avx2   | 0.161065 | 0.011710   | 13.75x  | -92.73%        |
| divide        | scalar | 0.164504 | 0.037938   | 4.34x   | -76.94%        |
| divide        | sse42  | 0.164504 | 0.013081   | 12.58x  | -92.05%        |
| divide        | avx2   | 0.164504 | 0.011762   | 13.99x  | -92.85%        |
| overlay       | scalar | 0.215788 | 0.070267   | 3.07x   | -67.44%        |
| overlay       | sse42  | 0.215788 | 0.013140   | 16.42x  | -93.91%        |
| overlay       | avx2   | 0.215788 | 0.011764   | 18.34x  | -94.55%        |

<details>
<summary>Per-kernel, size, and type results</summary>

| Case      | Input   | Channels | Opacity | Mode          | Kernel | Ref (s)  | Kernel (s) | Speedup | Percent Change |
| --------- | ------- | -------- | ------- | ------------- | ------ | -------- | ---------- | ------- | -------------- |
| 256x256   | uint8   | 3        | 0.50    | normal        | scalar | 0.006370 | 0.001590   | 4.01x   | -75.03%        |
| 256x256   | uint8   | 3        | 0.50    | normal        | sse42  | 0.006370 | 0.000699   | 9.11x   | -89.02%        |
| 256x256   | uint8   | 3        | 0.50    | normal        | avx2   | 0.006370 | 0.000705   | 9.03x   | -88.93%        |
| 256x256   | uint8   | 3        | 0.50    | soft_light    | scalar | 0.008471 | 0.001787   | 4.74x   | -78.90%        |
| 256x256   | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.008471 | 0.000875   | 9.69x   | -89.68%        |
| 256x256   | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.008471 | 0.000801   | 10.58x  | -90.54%        |
| 256x256   | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.007139 | 0.001952   | 3.66x   | -72.66%        |
| 256x256   | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.007139 | 0.000798   | 8.94x   | -88.82%        |
| 256x256   | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.007139 | 0.000770   | 9.27x   | -89.21%        |
| 256x256   | uint8   | 3        | 0.50    | screen        | scalar | 0.007168 | 0.001715   | 4.18x   | -76.07%        |
| 256x256   | uint8   | 3        | 0.50    | screen        | sse42  | 0.007168 | 0.000794   | 9.03x   | -88.92%        |
| 256x256   | uint8   | 3        | 0.50    | screen        | avx2   | 0.007168 | 0.000765   | 9.37x   | -89.33%        |
| 256x256   | uint8   | 3        | 0.50    | dodge         | scalar | 0.007354 | 0.001816   | 4.05x   | -75.30%        |
| 256x256   | uint8   | 3        | 0.50    | dodge         | sse42  | 0.007354 | 0.000878   | 8.37x   | -88.06%        |
| 256x256   | uint8   | 3        | 0.50    | dodge         | avx2   | 0.007354 | 0.000796   | 9.24x   | -89.18%        |
| 256x256   | uint8   | 3        | 0.50    | addition      | scalar | 0.007341 | 0.002490   | 2.95x   | -66.09%        |
| 256x256   | uint8   | 3        | 0.50    | addition      | sse42  | 0.007341 | 0.000791   | 9.28x   | -89.22%        |
| 256x256   | uint8   | 3        | 0.50    | addition      | avx2   | 0.007341 | 0.000747   | 9.82x   | -89.82%        |
| 256x256   | uint8   | 3        | 0.50    | darken_only   | scalar | 0.007225 | 0.001920   | 3.76x   | -73.43%        |
| 256x256   | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.007225 | 0.000796   | 9.08x   | -88.98%        |
| 256x256   | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.007225 | 0.000775   | 9.32x   | -89.28%        |
| 256x256   | uint8   | 3        | 0.50    | multiply      | scalar | 0.006988 | 0.001737   | 4.02x   | -75.15%        |
| 256x256   | uint8   | 3        | 0.50    | multiply      | sse42  | 0.006988 | 0.000808   | 8.65x   | -88.44%        |
| 256x256   | uint8   | 3        | 0.50    | multiply      | avx2   | 0.006988 | 0.000782   | 8.94x   | -88.81%        |
| 256x256   | uint8   | 3        | 0.50    | hard_light    | scalar | 0.008761 | 0.002959   | 2.96x   | -66.23%        |
| 256x256   | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.008761 | 0.000934   | 9.38x   | -89.34%        |
| 256x256   | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.008761 | 0.000796   | 11.01x  | -90.92%        |
| 256x256   | uint8   | 3        | 0.50    | difference    | scalar | 0.008801 | 0.001744   | 5.05x   | -80.18%        |
| 256x256   | uint8   | 3        | 0.50    | difference    | sse42  | 0.008801 | 0.000801   | 10.99x  | -90.90%        |
| 256x256   | uint8   | 3        | 0.50    | difference    | avx2   | 0.008801 | 0.000774   | 11.37x  | -91.20%        |
| 256x256   | uint8   | 3        | 0.50    | subtract      | scalar | 0.007266 | 0.001603   | 4.53x   | -77.94%        |
| 256x256   | uint8   | 3        | 0.50    | subtract      | sse42  | 0.007266 | 0.000875   | 8.30x   | -87.95%        |
| 256x256   | uint8   | 3        | 0.50    | subtract      | avx2   | 0.007266 | 0.000815   | 8.92x   | -88.78%        |
| 256x256   | uint8   | 3        | 0.50    | grain_extract | scalar | 0.007047 | 0.002080   | 3.39x   | -70.48%        |
| 256x256   | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.007047 | 0.000840   | 8.39x   | -88.07%        |
| 256x256   | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.007047 | 0.000781   | 9.02x   | -88.91%        |
| 256x256   | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.007489 | 0.002105   | 3.56x   | -71.89%        |
| 256x256   | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.007489 | 0.000869   | 8.62x   | -88.40%        |
| 256x256   | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.007489 | 0.000859   | 8.72x   | -88.53%        |
| 256x256   | uint8   | 3        | 0.50    | divide        | scalar | 0.007437 | 0.001764   | 4.22x   | -76.29%        |
| 256x256   | uint8   | 3        | 0.50    | divide        | sse42  | 0.007437 | 0.000866   | 8.58x   | -88.35%        |
| 256x256   | uint8   | 3        | 0.50    | divide        | avx2   | 0.007437 | 0.000810   | 9.19x   | -89.11%        |
| 256x256   | uint8   | 3        | 0.50    | overlay       | scalar | 0.008993 | 0.002876   | 3.13x   | -68.02%        |
| 256x256   | uint8   | 3        | 0.50    | overlay       | sse42  | 0.008993 | 0.000865   | 10.40x  | -90.38%        |
| 256x256   | uint8   | 3        | 0.50    | overlay       | avx2   | 0.008993 | 0.000806   | 11.16x  | -91.04%        |
| 256x256   | uint8   | 4        | 0.50    | normal        | scalar | 0.003095 | 0.001316   | 2.35x   | -57.46%        |
| 256x256   | uint8   | 4        | 0.50    | normal        | sse42  | 0.003095 | 0.000178   | 17.40x  | -94.25%        |
| 256x256   | uint8   | 4        | 0.50    | normal        | avx2   | 0.003095 | 0.000162   | 19.15x  | -94.78%        |
| 256x256   | uint8   | 4        | 0.50    | soft_light    | scalar | 0.006785 | 0.001655   | 4.10x   | -75.61%        |
| 256x256   | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.006785 | 0.000222   | 30.59x  | -96.73%        |
| 256x256   | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.006785 | 0.000197   | 34.46x  | -97.10%        |
| 256x256   | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.005477 | 0.001777   | 3.08x   | -67.55%        |
| 256x256   | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.005477 | 0.000188   | 29.08x  | -96.56%        |
| 256x256   | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.005477 | 0.000194   | 28.28x  | -96.46%        |
| 256x256   | uint8   | 4        | 0.50    | screen        | scalar | 0.005569 | 0.001538   | 3.62x   | -72.37%        |
| 256x256   | uint8   | 4        | 0.50    | screen        | sse42  | 0.005569 | 0.000208   | 26.73x  | -96.26%        |
| 256x256   | uint8   | 4        | 0.50    | screen        | avx2   | 0.005569 | 0.000194   | 28.64x  | -96.51%        |
| 256x256   | uint8   | 4        | 0.50    | dodge         | scalar | 0.005804 | 0.001616   | 3.59x   | -72.16%        |
| 256x256   | uint8   | 4        | 0.50    | dodge         | sse42  | 0.005804 | 0.000233   | 24.87x  | -95.98%        |
| 256x256   | uint8   | 4        | 0.50    | dodge         | avx2   | 0.005804 | 0.000194   | 29.93x  | -96.66%        |
| 256x256   | uint8   | 4        | 0.50    | addition      | scalar | 0.005651 | 0.001928   | 2.93x   | -65.89%        |
| 256x256   | uint8   | 4        | 0.50    | addition      | sse42  | 0.005651 | 0.000260   | 21.75x  | -95.40%        |
| 256x256   | uint8   | 4        | 0.50    | addition      | avx2   | 0.005651 | 0.000200   | 28.26x  | -96.46%        |
| 256x256   | uint8   | 4        | 0.50    | darken_only   | scalar | 0.005410 | 0.001722   | 3.14x   | -68.18%        |
| 256x256   | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.005410 | 0.000193   | 28.09x  | -96.44%        |
| 256x256   | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.005410 | 0.000189   | 28.55x  | -96.50%        |
| 256x256   | uint8   | 4        | 0.50    | multiply      | scalar | 0.005434 | 0.001546   | 3.51x   | -71.55%        |
| 256x256   | uint8   | 4        | 0.50    | multiply      | sse42  | 0.005434 | 0.000196   | 27.79x  | -96.40%        |
| 256x256   | uint8   | 4        | 0.50    | multiply      | avx2   | 0.005434 | 0.000186   | 29.16x  | -96.57%        |
| 256x256   | uint8   | 4        | 0.50    | hard_light    | scalar | 0.007276 | 0.002551   | 2.85x   | -64.94%        |
| 256x256   | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.007276 | 0.000236   | 30.83x  | -96.76%        |
| 256x256   | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.007276 | 0.000196   | 37.19x  | -97.31%        |
| 256x256   | uint8   | 4        | 0.50    | difference    | scalar | 0.007216 | 0.001544   | 4.67x   | -78.61%        |
| 256x256   | uint8   | 4        | 0.50    | difference    | sse42  | 0.007216 | 0.000193   | 37.42x  | -97.33%        |
| 256x256   | uint8   | 4        | 0.50    | difference    | avx2   | 0.007216 | 0.000192   | 37.66x  | -97.34%        |
| 256x256   | uint8   | 4        | 0.50    | subtract      | scalar | 0.005564 | 0.001470   | 3.79x   | -73.59%        |
| 256x256   | uint8   | 4        | 0.50    | subtract      | sse42  | 0.005564 | 0.000269   | 20.70x  | -95.17%        |
| 256x256   | uint8   | 4        | 0.50    | subtract      | avx2   | 0.005564 | 0.000198   | 28.15x  | -96.45%        |
| 256x256   | uint8   | 4        | 0.50    | grain_extract | scalar | 0.005604 | 0.001861   | 3.01x   | -66.79%        |
| 256x256   | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.005604 | 0.000217   | 25.78x  | -96.12%        |
| 256x256   | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.005604 | 0.000195   | 28.75x  | -96.52%        |
| 256x256   | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.005473 | 0.001873   | 2.92x   | -65.78%        |
| 256x256   | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.005473 | 0.000213   | 25.68x  | -96.11%        |
| 256x256   | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.005473 | 0.000192   | 28.52x  | -96.49%        |
| 256x256   | uint8   | 4        | 0.50    | divide        | scalar | 0.005750 | 0.001582   | 3.63x   | -72.48%        |
| 256x256   | uint8   | 4        | 0.50    | divide        | sse42  | 0.005750 | 0.000216   | 26.59x  | -96.24%        |
| 256x256   | uint8   | 4        | 0.50    | divide        | avx2   | 0.005750 | 0.000191   | 30.11x  | -96.68%        |
| 256x256   | uint8   | 4        | 0.50    | overlay       | scalar | 0.006926 | 0.002479   | 2.79x   | -64.21%        |
| 256x256   | uint8   | 4        | 0.50    | overlay       | sse42  | 0.006926 | 0.000243   | 28.54x  | -96.50%        |
| 256x256   | uint8   | 4        | 0.50    | overlay       | avx2   | 0.006926 | 0.000194   | 35.76x  | -97.20%        |
| 256x256   | float32 | 3        | 0.50    | normal        | scalar | 0.005580 | 0.000497   | 11.22x  | -91.09%        |
| 256x256   | float32 | 3        | 0.50    | normal        | sse42  | 0.005580 | 0.000219   | 25.49x  | -96.08%        |
| 256x256   | float32 | 3        | 0.50    | normal        | avx2   | 0.005580 | 0.000141   | 39.54x  | -97.47%        |
| 256x256   | float32 | 3        | 0.50    | soft_light    | scalar | 0.008864 | 0.000618   | 14.36x  | -93.03%        |
| 256x256   | float32 | 3        | 0.50    | soft_light    | sse42  | 0.008864 | 0.000258   | 34.33x  | -97.09%        |
| 256x256   | float32 | 3        | 0.50    | soft_light    | avx2   | 0.008864 | 0.000194   | 45.67x  | -97.81%        |
| 256x256   | float32 | 3        | 0.50    | lighten_only  | scalar | 0.007150 | 0.000737   | 9.70x   | -89.69%        |
| 256x256   | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.007150 | 0.000217   | 32.98x  | -96.97%        |
| 256x256   | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.007150 | 0.000178   | 40.21x  | -97.51%        |
| 256x256   | float32 | 3        | 0.50    | screen        | scalar | 0.007325 | 0.000551   | 13.28x  | -92.47%        |
| 256x256   | float32 | 3        | 0.50    | screen        | sse42  | 0.007325 | 0.000240   | 30.48x  | -96.72%        |
| 256x256   | float32 | 3        | 0.50    | screen        | avx2   | 0.007325 | 0.000184   | 39.83x  | -97.49%        |
| 256x256   | float32 | 3        | 0.50    | dodge         | scalar | 0.007305 | 0.000631   | 11.59x  | -91.37%        |
| 256x256   | float32 | 3        | 0.50    | dodge         | sse42  | 0.007305 | 0.000275   | 26.53x  | -96.23%        |
| 256x256   | float32 | 3        | 0.50    | dodge         | avx2   | 0.007305 | 0.000211   | 34.63x  | -97.11%        |
| 256x256   | float32 | 3        | 0.50    | addition      | scalar | 0.007376 | 0.001557   | 4.74x   | -78.89%        |
| 256x256   | float32 | 3        | 0.50    | addition      | sse42  | 0.007376 | 0.000233   | 31.68x  | -96.84%        |
| 256x256   | float32 | 3        | 0.50    | addition      | avx2   | 0.007376 | 0.000184   | 40.09x  | -97.51%        |
| 256x256   | float32 | 3        | 0.50    | darken_only   | scalar | 0.007243 | 0.000734   | 9.86x   | -89.86%        |
| 256x256   | float32 | 3        | 0.50    | darken_only   | sse42  | 0.007243 | 0.000215   | 33.69x  | -97.03%        |
| 256x256   | float32 | 3        | 0.50    | darken_only   | avx2   | 0.007243 | 0.000177   | 40.90x  | -97.56%        |
| 256x256   | float32 | 3        | 0.50    | multiply      | scalar | 0.007388 | 0.000540   | 13.68x  | -92.69%        |
| 256x256   | float32 | 3        | 0.50    | multiply      | sse42  | 0.007388 | 0.000220   | 33.53x  | -97.02%        |
| 256x256   | float32 | 3        | 0.50    | multiply      | avx2   | 0.007388 | 0.000179   | 41.38x  | -97.58%        |
| 256x256   | float32 | 3        | 0.50    | hard_light    | scalar | 0.009079 | 0.001760   | 5.16x   | -80.62%        |
| 256x256   | float32 | 3        | 0.50    | hard_light    | sse42  | 0.009079 | 0.000279   | 32.53x  | -96.93%        |
| 256x256   | float32 | 3        | 0.50    | hard_light    | avx2   | 0.009079 | 0.000185   | 49.19x  | -97.97%        |
| 256x256   | float32 | 3        | 0.50    | difference    | scalar | 0.009052 | 0.000547   | 16.53x  | -93.95%        |
| 256x256   | float32 | 3        | 0.50    | difference    | sse42  | 0.009052 | 0.000227   | 39.82x  | -97.49%        |
| 256x256   | float32 | 3        | 0.50    | difference    | avx2   | 0.009052 | 0.000183   | 49.41x  | -97.98%        |
| 256x256   | float32 | 3        | 0.50    | subtract      | scalar | 0.007449 | 0.000692   | 10.77x  | -90.71%        |
| 256x256   | float32 | 3        | 0.50    | subtract      | sse42  | 0.007449 | 0.000241   | 30.94x  | -96.77%        |
| 256x256   | float32 | 3        | 0.50    | subtract      | avx2   | 0.007449 | 0.000185   | 40.31x  | -97.52%        |
| 256x256   | float32 | 3        | 0.50    | grain_extract | scalar | 0.007552 | 0.001002   | 7.54x   | -86.73%        |
| 256x256   | float32 | 3        | 0.50    | grain_extract | sse42  | 0.007552 | 0.000232   | 32.52x  | -96.92%        |
| 256x256   | float32 | 3        | 0.50    | grain_extract | avx2   | 0.007552 | 0.000179   | 42.08x  | -97.62%        |
| 256x256   | float32 | 3        | 0.50    | grain_merge   | scalar | 0.007136 | 0.001000   | 7.14x   | -85.99%        |
| 256x256   | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.007136 | 0.000266   | 26.79x  | -96.27%        |
| 256x256   | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.007136 | 0.000182   | 39.16x  | -97.45%        |
| 256x256   | float32 | 3        | 0.50    | divide        | scalar | 0.007395 | 0.000607   | 12.18x  | -91.79%        |
| 256x256   | float32 | 3        | 0.50    | divide        | sse42  | 0.007395 | 0.000263   | 28.17x  | -96.45%        |
| 256x256   | float32 | 3        | 0.50    | divide        | avx2   | 0.007395 | 0.000181   | 40.89x  | -97.55%        |
| 256x256   | float32 | 3        | 0.50    | overlay       | scalar | 0.008732 | 0.001631   | 5.35x   | -81.32%        |
| 256x256   | float32 | 3        | 0.50    | overlay       | sse42  | 0.008732 | 0.000254   | 34.38x  | -97.09%        |
| 256x256   | float32 | 3        | 0.50    | overlay       | avx2   | 0.008732 | 0.000183   | 47.81x  | -97.91%        |
| 256x256   | float32 | 4        | 0.50    | normal        | scalar | 0.004371 | 0.000616   | 7.10x   | -85.91%        |
| 256x256   | float32 | 4        | 0.50    | normal        | sse42  | 0.004371 | 0.000143   | 30.52x  | -96.72%        |
| 256x256   | float32 | 4        | 0.50    | normal        | avx2   | 0.004371 | 0.000153   | 28.49x  | -96.49%        |
| 256x256   | float32 | 4        | 0.50    | soft_light    | scalar | 0.006858 | 0.000720   | 9.52x   | -89.50%        |
| 256x256   | float32 | 4        | 0.50    | soft_light    | sse42  | 0.006858 | 0.000182   | 37.74x  | -97.35%        |
| 256x256   | float32 | 4        | 0.50    | soft_light    | avx2   | 0.006858 | 0.000182   | 37.69x  | -97.35%        |
| 256x256   | float32 | 4        | 0.50    | lighten_only  | scalar | 0.005419 | 0.000769   | 7.05x   | -85.82%        |
| 256x256   | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.005419 | 0.000165   | 32.90x  | -96.96%        |
| 256x256   | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.005419 | 0.000181   | 29.90x  | -96.66%        |
| 256x256   | float32 | 4        | 0.50    | screen        | scalar | 0.005604 | 0.000672   | 8.34x   | -88.01%        |
| 256x256   | float32 | 4        | 0.50    | screen        | sse42  | 0.005604 | 0.000174   | 32.25x  | -96.90%        |
| 256x256   | float32 | 4        | 0.50    | screen        | avx2   | 0.005604 | 0.000179   | 31.30x  | -96.80%        |
| 256x256   | float32 | 4        | 0.50    | dodge         | scalar | 0.005572 | 0.000751   | 7.42x   | -86.52%        |
| 256x256   | float32 | 4        | 0.50    | dodge         | sse42  | 0.005572 | 0.000216   | 25.76x  | -96.12%        |
| 256x256   | float32 | 4        | 0.50    | dodge         | avx2   | 0.005572 | 0.000189   | 29.55x  | -96.62%        |
| 256x256   | float32 | 4        | 0.50    | addition      | scalar | 0.005738 | 0.001335   | 4.30x   | -76.73%        |
| 256x256   | float32 | 4        | 0.50    | addition      | sse42  | 0.005738 | 0.000186   | 30.80x  | -96.75%        |
| 256x256   | float32 | 4        | 0.50    | addition      | avx2   | 0.005738 | 0.000185   | 31.02x  | -96.78%        |
| 256x256   | float32 | 4        | 0.50    | darken_only   | scalar | 0.005302 | 0.000759   | 6.99x   | -85.69%        |
| 256x256   | float32 | 4        | 0.50    | darken_only   | sse42  | 0.005302 | 0.000159   | 33.44x  | -97.01%        |
| 256x256   | float32 | 4        | 0.50    | darken_only   | avx2   | 0.005302 | 0.000176   | 30.16x  | -96.68%        |
| 256x256   | float32 | 4        | 0.50    | multiply      | scalar | 0.005762 | 0.000639   | 9.02x   | -88.91%        |
| 256x256   | float32 | 4        | 0.50    | multiply      | sse42  | 0.005762 | 0.000167   | 34.59x  | -97.11%        |
| 256x256   | float32 | 4        | 0.50    | multiply      | avx2   | 0.005762 | 0.000179   | 32.26x  | -96.90%        |
| 256x256   | float32 | 4        | 0.50    | hard_light    | scalar | 0.007241 | 0.001865   | 3.88x   | -74.24%        |
| 256x256   | float32 | 4        | 0.50    | hard_light    | sse42  | 0.007241 | 0.000222   | 32.58x  | -96.93%        |
| 256x256   | float32 | 4        | 0.50    | hard_light    | avx2   | 0.007241 | 0.000183   | 39.61x  | -97.48%        |
| 256x256   | float32 | 4        | 0.50    | difference    | scalar | 0.007348 | 0.000647   | 11.36x  | -91.20%        |
| 256x256   | float32 | 4        | 0.50    | difference    | sse42  | 0.007348 | 0.000173   | 42.36x  | -97.64%        |
| 256x256   | float32 | 4        | 0.50    | difference    | avx2   | 0.007348 | 0.000183   | 40.04x  | -97.50%        |
| 256x256   | float32 | 4        | 0.50    | subtract      | scalar | 0.005691 | 0.000862   | 6.60x   | -84.85%        |
| 256x256   | float32 | 4        | 0.50    | subtract      | sse42  | 0.005691 | 0.000191   | 29.83x  | -96.65%        |
| 256x256   | float32 | 4        | 0.50    | subtract      | avx2   | 0.005691 | 0.000184   | 31.01x  | -96.77%        |
| 256x256   | float32 | 4        | 0.50    | grain_extract | scalar | 0.005481 | 0.001052   | 5.21x   | -80.81%        |
| 256x256   | float32 | 4        | 0.50    | grain_extract | sse42  | 0.005481 | 0.000170   | 32.19x  | -96.89%        |
| 256x256   | float32 | 4        | 0.50    | grain_extract | avx2   | 0.005481 | 0.000182   | 30.09x  | -96.68%        |
| 256x256   | float32 | 4        | 0.50    | grain_merge   | scalar | 0.005445 | 0.001059   | 5.14x   | -80.56%        |
| 256x256   | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.005445 | 0.000178   | 30.52x  | -96.72%        |
| 256x256   | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.005445 | 0.000181   | 30.04x  | -96.67%        |
| 256x256   | float32 | 4        | 0.50    | divide        | scalar | 0.005570 | 0.000705   | 7.90x   | -87.34%        |
| 256x256   | float32 | 4        | 0.50    | divide        | sse42  | 0.005570 | 0.000179   | 31.09x  | -96.78%        |
| 256x256   | float32 | 4        | 0.50    | divide        | avx2   | 0.005570 | 0.000180   | 30.95x  | -96.77%        |
| 256x256   | float32 | 4        | 0.50    | overlay       | scalar | 0.007038 | 0.001723   | 4.08x   | -75.52%        |
| 256x256   | float32 | 4        | 0.50    | overlay       | sse42  | 0.007038 | 0.000183   | 38.42x  | -97.40%        |
| 256x256   | float32 | 4        | 0.50    | overlay       | avx2   | 0.007038 | 0.000177   | 39.77x  | -97.49%        |
| 512x512   | uint8   | 3        | 0.50    | normal        | scalar | 0.032664 | 0.006359   | 5.14x   | -80.53%        |
| 512x512   | uint8   | 3        | 0.50    | normal        | sse42  | 0.032664 | 0.002753   | 11.86x  | -91.57%        |
| 512x512   | uint8   | 3        | 0.50    | normal        | avx2   | 0.032664 | 0.002755   | 11.86x  | -91.56%        |
| 512x512   | uint8   | 3        | 0.00    | normal        | scalar | 0.032962 | 0.002511   | 13.12x  | -92.38%        |
| 512x512   | uint8   | 3        | 0.00    | normal        | sse42  | 0.032962 | 0.002717   | 12.13x  | -91.76%        |
| 512x512   | uint8   | 3        | 0.00    | normal        | avx2   | 0.032962 | 0.002575   | 12.80x  | -92.19%        |
| 512x512   | uint8   | 3        | 1.00    | normal        | scalar | 0.032701 | 0.002452   | 13.34x  | -92.50%        |
| 512x512   | uint8   | 3        | 1.00    | normal        | sse42  | 0.032701 | 0.002444   | 13.38x  | -92.53%        |
| 512x512   | uint8   | 3        | 1.00    | normal        | avx2   | 0.032701 | 0.002456   | 13.32x  | -92.49%        |
| 512x512   | uint8   | 3        | 0.50    | soft_light    | scalar | 0.043186 | 0.007168   | 6.03x   | -83.40%        |
| 512x512   | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.043186 | 0.003435   | 12.57x  | -92.05%        |
| 512x512   | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.043186 | 0.003369   | 12.82x  | -92.20%        |
| 512x512   | uint8   | 3        | 0.00    | soft_light    | scalar | 0.042857 | 0.002442   | 17.55x  | -94.30%        |
| 512x512   | uint8   | 3        | 0.00    | soft_light    | sse42  | 0.042857 | 0.002439   | 17.57x  | -94.31%        |
| 512x512   | uint8   | 3        | 0.00    | soft_light    | avx2   | 0.042857 | 0.002448   | 17.51x  | -94.29%        |
| 512x512   | uint8   | 3        | 1.00    | soft_light    | scalar | 0.042902 | 0.007092   | 6.05x   | -83.47%        |
| 512x512   | uint8   | 3        | 1.00    | soft_light    | sse42  | 0.042902 | 0.003376   | 12.71x  | -92.13%        |
| 512x512   | uint8   | 3        | 1.00    | soft_light    | avx2   | 0.042902 | 0.003155   | 13.60x  | -92.65%        |
| 512x512   | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.036846 | 0.007739   | 4.76x   | -79.00%        |
| 512x512   | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.036846 | 0.003063   | 12.03x  | -91.69%        |
| 512x512   | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.036846 | 0.003028   | 12.17x  | -91.78%        |
| 512x512   | uint8   | 3        | 0.00    | lighten_only  | scalar | 0.036623 | 0.002479   | 14.77x  | -93.23%        |
| 512x512   | uint8   | 3        | 0.00    | lighten_only  | sse42  | 0.036623 | 0.002485   | 14.74x  | -93.21%        |
| 512x512   | uint8   | 3        | 0.00    | lighten_only  | avx2   | 0.036623 | 0.002544   | 14.40x  | -93.05%        |
| 512x512   | uint8   | 3        | 1.00    | lighten_only  | scalar | 0.036539 | 0.007693   | 4.75x   | -78.95%        |
| 512x512   | uint8   | 3        | 1.00    | lighten_only  | sse42  | 0.036539 | 0.003188   | 11.46x  | -91.27%        |
| 512x512   | uint8   | 3        | 1.00    | lighten_only  | avx2   | 0.036539 | 0.002953   | 12.38x  | -91.92%        |
| 512x512   | uint8   | 3        | 0.50    | screen        | scalar | 0.037287 | 0.007076   | 5.27x   | -81.02%        |
| 512x512   | uint8   | 3        | 0.50    | screen        | sse42  | 0.037287 | 0.003285   | 11.35x  | -91.19%        |
| 512x512   | uint8   | 3        | 0.50    | screen        | avx2   | 0.037287 | 0.003154   | 11.82x  | -91.54%        |
| 512x512   | uint8   | 3        | 0.00    | screen        | scalar | 0.038598 | 0.002465   | 15.66x  | -93.61%        |
| 512x512   | uint8   | 3        | 0.00    | screen        | sse42  | 0.038598 | 0.002453   | 15.73x  | -93.64%        |
| 512x512   | uint8   | 3        | 0.00    | screen        | avx2   | 0.038598 | 0.002452   | 15.74x  | -93.65%        |
| 512x512   | uint8   | 3        | 1.00    | screen        | scalar | 0.038291 | 0.006831   | 5.61x   | -82.16%        |
| 512x512   | uint8   | 3        | 1.00    | screen        | sse42  | 0.038291 | 0.003159   | 12.12x  | -91.75%        |
| 512x512   | uint8   | 3        | 1.00    | screen        | avx2   | 0.038291 | 0.003032   | 12.63x  | -92.08%        |
| 512x512   | uint8   | 3        | 0.50    | dodge         | scalar | 0.037752 | 0.007065   | 5.34x   | -81.29%        |
| 512x512   | uint8   | 3        | 0.50    | dodge         | sse42  | 0.037752 | 0.003489   | 10.82x  | -90.76%        |
| 512x512   | uint8   | 3        | 0.50    | dodge         | avx2   | 0.037752 | 0.003187   | 11.85x  | -91.56%        |
| 512x512   | uint8   | 3        | 0.00    | dodge         | scalar | 0.037179 | 0.002459   | 15.12x  | -93.39%        |
| 512x512   | uint8   | 3        | 0.00    | dodge         | sse42  | 0.037179 | 0.002452   | 15.16x  | -93.40%        |
| 512x512   | uint8   | 3        | 0.00    | dodge         | avx2   | 0.037179 | 0.002465   | 15.08x  | -93.37%        |
| 512x512   | uint8   | 3        | 1.00    | dodge         | scalar | 0.037046 | 0.007060   | 5.25x   | -80.94%        |
| 512x512   | uint8   | 3        | 1.00    | dodge         | sse42  | 0.037046 | 0.003464   | 10.69x  | -90.65%        |
| 512x512   | uint8   | 3        | 1.00    | dodge         | avx2   | 0.037046 | 0.003211   | 11.54x  | -91.33%        |
| 512x512   | uint8   | 3        | 0.50    | addition      | scalar | 0.037243 | 0.010142   | 3.67x   | -72.77%        |
| 512x512   | uint8   | 3        | 0.50    | addition      | sse42  | 0.037243 | 0.003262   | 11.42x  | -91.24%        |
| 512x512   | uint8   | 3        | 0.50    | addition      | avx2   | 0.037243 | 0.003050   | 12.21x  | -91.81%        |
| 512x512   | uint8   | 3        | 0.00    | addition      | scalar | 0.036870 | 0.002529   | 14.58x  | -93.14%        |
| 512x512   | uint8   | 3        | 0.00    | addition      | sse42  | 0.036870 | 0.002482   | 14.86x  | -93.27%        |
| 512x512   | uint8   | 3        | 0.00    | addition      | avx2   | 0.036870 | 0.002518   | 14.64x  | -93.17%        |
| 512x512   | uint8   | 3        | 1.00    | addition      | scalar | 0.036626 | 0.013275   | 2.76x   | -63.76%        |
| 512x512   | uint8   | 3        | 1.00    | addition      | sse42  | 0.036626 | 0.003178   | 11.52x  | -91.32%        |
| 512x512   | uint8   | 3        | 1.00    | addition      | avx2   | 0.036626 | 0.003021   | 12.12x  | -91.75%        |
| 512x512   | uint8   | 3        | 0.50    | darken_only   | scalar | 0.036627 | 0.007679   | 4.77x   | -79.04%        |
| 512x512   | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.036627 | 0.003123   | 11.73x  | -91.47%        |
| 512x512   | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.036627 | 0.003037   | 12.06x  | -91.71%        |
| 512x512   | uint8   | 3        | 0.00    | darken_only   | scalar | 0.037025 | 0.002500   | 14.81x  | -93.25%        |
| 512x512   | uint8   | 3        | 0.00    | darken_only   | sse42  | 0.037025 | 0.002501   | 14.81x  | -93.25%        |
| 512x512   | uint8   | 3        | 0.00    | darken_only   | avx2   | 0.037025 | 0.002535   | 14.60x  | -93.15%        |
| 512x512   | uint8   | 3        | 1.00    | darken_only   | scalar | 0.037378 | 0.007924   | 4.72x   | -78.80%        |
| 512x512   | uint8   | 3        | 1.00    | darken_only   | sse42  | 0.037378 | 0.003105   | 12.04x  | -91.69%        |
| 512x512   | uint8   | 3        | 1.00    | darken_only   | avx2   | 0.037378 | 0.003077   | 12.15x  | -91.77%        |
| 512x512   | uint8   | 3        | 0.50    | multiply      | scalar | 0.037533 | 0.006937   | 5.41x   | -81.52%        |
| 512x512   | uint8   | 3        | 0.50    | multiply      | sse42  | 0.037533 | 0.003240   | 11.58x  | -91.37%        |
| 512x512   | uint8   | 3        | 0.50    | multiply      | avx2   | 0.037533 | 0.003026   | 12.40x  | -91.94%        |
| 512x512   | uint8   | 3        | 0.00    | multiply      | scalar | 0.036780 | 0.002619   | 14.04x  | -92.88%        |
| 512x512   | uint8   | 3        | 0.00    | multiply      | sse42  | 0.036780 | 0.002651   | 13.87x  | -92.79%        |
| 512x512   | uint8   | 3        | 0.00    | multiply      | avx2   | 0.036780 | 0.002491   | 14.76x  | -93.23%        |
| 512x512   | uint8   | 3        | 1.00    | multiply      | scalar | 0.036960 | 0.007474   | 4.95x   | -79.78%        |
| 512x512   | uint8   | 3        | 1.00    | multiply      | sse42  | 0.036960 | 0.003158   | 11.70x  | -91.46%        |
| 512x512   | uint8   | 3        | 1.00    | multiply      | avx2   | 0.036960 | 0.003044   | 12.14x  | -91.76%        |
| 512x512   | uint8   | 3        | 0.50    | hard_light    | scalar | 0.046398 | 0.011862   | 3.91x   | -74.44%        |
| 512x512   | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.046398 | 0.003543   | 13.10x  | -92.36%        |
| 512x512   | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.046398 | 0.003261   | 14.23x  | -92.97%        |
| 512x512   | uint8   | 3        | 0.00    | hard_light    | scalar | 0.046417 | 0.002519   | 18.43x  | -94.57%        |
| 512x512   | uint8   | 3        | 0.00    | hard_light    | sse42  | 0.046417 | 0.002527   | 18.37x  | -94.55%        |
| 512x512   | uint8   | 3        | 0.00    | hard_light    | avx2   | 0.046417 | 0.002500   | 18.57x  | -94.62%        |
| 512x512   | uint8   | 3        | 1.00    | hard_light    | scalar | 0.045839 | 0.011953   | 3.84x   | -73.92%        |
| 512x512   | uint8   | 3        | 1.00    | hard_light    | sse42  | 0.045839 | 0.003688   | 12.43x  | -91.96%        |
| 512x512   | uint8   | 3        | 1.00    | hard_light    | avx2   | 0.045839 | 0.003292   | 13.92x  | -92.82%        |
| 512x512   | uint8   | 3        | 0.50    | difference    | scalar | 0.044237 | 0.006964   | 6.35x   | -84.26%        |
| 512x512   | uint8   | 3        | 0.50    | difference    | sse42  | 0.044237 | 0.003116   | 14.20x  | -92.96%        |
| 512x512   | uint8   | 3        | 0.50    | difference    | avx2   | 0.044237 | 0.003000   | 14.74x  | -93.22%        |
| 512x512   | uint8   | 3        | 0.00    | difference    | scalar | 0.044217 | 0.002493   | 17.74x  | -94.36%        |
| 512x512   | uint8   | 3        | 0.00    | difference    | sse42  | 0.044217 | 0.002477   | 17.85x  | -94.40%        |
| 512x512   | uint8   | 3        | 0.00    | difference    | avx2   | 0.044217 | 0.002508   | 17.63x  | -94.33%        |
| 512x512   | uint8   | 3        | 1.00    | difference    | scalar | 0.044061 | 0.007299   | 6.04x   | -83.44%        |
| 512x512   | uint8   | 3        | 1.00    | difference    | sse42  | 0.044061 | 0.003133   | 14.06x  | -92.89%        |
| 512x512   | uint8   | 3        | 1.00    | difference    | avx2   | 0.044061 | 0.003031   | 14.54x  | -93.12%        |
| 512x512   | uint8   | 3        | 0.50    | subtract      | scalar | 0.036994 | 0.006517   | 5.68x   | -82.38%        |
| 512x512   | uint8   | 3        | 0.50    | subtract      | sse42  | 0.036994 | 0.003437   | 10.76x  | -90.71%        |
| 512x512   | uint8   | 3        | 0.50    | subtract      | avx2   | 0.036994 | 0.003179   | 11.64x  | -91.41%        |
| 512x512   | uint8   | 3        | 0.00    | subtract      | scalar | 0.036976 | 0.002518   | 14.68x  | -93.19%        |
| 512x512   | uint8   | 3        | 0.00    | subtract      | sse42  | 0.036976 | 0.002503   | 14.78x  | -93.23%        |
| 512x512   | uint8   | 3        | 0.00    | subtract      | avx2   | 0.036976 | 0.002529   | 14.62x  | -93.16%        |
| 512x512   | uint8   | 3        | 1.00    | subtract      | scalar | 0.037033 | 0.006383   | 5.80x   | -82.76%        |
| 512x512   | uint8   | 3        | 1.00    | subtract      | sse42  | 0.037033 | 0.003389   | 10.93x  | -90.85%        |
| 512x512   | uint8   | 3        | 1.00    | subtract      | avx2   | 0.037033 | 0.003159   | 11.72x  | -91.47%        |
| 512x512   | uint8   | 3        | 0.50    | grain_extract | scalar | 0.037885 | 0.008370   | 4.53x   | -77.91%        |
| 512x512   | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.037885 | 0.003371   | 11.24x  | -91.10%        |
| 512x512   | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.037885 | 0.003178   | 11.92x  | -91.61%        |
| 512x512   | uint8   | 3        | 0.00    | grain_extract | scalar | 0.037451 | 0.002503   | 14.96x  | -93.32%        |
| 512x512   | uint8   | 3        | 0.00    | grain_extract | sse42  | 0.037451 | 0.002484   | 15.08x  | -93.37%        |
| 512x512   | uint8   | 3        | 0.00    | grain_extract | avx2   | 0.037451 | 0.002504   | 14.95x  | -93.31%        |
| 512x512   | uint8   | 3        | 1.00    | grain_extract | scalar | 0.037683 | 0.008524   | 4.42x   | -77.38%        |
| 512x512   | uint8   | 3        | 1.00    | grain_extract | sse42  | 0.037683 | 0.003434   | 10.97x  | -90.89%        |
| 512x512   | uint8   | 3        | 1.00    | grain_extract | avx2   | 0.037683 | 0.003117   | 12.09x  | -91.73%        |
| 512x512   | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.037053 | 0.008445   | 4.39x   | -77.21%        |
| 512x512   | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.037053 | 0.003344   | 11.08x  | -90.98%        |
| 512x512   | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.037053 | 0.003106   | 11.93x  | -91.62%        |
| 512x512   | uint8   | 3        | 0.00    | grain_merge   | scalar | 0.036786 | 0.002546   | 14.45x  | -93.08%        |
| 512x512   | uint8   | 3        | 0.00    | grain_merge   | sse42  | 0.036786 | 0.002445   | 15.05x  | -93.35%        |
| 512x512   | uint8   | 3        | 0.00    | grain_merge   | avx2   | 0.036786 | 0.002449   | 15.02x  | -93.34%        |
| 512x512   | uint8   | 3        | 1.00    | grain_merge   | scalar | 0.037150 | 0.008365   | 4.44x   | -77.48%        |
| 512x512   | uint8   | 3        | 1.00    | grain_merge   | sse42  | 0.037150 | 0.003336   | 11.14x  | -91.02%        |
| 512x512   | uint8   | 3        | 1.00    | grain_merge   | avx2   | 0.037150 | 0.003114   | 11.93x  | -91.62%        |
| 512x512   | uint8   | 3        | 0.50    | divide        | scalar | 0.037859 | 0.006928   | 5.46x   | -81.70%        |
| 512x512   | uint8   | 3        | 0.50    | divide        | sse42  | 0.037859 | 0.003395   | 11.15x  | -91.03%        |
| 512x512   | uint8   | 3        | 0.50    | divide        | avx2   | 0.037859 | 0.003177   | 11.92x  | -91.61%        |
| 512x512   | uint8   | 3        | 0.00    | divide        | scalar | 0.037761 | 0.002573   | 14.68x  | -93.19%        |
| 512x512   | uint8   | 3        | 0.00    | divide        | sse42  | 0.037761 | 0.002566   | 14.71x  | -93.20%        |
| 512x512   | uint8   | 3        | 0.00    | divide        | avx2   | 0.037761 | 0.002564   | 14.73x  | -93.21%        |
| 512x512   | uint8   | 3        | 1.00    | divide        | scalar | 0.037679 | 0.006972   | 5.40x   | -81.50%        |
| 512x512   | uint8   | 3        | 1.00    | divide        | sse42  | 0.037679 | 0.003401   | 11.08x  | -90.97%        |
| 512x512   | uint8   | 3        | 1.00    | divide        | avx2   | 0.037679 | 0.003197   | 11.79x  | -91.51%        |
| 512x512   | uint8   | 3        | 0.50    | overlay       | scalar | 0.043801 | 0.011473   | 3.82x   | -73.81%        |
| 512x512   | uint8   | 3        | 0.50    | overlay       | sse42  | 0.043801 | 0.003422   | 12.80x  | -92.19%        |
| 512x512   | uint8   | 3        | 0.50    | overlay       | avx2   | 0.043801 | 0.003172   | 13.81x  | -92.76%        |
| 512x512   | uint8   | 3        | 0.00    | overlay       | scalar | 0.043639 | 0.002450   | 17.81x  | -94.39%        |
| 512x512   | uint8   | 3        | 0.00    | overlay       | sse42  | 0.043639 | 0.002456   | 17.77x  | -94.37%        |
| 512x512   | uint8   | 3        | 0.00    | overlay       | avx2   | 0.043639 | 0.002460   | 17.74x  | -94.36%        |
| 512x512   | uint8   | 3        | 1.00    | overlay       | scalar | 0.043776 | 0.011610   | 3.77x   | -73.48%        |
| 512x512   | uint8   | 3        | 1.00    | overlay       | sse42  | 0.043776 | 0.003438   | 12.73x  | -92.15%        |
| 512x512   | uint8   | 3        | 1.00    | overlay       | avx2   | 0.043776 | 0.003149   | 13.90x  | -92.81%        |
| 512x512   | uint8   | 4        | 0.50    | normal        | scalar | 0.023805 | 0.005147   | 4.62x   | -78.38%        |
| 512x512   | uint8   | 4        | 0.50    | normal        | sse42  | 0.023805 | 0.000693   | 34.37x  | -97.09%        |
| 512x512   | uint8   | 4        | 0.50    | normal        | avx2   | 0.023805 | 0.000625   | 38.07x  | -97.37%        |
| 512x512   | uint8   | 4        | 0.00    | normal        | scalar | 0.024992 | 0.000048   | 515.98x | -99.81%        |
| 512x512   | uint8   | 4        | 0.00    | normal        | sse42  | 0.024992 | 0.000058   | 427.71x | -99.77%        |
| 512x512   | uint8   | 4        | 0.00    | normal        | avx2   | 0.024992 | 0.000048   | 515.82x | -99.81%        |
| 512x512   | uint8   | 4        | 1.00    | normal        | scalar | 0.024000 | 0.005155   | 4.66x   | -78.52%        |
| 512x512   | uint8   | 4        | 1.00    | normal        | sse42  | 0.024000 | 0.000693   | 34.63x  | -97.11%        |
| 512x512   | uint8   | 4        | 1.00    | normal        | avx2   | 0.024000 | 0.000626   | 38.36x  | -97.39%        |
| 512x512   | uint8   | 4        | 0.50    | soft_light    | scalar | 0.034637 | 0.006560   | 5.28x   | -81.06%        |
| 512x512   | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.034637 | 0.000877   | 39.51x  | -97.47%        |
| 512x512   | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.034637 | 0.000780   | 44.42x  | -97.75%        |
| 512x512   | uint8   | 4        | 0.00    | soft_light    | scalar | 0.034354 | 0.000049   | 701.69x | -99.86%        |
| 512x512   | uint8   | 4        | 0.00    | soft_light    | sse42  | 0.034354 | 0.000049   | 697.51x | -99.86%        |
| 512x512   | uint8   | 4        | 0.00    | soft_light    | avx2   | 0.034354 | 0.000050   | 688.44x | -99.85%        |
| 512x512   | uint8   | 4        | 1.00    | soft_light    | scalar | 0.034581 | 0.006454   | 5.36x   | -81.34%        |
| 512x512   | uint8   | 4        | 1.00    | soft_light    | sse42  | 0.034581 | 0.000876   | 39.47x  | -97.47%        |
| 512x512   | uint8   | 4        | 1.00    | soft_light    | avx2   | 0.034581 | 0.000780   | 44.36x  | -97.75%        |
| 512x512   | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.027773 | 0.006963   | 3.99x   | -74.93%        |
| 512x512   | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.027773 | 0.000757   | 36.71x  | -97.28%        |
| 512x512   | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.027773 | 0.000754   | 36.85x  | -97.29%        |
| 512x512   | uint8   | 4        | 0.00    | lighten_only  | scalar | 0.027653 | 0.000046   | 603.84x | -99.83%        |
| 512x512   | uint8   | 4        | 0.00    | lighten_only  | sse42  | 0.027653 | 0.000045   | 612.45x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | lighten_only  | avx2   | 0.027653 | 0.000046   | 602.23x | -99.83%        |
| 512x512   | uint8   | 4        | 1.00    | lighten_only  | scalar | 0.027743 | 0.006936   | 4.00x   | -75.00%        |
| 512x512   | uint8   | 4        | 1.00    | lighten_only  | sse42  | 0.027743 | 0.000758   | 36.62x  | -97.27%        |
| 512x512   | uint8   | 4        | 1.00    | lighten_only  | avx2   | 0.027743 | 0.000761   | 36.45x  | -97.26%        |
| 512x512   | uint8   | 4        | 0.50    | screen        | scalar | 0.028744 | 0.006211   | 4.63x   | -78.39%        |
| 512x512   | uint8   | 4        | 0.50    | screen        | sse42  | 0.028744 | 0.000824   | 34.87x  | -97.13%        |
| 512x512   | uint8   | 4        | 0.50    | screen        | avx2   | 0.028744 | 0.000785   | 36.62x  | -97.27%        |
| 512x512   | uint8   | 4        | 0.00    | screen        | scalar | 0.028670 | 0.000047   | 611.63x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | screen        | sse42  | 0.028670 | 0.000046   | 624.48x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | screen        | avx2   | 0.028670 | 0.000046   | 626.63x | -99.84%        |
| 512x512   | uint8   | 4        | 1.00    | screen        | scalar | 0.028771 | 0.006188   | 4.65x   | -78.49%        |
| 512x512   | uint8   | 4        | 1.00    | screen        | sse42  | 0.028771 | 0.000826   | 34.82x  | -97.13%        |
| 512x512   | uint8   | 4        | 1.00    | screen        | avx2   | 0.028771 | 0.000780   | 36.88x  | -97.29%        |
| 512x512   | uint8   | 4        | 0.50    | dodge         | scalar | 0.028966 | 0.006473   | 4.48x   | -77.65%        |
| 512x512   | uint8   | 4        | 0.50    | dodge         | sse42  | 0.028966 | 0.000939   | 30.86x  | -96.76%        |
| 512x512   | uint8   | 4        | 0.50    | dodge         | avx2   | 0.028966 | 0.000767   | 37.75x  | -97.35%        |
| 512x512   | uint8   | 4        | 0.00    | dodge         | scalar | 0.028616 | 0.000046   | 627.82x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | dodge         | sse42  | 0.028616 | 0.000046   | 625.48x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | dodge         | avx2   | 0.028616 | 0.000045   | 629.20x | -99.84%        |
| 512x512   | uint8   | 4        | 1.00    | dodge         | scalar | 0.028442 | 0.006487   | 4.38x   | -77.19%        |
| 512x512   | uint8   | 4        | 1.00    | dodge         | sse42  | 0.028442 | 0.000928   | 30.66x  | -96.74%        |
| 512x512   | uint8   | 4        | 1.00    | dodge         | avx2   | 0.028442 | 0.000775   | 36.70x  | -97.27%        |
| 512x512   | uint8   | 4        | 0.50    | addition      | scalar | 0.028275 | 0.007755   | 3.65x   | -72.57%        |
| 512x512   | uint8   | 4        | 0.50    | addition      | sse42  | 0.028275 | 0.001016   | 27.84x  | -96.41%        |
| 512x512   | uint8   | 4        | 0.50    | addition      | avx2   | 0.028275 | 0.000802   | 35.26x  | -97.16%        |
| 512x512   | uint8   | 4        | 0.00    | addition      | scalar | 0.028062 | 0.000048   | 582.61x | -99.83%        |
| 512x512   | uint8   | 4        | 0.00    | addition      | sse42  | 0.028062 | 0.000047   | 594.20x | -99.83%        |
| 512x512   | uint8   | 4        | 0.00    | addition      | avx2   | 0.028062 | 0.000048   | 579.87x | -99.83%        |
| 512x512   | uint8   | 4        | 1.00    | addition      | scalar | 0.027896 | 0.009102   | 3.06x   | -67.37%        |
| 512x512   | uint8   | 4        | 1.00    | addition      | sse42  | 0.027896 | 0.001013   | 27.53x  | -96.37%        |
| 512x512   | uint8   | 4        | 1.00    | addition      | avx2   | 0.027896 | 0.000796   | 35.04x  | -97.15%        |
| 512x512   | uint8   | 4        | 0.50    | darken_only   | scalar | 0.028096 | 0.006905   | 4.07x   | -75.42%        |
| 512x512   | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.028096 | 0.000757   | 37.11x  | -97.31%        |
| 512x512   | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.028096 | 0.000758   | 37.07x  | -97.30%        |
| 512x512   | uint8   | 4        | 0.00    | darken_only   | scalar | 0.027982 | 0.000052   | 535.53x | -99.81%        |
| 512x512   | uint8   | 4        | 0.00    | darken_only   | sse42  | 0.027982 | 0.000047   | 594.02x | -99.83%        |
| 512x512   | uint8   | 4        | 0.00    | darken_only   | avx2   | 0.027982 | 0.000048   | 585.64x | -99.83%        |
| 512x512   | uint8   | 4        | 1.00    | darken_only   | scalar | 0.027881 | 0.006917   | 4.03x   | -75.19%        |
| 512x512   | uint8   | 4        | 1.00    | darken_only   | sse42  | 0.027881 | 0.000753   | 37.02x  | -97.30%        |
| 512x512   | uint8   | 4        | 1.00    | darken_only   | avx2   | 0.027881 | 0.000756   | 36.87x  | -97.29%        |
| 512x512   | uint8   | 4        | 0.50    | multiply      | scalar | 0.028184 | 0.006278   | 4.49x   | -77.73%        |
| 512x512   | uint8   | 4        | 0.50    | multiply      | sse42  | 0.028184 | 0.000779   | 36.16x  | -97.23%        |
| 512x512   | uint8   | 4        | 0.50    | multiply      | avx2   | 0.028184 | 0.000748   | 37.65x  | -97.34%        |
| 512x512   | uint8   | 4        | 0.00    | multiply      | scalar | 0.028292 | 0.000053   | 533.84x | -99.81%        |
| 512x512   | uint8   | 4        | 0.00    | multiply      | sse42  | 0.028292 | 0.000054   | 525.15x | -99.81%        |
| 512x512   | uint8   | 4        | 0.00    | multiply      | avx2   | 0.028292 | 0.000045   | 631.53x | -99.84%        |
| 512x512   | uint8   | 4        | 1.00    | multiply      | scalar | 0.028126 | 0.006241   | 4.51x   | -77.81%        |
| 512x512   | uint8   | 4        | 1.00    | multiply      | sse42  | 0.028126 | 0.000818   | 34.38x  | -97.09%        |
| 512x512   | uint8   | 4        | 1.00    | multiply      | avx2   | 0.028126 | 0.000749   | 37.55x  | -97.34%        |
| 512x512   | uint8   | 4        | 0.50    | hard_light    | scalar | 0.036658 | 0.010267   | 3.57x   | -71.99%        |
| 512x512   | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.036658 | 0.000957   | 38.32x  | -97.39%        |
| 512x512   | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.036658 | 0.000779   | 47.04x  | -97.87%        |
| 512x512   | uint8   | 4        | 0.00    | hard_light    | scalar | 0.039645 | 0.000049   | 802.07x | -99.88%        |
| 512x512   | uint8   | 4        | 0.00    | hard_light    | sse42  | 0.039645 | 0.000048   | 833.68x | -99.88%        |
| 512x512   | uint8   | 4        | 0.00    | hard_light    | avx2   | 0.039645 | 0.000047   | 840.31x | -99.88%        |
| 512x512   | uint8   | 4        | 1.00    | hard_light    | scalar | 0.036601 | 0.010226   | 3.58x   | -72.06%        |
| 512x512   | uint8   | 4        | 1.00    | hard_light    | sse42  | 0.036601 | 0.000999   | 36.63x  | -97.27%        |
| 512x512   | uint8   | 4        | 1.00    | hard_light    | avx2   | 0.036601 | 0.000785   | 46.64x  | -97.86%        |
| 512x512   | uint8   | 4        | 0.50    | difference    | scalar | 0.035155 | 0.006095   | 5.77x   | -82.66%        |
| 512x512   | uint8   | 4        | 0.50    | difference    | sse42  | 0.035155 | 0.000769   | 45.73x  | -97.81%        |
| 512x512   | uint8   | 4        | 0.50    | difference    | avx2   | 0.035155 | 0.000760   | 46.24x  | -97.84%        |
| 512x512   | uint8   | 4        | 0.00    | difference    | scalar | 0.035323 | 0.000048   | 739.06x | -99.86%        |
| 512x512   | uint8   | 4        | 0.00    | difference    | sse42  | 0.035323 | 0.000048   | 741.70x | -99.87%        |
| 512x512   | uint8   | 4        | 0.00    | difference    | avx2   | 0.035323 | 0.000057   | 623.11x | -99.84%        |
| 512x512   | uint8   | 4        | 1.00    | difference    | scalar | 0.035549 | 0.006143   | 5.79x   | -82.72%        |
| 512x512   | uint8   | 4        | 1.00    | difference    | sse42  | 0.035549 | 0.000771   | 46.13x  | -97.83%        |
| 512x512   | uint8   | 4        | 1.00    | difference    | avx2   | 0.035549 | 0.000757   | 46.93x  | -97.87%        |
| 512x512   | uint8   | 4        | 0.50    | subtract      | scalar | 0.028077 | 0.006016   | 4.67x   | -78.57%        |
| 512x512   | uint8   | 4        | 0.50    | subtract      | sse42  | 0.028077 | 0.001066   | 26.34x  | -96.20%        |
| 512x512   | uint8   | 4        | 0.50    | subtract      | avx2   | 0.028077 | 0.000798   | 35.16x  | -97.16%        |
| 512x512   | uint8   | 4        | 0.00    | subtract      | scalar | 0.027791 | 0.000048   | 578.83x | -99.83%        |
| 512x512   | uint8   | 4        | 0.00    | subtract      | sse42  | 0.027791 | 0.000060   | 461.17x | -99.78%        |
| 512x512   | uint8   | 4        | 0.00    | subtract      | avx2   | 0.027791 | 0.000051   | 548.31x | -99.82%        |
| 512x512   | uint8   | 4        | 1.00    | subtract      | scalar | 0.028086 | 0.005809   | 4.83x   | -79.32%        |
| 512x512   | uint8   | 4        | 1.00    | subtract      | sse42  | 0.028086 | 0.001060   | 26.50x  | -96.23%        |
| 512x512   | uint8   | 4        | 1.00    | subtract      | avx2   | 0.028086 | 0.000790   | 35.54x  | -97.19%        |
| 512x512   | uint8   | 4        | 0.50    | grain_extract | scalar | 0.028407 | 0.007457   | 3.81x   | -73.75%        |
| 512x512   | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.028407 | 0.000842   | 33.72x  | -97.03%        |
| 512x512   | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.028407 | 0.000769   | 36.94x  | -97.29%        |
| 512x512   | uint8   | 4        | 0.00    | grain_extract | scalar | 0.028795 | 0.000047   | 612.72x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | grain_extract | sse42  | 0.028795 | 0.000045   | 636.01x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | grain_extract | avx2   | 0.028795 | 0.000045   | 634.32x | -99.84%        |
| 512x512   | uint8   | 4        | 1.00    | grain_extract | scalar | 0.028413 | 0.007481   | 3.80x   | -73.67%        |
| 512x512   | uint8   | 4        | 1.00    | grain_extract | sse42  | 0.028413 | 0.000844   | 33.68x  | -97.03%        |
| 512x512   | uint8   | 4        | 1.00    | grain_extract | avx2   | 0.028413 | 0.000767   | 37.06x  | -97.30%        |
| 512x512   | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.028391 | 0.007483   | 3.79x   | -73.64%        |
| 512x512   | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.028391 | 0.000845   | 33.60x  | -97.02%        |
| 512x512   | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.028391 | 0.000764   | 37.14x  | -97.31%        |
| 512x512   | uint8   | 4        | 0.00    | grain_merge   | scalar | 0.028487 | 0.000057   | 501.63x | -99.80%        |
| 512x512   | uint8   | 4        | 0.00    | grain_merge   | sse42  | 0.028487 | 0.000048   | 588.05x | -99.83%        |
| 512x512   | uint8   | 4        | 0.00    | grain_merge   | avx2   | 0.028487 | 0.000047   | 602.06x | -99.83%        |
| 512x512   | uint8   | 4        | 1.00    | grain_merge   | scalar | 0.028637 | 0.007507   | 3.81x   | -73.79%        |
| 512x512   | uint8   | 4        | 1.00    | grain_merge   | sse42  | 0.028637 | 0.000834   | 34.34x  | -97.09%        |
| 512x512   | uint8   | 4        | 1.00    | grain_merge   | avx2   | 0.028637 | 0.000766   | 37.39x  | -97.33%        |
| 512x512   | uint8   | 4        | 0.50    | divide        | scalar | 0.029055 | 0.006269   | 4.63x   | -78.42%        |
| 512x512   | uint8   | 4        | 0.50    | divide        | sse42  | 0.029055 | 0.000879   | 33.05x  | -96.97%        |
| 512x512   | uint8   | 4        | 0.50    | divide        | avx2   | 0.029055 | 0.000772   | 37.64x  | -97.34%        |
| 512x512   | uint8   | 4        | 0.00    | divide        | scalar | 0.028954 | 0.000047   | 620.31x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | divide        | sse42  | 0.028954 | 0.000045   | 641.05x | -99.84%        |
| 512x512   | uint8   | 4        | 0.00    | divide        | avx2   | 0.028954 | 0.000045   | 640.05x | -99.84%        |
| 512x512   | uint8   | 4        | 1.00    | divide        | scalar | 0.028891 | 0.006386   | 4.52x   | -77.90%        |
| 512x512   | uint8   | 4        | 1.00    | divide        | sse42  | 0.028891 | 0.000877   | 32.93x  | -96.96%        |
| 512x512   | uint8   | 4        | 1.00    | divide        | avx2   | 0.028891 | 0.000760   | 38.02x  | -97.37%        |
| 512x512   | uint8   | 4        | 0.50    | overlay       | scalar | 0.034941 | 0.009920   | 3.52x   | -71.61%        |
| 512x512   | uint8   | 4        | 0.50    | overlay       | sse42  | 0.034941 | 0.000921   | 37.96x  | -97.37%        |
| 512x512   | uint8   | 4        | 0.50    | overlay       | avx2   | 0.034941 | 0.000773   | 45.21x  | -97.79%        |
| 512x512   | uint8   | 4        | 0.00    | overlay       | scalar | 0.035762 | 0.000048   | 737.65x | -99.86%        |
| 512x512   | uint8   | 4        | 0.00    | overlay       | sse42  | 0.035762 | 0.000064   | 561.37x | -99.82%        |
| 512x512   | uint8   | 4        | 0.00    | overlay       | avx2   | 0.035762 | 0.000048   | 748.36x | -99.87%        |
| 512x512   | uint8   | 4        | 1.00    | overlay       | scalar | 0.035436 | 0.009979   | 3.55x   | -71.84%        |
| 512x512   | uint8   | 4        | 1.00    | overlay       | sse42  | 0.035436 | 0.000909   | 38.96x  | -97.43%        |
| 512x512   | uint8   | 4        | 1.00    | overlay       | avx2   | 0.035436 | 0.000775   | 45.73x  | -97.81%        |
| 512x512   | float32 | 3        | 0.50    | normal        | scalar | 0.028967 | 0.002337   | 12.39x  | -91.93%        |
| 512x512   | float32 | 3        | 0.50    | normal        | sse42  | 0.028967 | 0.000900   | 32.18x  | -96.89%        |
| 512x512   | float32 | 3        | 0.50    | normal        | avx2   | 0.028967 | 0.000590   | 49.10x  | -97.96%        |
| 512x512   | float32 | 3        | 0.00    | normal        | scalar | 0.028648 | 0.000663   | 43.22x  | -97.69%        |
| 512x512   | float32 | 3        | 0.00    | normal        | sse42  | 0.028648 | 0.000369   | 77.59x  | -98.71%        |
| 512x512   | float32 | 3        | 0.00    | normal        | avx2   | 0.028648 | 0.000361   | 79.33x  | -98.74%        |
| 512x512   | float32 | 3        | 1.00    | normal        | scalar | 0.028377 | 0.000635   | 44.70x  | -97.76%        |
| 512x512   | float32 | 3        | 1.00    | normal        | sse42  | 0.028377 | 0.000359   | 79.05x  | -98.74%        |
| 512x512   | float32 | 3        | 1.00    | normal        | avx2   | 0.028377 | 0.000352   | 80.55x  | -98.76%        |
| 512x512   | float32 | 3        | 0.50    | soft_light    | scalar | 0.041450 | 0.002810   | 14.75x  | -93.22%        |
| 512x512   | float32 | 3        | 0.50    | soft_light    | sse42  | 0.041450 | 0.001039   | 39.91x  | -97.49%        |
| 512x512   | float32 | 3        | 0.50    | soft_light    | avx2   | 0.041450 | 0.000736   | 56.29x  | -98.22%        |
| 512x512   | float32 | 3        | 0.00    | soft_light    | scalar | 0.039327 | 0.000651   | 60.38x  | -98.34%        |
| 512x512   | float32 | 3        | 0.00    | soft_light    | sse42  | 0.039327 | 0.000346   | 113.60x | -99.12%        |
| 512x512   | float32 | 3        | 0.00    | soft_light    | avx2   | 0.039327 | 0.000354   | 110.99x | -99.10%        |
| 512x512   | float32 | 3        | 1.00    | soft_light    | scalar | 0.039581 | 0.002796   | 14.16x  | -92.94%        |
| 512x512   | float32 | 3        | 1.00    | soft_light    | sse42  | 0.039581 | 0.001058   | 37.41x  | -97.33%        |
| 512x512   | float32 | 3        | 1.00    | soft_light    | avx2   | 0.039581 | 0.000767   | 51.60x  | -98.06%        |
| 512x512   | float32 | 3        | 0.50    | lighten_only  | scalar | 0.034206 | 0.003333   | 10.26x  | -90.26%        |
| 512x512   | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.034206 | 0.000886   | 38.63x  | -97.41%        |
| 512x512   | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.034206 | 0.000740   | 46.22x  | -97.84%        |
| 512x512   | float32 | 3        | 0.00    | lighten_only  | scalar | 0.032857 | 0.000675   | 48.66x  | -97.94%        |
| 512x512   | float32 | 3        | 0.00    | lighten_only  | sse42  | 0.032857 | 0.000367   | 89.61x  | -98.88%        |
| 512x512   | float32 | 3        | 0.00    | lighten_only  | avx2   | 0.032857 | 0.000371   | 88.54x  | -98.87%        |
| 512x512   | float32 | 3        | 1.00    | lighten_only  | scalar | 0.034250 | 0.003287   | 10.42x  | -90.40%        |
| 512x512   | float32 | 3        | 1.00    | lighten_only  | sse42  | 0.034250 | 0.000873   | 39.22x  | -97.45%        |
| 512x512   | float32 | 3        | 1.00    | lighten_only  | avx2   | 0.034250 | 0.000729   | 47.01x  | -97.87%        |
| 512x512   | float32 | 3        | 0.50    | screen        | scalar | 0.034032 | 0.002649   | 12.85x  | -92.22%        |
| 512x512   | float32 | 3        | 0.50    | screen        | sse42  | 0.034032 | 0.000945   | 36.02x  | -97.22%        |
| 512x512   | float32 | 3        | 0.50    | screen        | avx2   | 0.034032 | 0.000760   | 44.79x  | -97.77%        |
| 512x512   | float32 | 3        | 0.00    | screen        | scalar | 0.033928 | 0.000676   | 50.15x  | -98.01%        |
| 512x512   | float32 | 3        | 0.00    | screen        | sse42  | 0.033928 | 0.000370   | 91.76x  | -98.91%        |
| 512x512   | float32 | 3        | 0.00    | screen        | avx2   | 0.033928 | 0.000368   | 92.32x  | -98.92%        |
| 512x512   | float32 | 3        | 1.00    | screen        | scalar | 0.033989 | 0.002579   | 13.18x  | -92.41%        |
| 512x512   | float32 | 3        | 1.00    | screen        | sse42  | 0.033989 | 0.000966   | 35.18x  | -97.16%        |
| 512x512   | float32 | 3        | 1.00    | screen        | avx2   | 0.033989 | 0.000768   | 44.27x  | -97.74%        |
| 512x512   | float32 | 3        | 0.50    | dodge         | scalar | 0.034160 | 0.003991   | 8.56x   | -88.32%        |
| 512x512   | float32 | 3        | 0.50    | dodge         | sse42  | 0.034160 | 0.001113   | 30.69x  | -96.74%        |
| 512x512   | float32 | 3        | 0.50    | dodge         | avx2   | 0.034160 | 0.000773   | 44.20x  | -97.74%        |
| 512x512   | float32 | 3        | 0.00    | dodge         | scalar | 0.035695 | 0.000658   | 54.23x  | -98.16%        |
| 512x512   | float32 | 3        | 0.00    | dodge         | sse42  | 0.035695 | 0.000367   | 97.25x  | -98.97%        |
| 512x512   | float32 | 3        | 0.00    | dodge         | avx2   | 0.035695 | 0.000374   | 95.53x  | -98.95%        |
| 512x512   | float32 | 3        | 1.00    | dodge         | scalar | 0.033754 | 0.002847   | 11.86x  | -91.57%        |
| 512x512   | float32 | 3        | 1.00    | dodge         | sse42  | 0.033754 | 0.001086   | 31.07x  | -96.78%        |
| 512x512   | float32 | 3        | 1.00    | dodge         | avx2   | 0.033754 | 0.000751   | 44.92x  | -97.77%        |
| 512x512   | float32 | 3        | 0.50    | addition      | scalar | 0.032977 | 0.006577   | 5.01x   | -80.06%        |
| 512x512   | float32 | 3        | 0.50    | addition      | sse42  | 0.032977 | 0.000914   | 36.09x  | -97.23%        |
| 512x512   | float32 | 3        | 0.50    | addition      | avx2   | 0.032977 | 0.000750   | 43.97x  | -97.73%        |
| 512x512   | float32 | 3        | 0.00    | addition      | scalar | 0.032895 | 0.000661   | 49.74x  | -97.99%        |
| 512x512   | float32 | 3        | 0.00    | addition      | sse42  | 0.032895 | 0.000367   | 89.60x  | -98.88%        |
| 512x512   | float32 | 3        | 0.00    | addition      | avx2   | 0.032895 | 0.000368   | 89.41x  | -98.88%        |
| 512x512   | float32 | 3        | 1.00    | addition      | scalar | 0.033105 | 0.009584   | 3.45x   | -71.05%        |
| 512x512   | float32 | 3        | 1.00    | addition      | sse42  | 0.033105 | 0.000915   | 36.19x  | -97.24%        |
| 512x512   | float32 | 3        | 1.00    | addition      | avx2   | 0.033105 | 0.000758   | 43.65x  | -97.71%        |
| 512x512   | float32 | 3        | 0.50    | darken_only   | scalar | 0.033081 | 0.003259   | 10.15x  | -90.15%        |
| 512x512   | float32 | 3        | 0.50    | darken_only   | sse42  | 0.033081 | 0.000869   | 38.06x  | -97.37%        |
| 512x512   | float32 | 3        | 0.50    | darken_only   | avx2   | 0.033081 | 0.000720   | 45.94x  | -97.82%        |
| 512x512   | float32 | 3        | 0.00    | darken_only   | scalar | 0.033263 | 0.000676   | 49.23x  | -97.97%        |
| 512x512   | float32 | 3        | 0.00    | darken_only   | sse42  | 0.033263 | 0.000364   | 91.30x  | -98.90%        |
| 512x512   | float32 | 3        | 0.00    | darken_only   | avx2   | 0.033263 | 0.000363   | 91.62x  | -98.91%        |
| 512x512   | float32 | 3        | 1.00    | darken_only   | scalar | 0.033051 | 0.003278   | 10.08x  | -90.08%        |
| 512x512   | float32 | 3        | 1.00    | darken_only   | sse42  | 0.033051 | 0.000889   | 37.19x  | -97.31%        |
| 512x512   | float32 | 3        | 1.00    | darken_only   | avx2   | 0.033051 | 0.000768   | 43.01x  | -97.68%        |
| 512x512   | float32 | 3        | 0.50    | multiply      | scalar | 0.033713 | 0.002520   | 13.38x  | -92.52%        |
| 512x512   | float32 | 3        | 0.50    | multiply      | sse42  | 0.033713 | 0.000889   | 37.91x  | -97.36%        |
| 512x512   | float32 | 3        | 0.50    | multiply      | avx2   | 0.033713 | 0.000723   | 46.60x  | -97.85%        |
| 512x512   | float32 | 3        | 0.00    | multiply      | scalar | 0.033236 | 0.000652   | 50.97x  | -98.04%        |
| 512x512   | float32 | 3        | 0.00    | multiply      | sse42  | 0.033236 | 0.000350   | 95.03x  | -98.95%        |
| 512x512   | float32 | 3        | 0.00    | multiply      | avx2   | 0.033236 | 0.000361   | 92.01x  | -98.91%        |
| 512x512   | float32 | 3        | 1.00    | multiply      | scalar | 0.033668 | 0.002554   | 13.18x  | -92.41%        |
| 512x512   | float32 | 3        | 1.00    | multiply      | sse42  | 0.033668 | 0.000915   | 36.78x  | -97.28%        |
| 512x512   | float32 | 3        | 1.00    | multiply      | avx2   | 0.033668 | 0.000745   | 45.20x  | -97.79%        |
| 512x512   | float32 | 3        | 0.50    | hard_light    | scalar | 0.042600 | 0.007445   | 5.72x   | -82.52%        |
| 512x512   | float32 | 3        | 0.50    | hard_light    | sse42  | 0.042600 | 0.001188   | 35.85x  | -97.21%        |
| 512x512   | float32 | 3        | 0.50    | hard_light    | avx2   | 0.042600 | 0.000752   | 56.65x  | -98.23%        |
| 512x512   | float32 | 3        | 0.00    | hard_light    | scalar | 0.041654 | 0.000671   | 62.10x  | -98.39%        |
| 512x512   | float32 | 3        | 0.00    | hard_light    | sse42  | 0.041654 | 0.000375   | 111.18x | -99.10%        |
| 512x512   | float32 | 3        | 0.00    | hard_light    | avx2   | 0.041654 | 0.000371   | 112.33x | -99.11%        |
| 512x512   | float32 | 3        | 1.00    | hard_light    | scalar | 0.041561 | 0.007458   | 5.57x   | -82.06%        |
| 512x512   | float32 | 3        | 1.00    | hard_light    | sse42  | 0.041561 | 0.001099   | 37.82x  | -97.36%        |
| 512x512   | float32 | 3        | 1.00    | hard_light    | avx2   | 0.041561 | 0.000758   | 54.84x  | -98.18%        |
| 512x512   | float32 | 3        | 0.50    | difference    | scalar | 0.040430 | 0.002542   | 15.91x  | -93.71%        |
| 512x512   | float32 | 3        | 0.50    | difference    | sse42  | 0.040430 | 0.000916   | 44.14x  | -97.73%        |
| 512x512   | float32 | 3        | 0.50    | difference    | avx2   | 0.040430 | 0.000757   | 53.41x  | -98.13%        |
| 512x512   | float32 | 3        | 0.00    | difference    | scalar | 0.040600 | 0.000668   | 60.78x  | -98.35%        |
| 512x512   | float32 | 3        | 0.00    | difference    | sse42  | 0.040600 | 0.000380   | 106.77x | -99.06%        |
| 512x512   | float32 | 3        | 0.00    | difference    | avx2   | 0.040600 | 0.000383   | 106.07x | -99.06%        |
| 512x512   | float32 | 3        | 1.00    | difference    | scalar | 0.040088 | 0.002555   | 15.69x  | -93.63%        |
| 512x512   | float32 | 3        | 1.00    | difference    | sse42  | 0.040088 | 0.000927   | 43.25x  | -97.69%        |
| 512x512   | float32 | 3        | 1.00    | difference    | avx2   | 0.040088 | 0.000768   | 52.19x  | -98.08%        |
| 512x512   | float32 | 3        | 0.50    | subtract      | scalar | 0.033341 | 0.003122   | 10.68x  | -90.64%        |
| 512x512   | float32 | 3        | 0.50    | subtract      | sse42  | 0.033341 | 0.000951   | 35.06x  | -97.15%        |
| 512x512   | float32 | 3        | 0.50    | subtract      | avx2   | 0.033341 | 0.000752   | 44.31x  | -97.74%        |
| 512x512   | float32 | 3        | 0.00    | subtract      | scalar | 0.032981 | 0.000673   | 49.03x  | -97.96%        |
| 512x512   | float32 | 3        | 0.00    | subtract      | sse42  | 0.032981 | 0.000369   | 89.46x  | -98.88%        |
| 512x512   | float32 | 3        | 0.00    | subtract      | avx2   | 0.032981 | 0.000360   | 91.63x  | -98.91%        |
| 512x512   | float32 | 3        | 1.00    | subtract      | scalar | 0.033186 | 0.003057   | 10.86x  | -90.79%        |
| 512x512   | float32 | 3        | 1.00    | subtract      | sse42  | 0.033186 | 0.000969   | 34.25x  | -97.08%        |
| 512x512   | float32 | 3        | 1.00    | subtract      | avx2   | 0.033186 | 0.000771   | 43.05x  | -97.68%        |
| 512x512   | float32 | 3        | 0.50    | grain_extract | scalar | 0.033812 | 0.004354   | 7.77x   | -87.12%        |
| 512x512   | float32 | 3        | 0.50    | grain_extract | sse42  | 0.033812 | 0.000959   | 35.27x  | -97.16%        |
| 512x512   | float32 | 3        | 0.50    | grain_extract | avx2   | 0.033812 | 0.000747   | 45.27x  | -97.79%        |
| 512x512   | float32 | 3        | 0.00    | grain_extract | scalar | 0.033881 | 0.000651   | 52.08x  | -98.08%        |
| 512x512   | float32 | 3        | 0.00    | grain_extract | sse42  | 0.033881 | 0.000347   | 97.62x  | -98.98%        |
| 512x512   | float32 | 3        | 0.00    | grain_extract | avx2   | 0.033881 | 0.000357   | 94.96x  | -98.95%        |
| 512x512   | float32 | 3        | 1.00    | grain_extract | scalar | 0.033576 | 0.004427   | 7.58x   | -86.82%        |
| 512x512   | float32 | 3        | 1.00    | grain_extract | sse42  | 0.033576 | 0.000960   | 34.97x  | -97.14%        |
| 512x512   | float32 | 3        | 1.00    | grain_extract | avx2   | 0.033576 | 0.000749   | 44.80x  | -97.77%        |
| 512x512   | float32 | 3        | 0.50    | grain_merge   | scalar | 0.033513 | 0.004338   | 7.73x   | -87.06%        |
| 512x512   | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.033513 | 0.000955   | 35.10x  | -97.15%        |
| 512x512   | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.033513 | 0.000757   | 44.26x  | -97.74%        |
| 512x512   | float32 | 3        | 0.00    | grain_merge   | scalar | 0.033553 | 0.000656   | 51.13x  | -98.04%        |
| 512x512   | float32 | 3        | 0.00    | grain_merge   | sse42  | 0.033553 | 0.000357   | 94.09x  | -98.94%        |
| 512x512   | float32 | 3        | 0.00    | grain_merge   | avx2   | 0.033553 | 0.000358   | 93.82x  | -98.93%        |
| 512x512   | float32 | 3        | 1.00    | grain_merge   | scalar | 0.033600 | 0.004409   | 7.62x   | -86.88%        |
| 512x512   | float32 | 3        | 1.00    | grain_merge   | sse42  | 0.033600 | 0.000976   | 34.44x  | -97.10%        |
| 512x512   | float32 | 3        | 1.00    | grain_merge   | avx2   | 0.033600 | 0.000827   | 40.63x  | -97.54%        |
| 512x512   | float32 | 3        | 0.50    | divide        | scalar | 0.036122 | 0.002845   | 12.70x  | -92.12%        |
| 512x512   | float32 | 3        | 0.50    | divide        | sse42  | 0.036122 | 0.001063   | 33.98x  | -97.06%        |
| 512x512   | float32 | 3        | 0.50    | divide        | avx2   | 0.036122 | 0.000770   | 46.92x  | -97.87%        |
| 512x512   | float32 | 3        | 0.00    | divide        | scalar | 0.036017 | 0.000714   | 50.44x  | -98.02%        |
| 512x512   | float32 | 3        | 0.00    | divide        | sse42  | 0.036017 | 0.000436   | 82.52x  | -98.79%        |
| 512x512   | float32 | 3        | 0.00    | divide        | avx2   | 0.036017 | 0.000416   | 86.48x  | -98.84%        |
| 512x512   | float32 | 3        | 1.00    | divide        | scalar | 0.035087 | 0.002792   | 12.57x  | -92.04%        |
| 512x512   | float32 | 3        | 1.00    | divide        | sse42  | 0.035087 | 0.001074   | 32.67x  | -96.94%        |
| 512x512   | float32 | 3        | 1.00    | divide        | avx2   | 0.035087 | 0.000763   | 46.01x  | -97.83%        |
| 512x512   | float32 | 3        | 0.50    | overlay       | scalar | 0.040747 | 0.006828   | 5.97x   | -83.24%        |
| 512x512   | float32 | 3        | 0.50    | overlay       | sse42  | 0.040747 | 0.001013   | 40.24x  | -97.52%        |
| 512x512   | float32 | 3        | 0.50    | overlay       | avx2   | 0.040747 | 0.000737   | 55.32x  | -98.19%        |
| 512x512   | float32 | 3        | 0.00    | overlay       | scalar | 0.040638 | 0.000680   | 59.80x  | -98.33%        |
| 512x512   | float32 | 3        | 0.00    | overlay       | sse42  | 0.040638 | 0.000347   | 117.27x | -99.15%        |
| 512x512   | float32 | 3        | 0.00    | overlay       | avx2   | 0.040638 | 0.000356   | 114.01x | -99.12%        |
| 512x512   | float32 | 3        | 1.00    | overlay       | scalar | 0.039881 | 0.006863   | 5.81x   | -82.79%        |
| 512x512   | float32 | 3        | 1.00    | overlay       | sse42  | 0.039881 | 0.001019   | 39.12x  | -97.44%        |
| 512x512   | float32 | 3        | 1.00    | overlay       | avx2   | 0.039881 | 0.000737   | 54.08x  | -98.15%        |
| 512x512   | float32 | 4        | 0.50    | normal        | scalar | 0.021040 | 0.002725   | 7.72x   | -87.05%        |
| 512x512   | float32 | 4        | 0.50    | normal        | sse42  | 0.021040 | 0.000701   | 30.00x  | -96.67%        |
| 512x512   | float32 | 4        | 0.50    | normal        | avx2   | 0.021040 | 0.000701   | 30.00x  | -96.67%        |
| 512x512   | float32 | 4        | 0.00    | normal        | scalar | 0.020639 | 0.000547   | 37.71x  | -97.35%        |
| 512x512   | float32 | 4        | 0.00    | normal        | sse42  | 0.020639 | 0.000290   | 71.17x  | -98.59%        |
| 512x512   | float32 | 4        | 0.00    | normal        | avx2   | 0.020639 | 0.000338   | 61.03x  | -98.36%        |
| 512x512   | float32 | 4        | 1.00    | normal        | scalar | 0.020831 | 0.002729   | 7.63x   | -86.90%        |
| 512x512   | float32 | 4        | 1.00    | normal        | sse42  | 0.020831 | 0.000705   | 29.57x  | -96.62%        |
| 512x512   | float32 | 4        | 1.00    | normal        | avx2   | 0.020831 | 0.000709   | 29.38x  | -96.60%        |
| 512x512   | float32 | 4        | 0.50    | soft_light    | scalar | 0.032395 | 0.003195   | 10.14x  | -90.14%        |
| 512x512   | float32 | 4        | 0.50    | soft_light    | sse42  | 0.032395 | 0.000815   | 39.77x  | -97.49%        |
| 512x512   | float32 | 4        | 0.50    | soft_light    | avx2   | 0.032395 | 0.000842   | 38.49x  | -97.40%        |
| 512x512   | float32 | 4        | 0.00    | soft_light    | scalar | 0.031700 | 0.000581   | 54.58x  | -98.17%        |
| 512x512   | float32 | 4        | 0.00    | soft_light    | sse42  | 0.031700 | 0.000322   | 98.35x  | -98.98%        |
| 512x512   | float32 | 4        | 0.00    | soft_light    | avx2   | 0.031700 | 0.000288   | 110.16x | -99.09%        |
| 512x512   | float32 | 4        | 1.00    | soft_light    | scalar | 0.031587 | 0.003166   | 9.98x   | -89.98%        |
| 512x512   | float32 | 4        | 1.00    | soft_light    | sse42  | 0.031587 | 0.000752   | 41.99x  | -97.62%        |
| 512x512   | float32 | 4        | 1.00    | soft_light    | avx2   | 0.031587 | 0.000782   | 40.39x  | -97.52%        |
| 512x512   | float32 | 4        | 0.50    | lighten_only  | scalar | 0.024621 | 0.003406   | 7.23x   | -86.16%        |
| 512x512   | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.024621 | 0.000733   | 33.58x  | -97.02%        |
| 512x512   | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.024621 | 0.000792   | 31.10x  | -96.78%        |
| 512x512   | float32 | 4        | 0.00    | lighten_only  | scalar | 0.024517 | 0.000548   | 44.71x  | -97.76%        |
| 512x512   | float32 | 4        | 0.00    | lighten_only  | sse42  | 0.024517 | 0.000290   | 84.61x  | -98.82%        |
| 512x512   | float32 | 4        | 0.00    | lighten_only  | avx2   | 0.024517 | 0.000291   | 84.28x  | -98.81%        |
| 512x512   | float32 | 4        | 1.00    | lighten_only  | scalar | 0.025115 | 0.003398   | 7.39x   | -86.47%        |
| 512x512   | float32 | 4        | 1.00    | lighten_only  | sse42  | 0.025115 | 0.000724   | 34.67x  | -97.12%        |
| 512x512   | float32 | 4        | 1.00    | lighten_only  | avx2   | 0.025115 | 0.000779   | 32.25x  | -96.90%        |
| 512x512   | float32 | 4        | 0.50    | screen        | scalar | 0.025578 | 0.003039   | 8.42x   | -88.12%        |
| 512x512   | float32 | 4        | 0.50    | screen        | sse42  | 0.025578 | 0.000762   | 33.57x  | -97.02%        |
| 512x512   | float32 | 4        | 0.50    | screen        | avx2   | 0.025578 | 0.000790   | 32.40x  | -96.91%        |
| 512x512   | float32 | 4        | 0.00    | screen        | scalar | 0.025610 | 0.000556   | 46.08x  | -97.83%        |
| 512x512   | float32 | 4        | 0.00    | screen        | sse42  | 0.025610 | 0.000292   | 87.75x  | -98.86%        |
| 512x512   | float32 | 4        | 0.00    | screen        | avx2   | 0.025610 | 0.000298   | 85.96x  | -98.84%        |
| 512x512   | float32 | 4        | 1.00    | screen        | scalar | 0.025825 | 0.002980   | 8.67x   | -88.46%        |
| 512x512   | float32 | 4        | 1.00    | screen        | sse42  | 0.025825 | 0.000732   | 35.26x  | -97.16%        |
| 512x512   | float32 | 4        | 1.00    | screen        | avx2   | 0.025825 | 0.000749   | 34.47x  | -97.10%        |
| 512x512   | float32 | 4        | 0.50    | dodge         | scalar | 0.025621 | 0.003324   | 7.71x   | -87.03%        |
| 512x512   | float32 | 4        | 0.50    | dodge         | sse42  | 0.025621 | 0.000891   | 28.76x  | -96.52%        |
| 512x512   | float32 | 4        | 0.50    | dodge         | avx2   | 0.025621 | 0.000802   | 31.93x  | -96.87%        |
| 512x512   | float32 | 4        | 0.00    | dodge         | scalar | 0.025462 | 0.000555   | 45.84x  | -97.82%        |
| 512x512   | float32 | 4        | 0.00    | dodge         | sse42  | 0.025462 | 0.000282   | 90.29x  | -98.89%        |
| 512x512   | float32 | 4        | 0.00    | dodge         | avx2   | 0.025462 | 0.000288   | 88.29x  | -98.87%        |
| 512x512   | float32 | 4        | 1.00    | dodge         | scalar | 0.025647 | 0.003301   | 7.77x   | -87.13%        |
| 512x512   | float32 | 4        | 1.00    | dodge         | sse42  | 0.025647 | 0.000867   | 29.59x  | -96.62%        |
| 512x512   | float32 | 4        | 1.00    | dodge         | avx2   | 0.025647 | 0.000768   | 33.40x  | -97.01%        |
| 512x512   | float32 | 4        | 0.50    | addition      | scalar | 0.024919 | 0.005641   | 4.42x   | -77.36%        |
| 512x512   | float32 | 4        | 0.50    | addition      | sse42  | 0.024919 | 0.000795   | 31.34x  | -96.81%        |
| 512x512   | float32 | 4        | 0.50    | addition      | avx2   | 0.024919 | 0.000822   | 30.31x  | -96.70%        |
| 512x512   | float32 | 4        | 0.00    | addition      | scalar | 0.024599 | 0.000611   | 40.25x  | -97.52%        |
| 512x512   | float32 | 4        | 0.00    | addition      | sse42  | 0.024599 | 0.000282   | 87.22x  | -98.85%        |
| 512x512   | float32 | 4        | 0.00    | addition      | avx2   | 0.024599 | 0.000435   | 56.51x  | -98.23%        |
| 512x512   | float32 | 4        | 1.00    | addition      | scalar | 0.025330 | 0.007265   | 3.49x   | -71.32%        |
| 512x512   | float32 | 4        | 1.00    | addition      | sse42  | 0.025330 | 0.000805   | 31.45x  | -96.82%        |
| 512x512   | float32 | 4        | 1.00    | addition      | avx2   | 0.025330 | 0.000819   | 30.94x  | -96.77%        |
| 512x512   | float32 | 4        | 0.50    | darken_only   | scalar | 0.025205 | 0.003415   | 7.38x   | -86.45%        |
| 512x512   | float32 | 4        | 0.50    | darken_only   | sse42  | 0.025205 | 0.000722   | 34.89x  | -97.13%        |
| 512x512   | float32 | 4        | 0.50    | darken_only   | avx2   | 0.025205 | 0.000766   | 32.91x  | -96.96%        |
| 512x512   | float32 | 4        | 0.00    | darken_only   | scalar | 0.024833 | 0.000559   | 44.42x  | -97.75%        |
| 512x512   | float32 | 4        | 0.00    | darken_only   | sse42  | 0.024833 | 0.000285   | 87.14x  | -98.85%        |
| 512x512   | float32 | 4        | 0.00    | darken_only   | avx2   | 0.024833 | 0.000286   | 86.86x  | -98.85%        |
| 512x512   | float32 | 4        | 1.00    | darken_only   | scalar | 0.025021 | 0.003435   | 7.28x   | -86.27%        |
| 512x512   | float32 | 4        | 1.00    | darken_only   | sse42  | 0.025021 | 0.000763   | 32.81x  | -96.95%        |
| 512x512   | float32 | 4        | 1.00    | darken_only   | avx2   | 0.025021 | 0.000788   | 31.74x  | -96.85%        |
| 512x512   | float32 | 4        | 0.50    | multiply      | scalar | 0.025131 | 0.002882   | 8.72x   | -88.53%        |
| 512x512   | float32 | 4        | 0.50    | multiply      | sse42  | 0.025131 | 0.000730   | 34.43x  | -97.10%        |
| 512x512   | float32 | 4        | 0.50    | multiply      | avx2   | 0.025131 | 0.000772   | 32.57x  | -96.93%        |
| 512x512   | float32 | 4        | 0.00    | multiply      | scalar | 0.025260 | 0.000548   | 46.12x  | -97.83%        |
| 512x512   | float32 | 4        | 0.00    | multiply      | sse42  | 0.025260 | 0.000295   | 85.62x  | -98.83%        |
| 512x512   | float32 | 4        | 0.00    | multiply      | avx2   | 0.025260 | 0.000293   | 86.15x  | -98.84%        |
| 512x512   | float32 | 4        | 1.00    | multiply      | scalar | 0.025151 | 0.002891   | 8.70x   | -88.51%        |
| 512x512   | float32 | 4        | 1.00    | multiply      | sse42  | 0.025151 | 0.000771   | 32.61x  | -96.93%        |
| 512x512   | float32 | 4        | 1.00    | multiply      | avx2   | 0.025151 | 0.000802   | 31.37x  | -96.81%        |
| 512x512   | float32 | 4        | 0.50    | hard_light    | scalar | 0.033231 | 0.007697   | 4.32x   | -76.84%        |
| 512x512   | float32 | 4        | 0.50    | hard_light    | sse42  | 0.033231 | 0.000904   | 36.75x  | -97.28%        |
| 512x512   | float32 | 4        | 0.50    | hard_light    | avx2   | 0.033231 | 0.000790   | 42.05x  | -97.62%        |
| 512x512   | float32 | 4        | 0.00    | hard_light    | scalar | 0.033613 | 0.000578   | 58.16x  | -98.28%        |
| 512x512   | float32 | 4        | 0.00    | hard_light    | sse42  | 0.033613 | 0.000285   | 117.78x | -99.15%        |
| 512x512   | float32 | 4        | 0.00    | hard_light    | avx2   | 0.033613 | 0.000341   | 98.65x  | -98.99%        |
| 512x512   | float32 | 4        | 1.00    | hard_light    | scalar | 0.033492 | 0.007699   | 4.35x   | -77.01%        |
| 512x512   | float32 | 4        | 1.00    | hard_light    | sse42  | 0.033492 | 0.000908   | 36.87x  | -97.29%        |
| 512x512   | float32 | 4        | 1.00    | hard_light    | avx2   | 0.033492 | 0.000766   | 43.71x  | -97.71%        |
| 512x512   | float32 | 4        | 0.50    | difference    | scalar | 0.032128 | 0.002905   | 11.06x  | -90.96%        |
| 512x512   | float32 | 4        | 0.50    | difference    | sse42  | 0.032128 | 0.000724   | 44.39x  | -97.75%        |
| 512x512   | float32 | 4        | 0.50    | difference    | avx2   | 0.032128 | 0.000759   | 42.34x  | -97.64%        |
| 512x512   | float32 | 4        | 0.00    | difference    | scalar | 0.031976 | 0.000551   | 58.03x  | -98.28%        |
| 512x512   | float32 | 4        | 0.00    | difference    | sse42  | 0.031976 | 0.000280   | 114.02x | -99.12%        |
| 512x512   | float32 | 4        | 0.00    | difference    | avx2   | 0.031976 | 0.000373   | 85.81x  | -98.83%        |
| 512x512   | float32 | 4        | 1.00    | difference    | scalar | 0.032505 | 0.002888   | 11.25x  | -91.11%        |
| 512x512   | float32 | 4        | 1.00    | difference    | sse42  | 0.032505 | 0.000731   | 44.49x  | -97.75%        |
| 512x512   | float32 | 4        | 1.00    | difference    | avx2   | 0.032505 | 0.000762   | 42.64x  | -97.65%        |
| 512x512   | float32 | 4        | 0.50    | subtract      | scalar | 0.024697 | 0.003737   | 6.61x   | -84.87%        |
| 512x512   | float32 | 4        | 0.50    | subtract      | sse42  | 0.024697 | 0.000821   | 30.08x  | -96.68%        |
| 512x512   | float32 | 4        | 0.50    | subtract      | avx2   | 0.024697 | 0.000898   | 27.49x  | -96.36%        |
| 512x512   | float32 | 4        | 0.00    | subtract      | scalar | 0.024910 | 0.000584   | 42.64x  | -97.65%        |
| 512x512   | float32 | 4        | 0.00    | subtract      | sse42  | 0.024910 | 0.000325   | 76.62x  | -98.69%        |
| 512x512   | float32 | 4        | 0.00    | subtract      | avx2   | 0.024910 | 0.000289   | 86.18x  | -98.84%        |
| 512x512   | float32 | 4        | 1.00    | subtract      | scalar | 0.025036 | 0.003557   | 7.04x   | -85.79%        |
| 512x512   | float32 | 4        | 1.00    | subtract      | sse42  | 0.025036 | 0.000811   | 30.89x  | -96.76%        |
| 512x512   | float32 | 4        | 1.00    | subtract      | avx2   | 0.025036 | 0.000782   | 32.03x  | -96.88%        |
| 512x512   | float32 | 4        | 0.50    | grain_extract | scalar | 0.025654 | 0.004514   | 5.68x   | -82.40%        |
| 512x512   | float32 | 4        | 0.50    | grain_extract | sse42  | 0.025654 | 0.000779   | 32.92x  | -96.96%        |
| 512x512   | float32 | 4        | 0.50    | grain_extract | avx2   | 0.025654 | 0.000831   | 30.89x  | -96.76%        |
| 512x512   | float32 | 4        | 0.00    | grain_extract | scalar | 0.025531 | 0.000549   | 46.49x  | -97.85%        |
| 512x512   | float32 | 4        | 0.00    | grain_extract | sse42  | 0.025531 | 0.000289   | 88.44x  | -98.87%        |
| 512x512   | float32 | 4        | 0.00    | grain_extract | avx2   | 0.025531 | 0.000281   | 90.92x  | -98.90%        |
| 512x512   | float32 | 4        | 1.00    | grain_extract | scalar | 0.025511 | 0.004504   | 5.66x   | -82.35%        |
| 512x512   | float32 | 4        | 1.00    | grain_extract | sse42  | 0.025511 | 0.000737   | 34.60x  | -97.11%        |
| 512x512   | float32 | 4        | 1.00    | grain_extract | avx2   | 0.025511 | 0.000782   | 32.64x  | -96.94%        |
| 512x512   | float32 | 4        | 0.50    | grain_merge   | scalar | 0.025282 | 0.004514   | 5.60x   | -82.15%        |
| 512x512   | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.025282 | 0.000783   | 32.30x  | -96.90%        |
| 512x512   | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.025282 | 0.000801   | 31.57x  | -96.83%        |
| 512x512   | float32 | 4        | 0.00    | grain_merge   | scalar | 0.025842 | 0.000551   | 46.87x  | -97.87%        |
| 512x512   | float32 | 4        | 0.00    | grain_merge   | sse42  | 0.025842 | 0.000282   | 91.52x  | -98.91%        |
| 512x512   | float32 | 4        | 0.00    | grain_merge   | avx2   | 0.025842 | 0.000289   | 89.36x  | -98.88%        |
| 512x512   | float32 | 4        | 1.00    | grain_merge   | scalar | 0.025590 | 0.004519   | 5.66x   | -82.34%        |
| 512x512   | float32 | 4        | 1.00    | grain_merge   | sse42  | 0.025590 | 0.000764   | 33.50x  | -97.01%        |
| 512x512   | float32 | 4        | 1.00    | grain_merge   | avx2   | 0.025590 | 0.000792   | 32.31x  | -96.91%        |
| 512x512   | float32 | 4        | 0.50    | divide        | scalar | 0.026216 | 0.004051   | 6.47x   | -84.55%        |
| 512x512   | float32 | 4        | 0.50    | divide        | sse42  | 0.026216 | 0.000758   | 34.60x  | -97.11%        |
| 512x512   | float32 | 4        | 0.50    | divide        | avx2   | 0.026216 | 0.000771   | 34.02x  | -97.06%        |
| 512x512   | float32 | 4        | 0.00    | divide        | scalar | 0.026078 | 0.000518   | 50.32x  | -98.01%        |
| 512x512   | float32 | 4        | 0.00    | divide        | sse42  | 0.026078 | 0.000248   | 105.13x | -99.05%        |
| 512x512   | float32 | 4        | 0.00    | divide        | avx2   | 0.026078 | 0.000270   | 96.60x  | -98.96%        |
| 512x512   | float32 | 4        | 1.00    | divide        | scalar | 0.025651 | 0.003166   | 8.10x   | -87.66%        |
| 512x512   | float32 | 4        | 1.00    | divide        | sse42  | 0.025651 | 0.000773   | 33.17x  | -96.99%        |
| 512x512   | float32 | 4        | 1.00    | divide        | avx2   | 0.025651 | 0.000764   | 33.60x  | -97.02%        |
| 512x512   | float32 | 4        | 0.50    | overlay       | scalar | 0.032134 | 0.007142   | 4.50x   | -77.77%        |
| 512x512   | float32 | 4        | 0.50    | overlay       | sse42  | 0.032134 | 0.000778   | 41.28x  | -97.58%        |
| 512x512   | float32 | 4        | 0.50    | overlay       | avx2   | 0.032134 | 0.000748   | 42.95x  | -97.67%        |
| 512x512   | float32 | 4        | 0.00    | overlay       | scalar | 0.032136 | 0.000549   | 58.53x  | -98.29%        |
| 512x512   | float32 | 4        | 0.00    | overlay       | sse42  | 0.032136 | 0.000246   | 130.76x | -99.24%        |
| 512x512   | float32 | 4        | 0.00    | overlay       | avx2   | 0.032136 | 0.000258   | 124.41x | -99.20%        |
| 512x512   | float32 | 4        | 1.00    | overlay       | scalar | 0.032218 | 0.007168   | 4.49x   | -77.75%        |
| 512x512   | float32 | 4        | 1.00    | overlay       | sse42  | 0.032218 | 0.000816   | 39.47x  | -97.47%        |
| 512x512   | float32 | 4        | 1.00    | overlay       | avx2   | 0.032218 | 0.000855   | 37.66x  | -97.34%        |
| 1024x1024 | uint8   | 3        | 0.50    | normal        | scalar | 0.094839 | 0.025280   | 3.75x   | -73.34%        |
| 1024x1024 | uint8   | 3        | 0.50    | normal        | sse42  | 0.094839 | 0.010864   | 8.73x   | -88.54%        |
| 1024x1024 | uint8   | 3        | 0.50    | normal        | avx2   | 0.094839 | 0.011009   | 8.61x   | -88.39%        |
| 1024x1024 | uint8   | 3        | 0.50    | soft_light    | scalar | 0.126995 | 0.028117   | 4.52x   | -77.86%        |
| 1024x1024 | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.126995 | 0.013559   | 9.37x   | -89.32%        |
| 1024x1024 | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.126995 | 0.012633   | 10.05x  | -90.05%        |
| 1024x1024 | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.100069 | 0.030163   | 3.32x   | -69.86%        |
| 1024x1024 | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.100069 | 0.012367   | 8.09x   | -87.64%        |
| 1024x1024 | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.100069 | 0.011895   | 8.41x   | -88.11%        |
| 1024x1024 | uint8   | 3        | 0.50    | screen        | scalar | 0.104290 | 0.027393   | 3.81x   | -73.73%        |
| 1024x1024 | uint8   | 3        | 0.50    | screen        | sse42  | 0.104290 | 0.012699   | 8.21x   | -87.82%        |
| 1024x1024 | uint8   | 3        | 0.50    | screen        | avx2   | 0.104290 | 0.012159   | 8.58x   | -88.34%        |
| 1024x1024 | uint8   | 3        | 0.50    | dodge         | scalar | 0.104213 | 0.028293   | 3.68x   | -72.85%        |
| 1024x1024 | uint8   | 3        | 0.50    | dodge         | sse42  | 0.104213 | 0.013897   | 7.50x   | -86.66%        |
| 1024x1024 | uint8   | 3        | 0.50    | dodge         | avx2   | 0.104213 | 0.012778   | 8.16x   | -87.74%        |
| 1024x1024 | uint8   | 3        | 0.50    | addition      | scalar | 0.100048 | 0.039617   | 2.53x   | -60.40%        |
| 1024x1024 | uint8   | 3        | 0.50    | addition      | sse42  | 0.100048 | 0.012574   | 7.96x   | -87.43%        |
| 1024x1024 | uint8   | 3        | 0.50    | addition      | avx2   | 0.100048 | 0.012084   | 8.28x   | -87.92%        |
| 1024x1024 | uint8   | 3        | 0.50    | darken_only   | scalar | 0.100404 | 0.030402   | 3.30x   | -69.72%        |
| 1024x1024 | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.100404 | 0.012296   | 8.17x   | -87.75%        |
| 1024x1024 | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.100404 | 0.011932   | 8.41x   | -88.12%        |
| 1024x1024 | uint8   | 3        | 0.50    | multiply      | scalar | 0.102176 | 0.027350   | 3.74x   | -73.23%        |
| 1024x1024 | uint8   | 3        | 0.50    | multiply      | sse42  | 0.102176 | 0.012445   | 8.21x   | -87.82%        |
| 1024x1024 | uint8   | 3        | 0.50    | multiply      | avx2   | 0.102176 | 0.011982   | 8.53x   | -88.27%        |
| 1024x1024 | uint8   | 3        | 0.50    | hard_light    | scalar | 0.134382 | 0.046849   | 2.87x   | -65.14%        |
| 1024x1024 | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.134382 | 0.013894   | 9.67x   | -89.66%        |
| 1024x1024 | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.134382 | 0.012752   | 10.54x  | -90.51%        |
| 1024x1024 | uint8   | 3        | 0.50    | difference    | scalar | 0.129480 | 0.027437   | 4.72x   | -78.81%        |
| 1024x1024 | uint8   | 3        | 0.50    | difference    | sse42  | 0.129480 | 0.012287   | 10.54x  | -90.51%        |
| 1024x1024 | uint8   | 3        | 0.50    | difference    | avx2   | 0.129480 | 0.011906   | 10.88x  | -90.81%        |
| 1024x1024 | uint8   | 3        | 0.50    | subtract      | scalar | 0.100542 | 0.025217   | 3.99x   | -74.92%        |
| 1024x1024 | uint8   | 3        | 0.50    | subtract      | sse42  | 0.100542 | 0.013588   | 7.40x   | -86.49%        |
| 1024x1024 | uint8   | 3        | 0.50    | subtract      | avx2   | 0.100542 | 0.012492   | 8.05x   | -87.57%        |
| 1024x1024 | uint8   | 3        | 0.50    | grain_extract | scalar | 0.102238 | 0.034312   | 2.98x   | -66.44%        |
| 1024x1024 | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.102238 | 0.013706   | 7.46x   | -86.59%        |
| 1024x1024 | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.102238 | 0.012787   | 8.00x   | -87.49%        |
| 1024x1024 | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.104394 | 0.033550   | 3.11x   | -67.86%        |
| 1024x1024 | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.104394 | 0.013519   | 7.72x   | -87.05%        |
| 1024x1024 | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.104394 | 0.012724   | 8.20x   | -87.81%        |
| 1024x1024 | uint8   | 3        | 0.50    | divide        | scalar | 0.106128 | 0.028238   | 3.76x   | -73.39%        |
| 1024x1024 | uint8   | 3        | 0.50    | divide        | sse42  | 0.106128 | 0.013875   | 7.65x   | -86.93%        |
| 1024x1024 | uint8   | 3        | 0.50    | divide        | avx2   | 0.106128 | 0.013703   | 7.74x   | -87.09%        |
| 1024x1024 | uint8   | 3        | 0.50    | overlay       | scalar | 0.130377 | 0.045998   | 2.83x   | -64.72%        |
| 1024x1024 | uint8   | 3        | 0.50    | overlay       | sse42  | 0.130377 | 0.014143   | 9.22x   | -89.15%        |
| 1024x1024 | uint8   | 3        | 0.50    | overlay       | avx2   | 0.130377 | 0.012946   | 10.07x  | -90.07%        |
| 1024x1024 | uint8   | 4        | 0.50    | normal        | scalar | 0.070143 | 0.020905   | 3.36x   | -70.20%        |
| 1024x1024 | uint8   | 4        | 0.50    | normal        | sse42  | 0.070143 | 0.002835   | 24.74x  | -95.96%        |
| 1024x1024 | uint8   | 4        | 0.50    | normal        | avx2   | 0.070143 | 0.002521   | 27.83x  | -96.41%        |
| 1024x1024 | uint8   | 4        | 0.50    | soft_light    | scalar | 0.101605 | 0.026332   | 3.86x   | -74.08%        |
| 1024x1024 | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.101605 | 0.003833   | 26.51x  | -96.23%        |
| 1024x1024 | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.101605 | 0.003143   | 32.32x  | -96.91%        |
| 1024x1024 | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.075982 | 0.027670   | 2.75x   | -63.58%        |
| 1024x1024 | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.075982 | 0.003082   | 24.66x  | -95.94%        |
| 1024x1024 | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.075982 | 0.003042   | 24.98x  | -96.00%        |
| 1024x1024 | uint8   | 4        | 0.50    | screen        | scalar | 0.079442 | 0.025210   | 3.15x   | -68.27%        |
| 1024x1024 | uint8   | 4        | 0.50    | screen        | sse42  | 0.079442 | 0.003338   | 23.80x  | -95.80%        |
| 1024x1024 | uint8   | 4        | 0.50    | screen        | avx2   | 0.079442 | 0.003180   | 24.98x  | -96.00%        |
| 1024x1024 | uint8   | 4        | 0.50    | dodge         | scalar | 0.079289 | 0.026128   | 3.03x   | -67.05%        |
| 1024x1024 | uint8   | 4        | 0.50    | dodge         | sse42  | 0.079289 | 0.003801   | 20.86x  | -95.21%        |
| 1024x1024 | uint8   | 4        | 0.50    | dodge         | avx2   | 0.079289 | 0.003144   | 25.22x  | -96.03%        |
| 1024x1024 | uint8   | 4        | 0.50    | addition      | scalar | 0.075894 | 0.031374   | 2.42x   | -58.66%        |
| 1024x1024 | uint8   | 4        | 0.50    | addition      | sse42  | 0.075894 | 0.004133   | 18.37x  | -94.55%        |
| 1024x1024 | uint8   | 4        | 0.50    | addition      | avx2   | 0.075894 | 0.003228   | 23.51x  | -95.75%        |
| 1024x1024 | uint8   | 4        | 0.50    | darken_only   | scalar | 0.076462 | 0.027792   | 2.75x   | -63.65%        |
| 1024x1024 | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.076462 | 0.003027   | 25.26x  | -96.04%        |
| 1024x1024 | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.076462 | 0.003049   | 25.08x  | -96.01%        |
| 1024x1024 | uint8   | 4        | 0.50    | multiply      | scalar | 0.076996 | 0.026624   | 2.89x   | -65.42%        |
| 1024x1024 | uint8   | 4        | 0.50    | multiply      | sse42  | 0.076996 | 0.003191   | 24.13x  | -95.86%        |
| 1024x1024 | uint8   | 4        | 0.50    | multiply      | avx2   | 0.076996 | 0.003045   | 25.29x  | -96.05%        |
| 1024x1024 | uint8   | 4        | 0.50    | hard_light    | scalar | 0.111533 | 0.041143   | 2.71x   | -63.11%        |
| 1024x1024 | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.111533 | 0.003836   | 29.08x  | -96.56%        |
| 1024x1024 | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.111533 | 0.003130   | 35.63x  | -97.19%        |
| 1024x1024 | uint8   | 4        | 0.50    | difference    | scalar | 0.107496 | 0.024799   | 4.33x   | -76.93%        |
| 1024x1024 | uint8   | 4        | 0.50    | difference    | sse42  | 0.107496 | 0.003149   | 34.14x  | -97.07%        |
| 1024x1024 | uint8   | 4        | 0.50    | difference    | avx2   | 0.107496 | 0.003047   | 35.28x  | -97.17%        |
| 1024x1024 | uint8   | 4        | 0.50    | subtract      | scalar | 0.076173 | 0.023894   | 3.19x   | -68.63%        |
| 1024x1024 | uint8   | 4        | 0.50    | subtract      | sse42  | 0.076173 | 0.004270   | 17.84x  | -94.39%        |
| 1024x1024 | uint8   | 4        | 0.50    | subtract      | avx2   | 0.076173 | 0.003235   | 23.55x  | -95.75%        |
| 1024x1024 | uint8   | 4        | 0.50    | grain_extract | scalar | 0.078209 | 0.030427   | 2.57x   | -61.09%        |
| 1024x1024 | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.078209 | 0.003401   | 22.99x  | -95.65%        |
| 1024x1024 | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.078209 | 0.003094   | 25.28x  | -96.04%        |
| 1024x1024 | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.078319 | 0.030088   | 2.60x   | -61.58%        |
| 1024x1024 | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.078319 | 0.003358   | 23.33x  | -95.71%        |
| 1024x1024 | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.078319 | 0.003071   | 25.50x  | -96.08%        |
| 1024x1024 | uint8   | 4        | 0.50    | divide        | scalar | 0.080021 | 0.025897   | 3.09x   | -67.64%        |
| 1024x1024 | uint8   | 4        | 0.50    | divide        | sse42  | 0.080021 | 0.003639   | 21.99x  | -95.45%        |
| 1024x1024 | uint8   | 4        | 0.50    | divide        | avx2   | 0.080021 | 0.003499   | 22.87x  | -95.63%        |
| 1024x1024 | uint8   | 4        | 0.50    | overlay       | scalar | 0.106051 | 0.040344   | 2.63x   | -61.96%        |
| 1024x1024 | uint8   | 4        | 0.50    | overlay       | sse42  | 0.106051 | 0.003624   | 29.27x  | -96.58%        |
| 1024x1024 | uint8   | 4        | 0.50    | overlay       | avx2   | 0.106051 | 0.003162   | 33.54x  | -97.02%        |
| 1024x1024 | float32 | 3        | 0.50    | normal        | scalar | 0.083505 | 0.007947   | 10.51x  | -90.48%        |
| 1024x1024 | float32 | 3        | 0.50    | normal        | sse42  | 0.083505 | 0.003595   | 23.23x  | -95.69%        |
| 1024x1024 | float32 | 3        | 0.50    | normal        | avx2   | 0.083505 | 0.002396   | 34.85x  | -97.13%        |
| 1024x1024 | float32 | 3        | 0.50    | soft_light    | scalar | 0.116328 | 0.010063   | 11.56x  | -91.35%        |
| 1024x1024 | float32 | 3        | 0.50    | soft_light    | sse42  | 0.116328 | 0.004263   | 27.29x  | -96.34%        |
| 1024x1024 | float32 | 3        | 0.50    | soft_light    | avx2   | 0.116328 | 0.003077   | 37.80x  | -97.35%        |
| 1024x1024 | float32 | 3        | 0.50    | lighten_only  | scalar | 0.098583 | 0.012342   | 7.99x   | -87.48%        |
| 1024x1024 | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.098583 | 0.003588   | 27.48x  | -96.36%        |
| 1024x1024 | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.098583 | 0.003156   | 31.24x  | -96.80%        |
| 1024x1024 | float32 | 3        | 0.50    | screen        | scalar | 0.099742 | 0.009306   | 10.72x  | -90.67%        |
| 1024x1024 | float32 | 3        | 0.50    | screen        | sse42  | 0.099742 | 0.003905   | 25.54x  | -96.08%        |
| 1024x1024 | float32 | 3        | 0.50    | screen        | avx2   | 0.099742 | 0.003066   | 32.53x  | -96.93%        |
| 1024x1024 | float32 | 3        | 0.50    | dodge         | scalar | 0.095159 | 0.010322   | 9.22x   | -89.15%        |
| 1024x1024 | float32 | 3        | 0.50    | dodge         | sse42  | 0.095159 | 0.004425   | 21.50x  | -95.35%        |
| 1024x1024 | float32 | 3        | 0.50    | dodge         | avx2   | 0.095159 | 0.003153   | 30.18x  | -96.69%        |
| 1024x1024 | float32 | 3        | 0.50    | addition      | scalar | 0.090771 | 0.025361   | 3.58x   | -72.06%        |
| 1024x1024 | float32 | 3        | 0.50    | addition      | sse42  | 0.090771 | 0.003725   | 24.37x  | -95.90%        |
| 1024x1024 | float32 | 3        | 0.50    | addition      | avx2   | 0.090771 | 0.003074   | 29.53x  | -96.61%        |
| 1024x1024 | float32 | 3        | 0.50    | darken_only   | scalar | 0.089637 | 0.011961   | 7.49x   | -86.66%        |
| 1024x1024 | float32 | 3        | 0.50    | darken_only   | sse42  | 0.089637 | 0.003513   | 25.51x  | -96.08%        |
| 1024x1024 | float32 | 3        | 0.50    | darken_only   | avx2   | 0.089637 | 0.002934   | 30.55x  | -96.73%        |
| 1024x1024 | float32 | 3        | 0.50    | multiply      | scalar | 0.091740 | 0.008749   | 10.49x  | -90.46%        |
| 1024x1024 | float32 | 3        | 0.50    | multiply      | sse42  | 0.091740 | 0.003538   | 25.93x  | -96.14%        |
| 1024x1024 | float32 | 3        | 0.50    | multiply      | avx2   | 0.091740 | 0.002969   | 30.90x  | -96.76%        |
| 1024x1024 | float32 | 3        | 0.50    | hard_light    | scalar | 0.124994 | 0.028500   | 4.39x   | -77.20%        |
| 1024x1024 | float32 | 3        | 0.50    | hard_light    | sse42  | 0.124994 | 0.004353   | 28.71x  | -96.52%        |
| 1024x1024 | float32 | 3        | 0.50    | hard_light    | avx2   | 0.124994 | 0.003023   | 41.35x  | -97.58%        |
| 1024x1024 | float32 | 3        | 0.50    | difference    | scalar | 0.120875 | 0.008875   | 13.62x  | -92.66%        |
| 1024x1024 | float32 | 3        | 0.50    | difference    | sse42  | 0.120875 | 0.003768   | 32.08x  | -96.88%        |
| 1024x1024 | float32 | 3        | 0.50    | difference    | avx2   | 0.120875 | 0.002989   | 40.44x  | -97.53%        |
| 1024x1024 | float32 | 3        | 0.50    | subtract      | scalar | 0.089327 | 0.011248   | 7.94x   | -87.41%        |
| 1024x1024 | float32 | 3        | 0.50    | subtract      | sse42  | 0.089327 | 0.003877   | 23.04x  | -95.66%        |
| 1024x1024 | float32 | 3        | 0.50    | subtract      | avx2   | 0.089327 | 0.003067   | 29.12x  | -96.57%        |
| 1024x1024 | float32 | 3        | 0.50    | grain_extract | scalar | 0.092664 | 0.016290   | 5.69x   | -82.42%        |
| 1024x1024 | float32 | 3        | 0.50    | grain_extract | sse42  | 0.092664 | 0.003849   | 24.07x  | -95.85%        |
| 1024x1024 | float32 | 3        | 0.50    | grain_extract | avx2   | 0.092664 | 0.003027   | 30.61x  | -96.73%        |
| 1024x1024 | float32 | 3        | 0.50    | grain_merge   | scalar | 0.092773 | 0.016445   | 5.64x   | -82.27%        |
| 1024x1024 | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.092773 | 0.004002   | 23.18x  | -95.69%        |
| 1024x1024 | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.092773 | 0.003070   | 30.22x  | -96.69%        |
| 1024x1024 | float32 | 3        | 0.50    | divide        | scalar | 0.093690 | 0.009880   | 9.48x   | -89.46%        |
| 1024x1024 | float32 | 3        | 0.50    | divide        | sse42  | 0.093690 | 0.004248   | 22.06x  | -95.47%        |
| 1024x1024 | float32 | 3        | 0.50    | divide        | avx2   | 0.093690 | 0.003007   | 31.15x  | -96.79%        |
| 1024x1024 | float32 | 3        | 0.50    | overlay       | scalar | 0.118617 | 0.026093   | 4.55x   | -78.00%        |
| 1024x1024 | float32 | 3        | 0.50    | overlay       | sse42  | 0.118617 | 0.004260   | 27.85x  | -96.41%        |
| 1024x1024 | float32 | 3        | 0.50    | overlay       | avx2   | 0.118617 | 0.003074   | 38.58x  | -97.41%        |
| 1024x1024 | float32 | 4        | 0.50    | normal        | scalar | 0.063601 | 0.009967   | 6.38x   | -84.33%        |
| 1024x1024 | float32 | 4        | 0.50    | normal        | sse42  | 0.063601 | 0.002920   | 21.78x  | -95.41%        |
| 1024x1024 | float32 | 4        | 0.50    | normal        | avx2   | 0.063601 | 0.004480   | 14.20x  | -92.96%        |
| 1024x1024 | float32 | 4        | 0.50    | soft_light    | scalar | 0.097016 | 0.011639   | 8.34x   | -88.00%        |
| 1024x1024 | float32 | 4        | 0.50    | soft_light    | sse42  | 0.097016 | 0.003163   | 30.67x  | -96.74%        |
| 1024x1024 | float32 | 4        | 0.50    | soft_light    | avx2   | 0.097016 | 0.003113   | 31.16x  | -96.79%        |
| 1024x1024 | float32 | 4        | 0.50    | lighten_only  | scalar | 0.070163 | 0.012308   | 5.70x   | -82.46%        |
| 1024x1024 | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.070163 | 0.002959   | 23.71x  | -95.78%        |
| 1024x1024 | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.070163 | 0.003136   | 22.37x  | -95.53%        |
| 1024x1024 | float32 | 4        | 0.50    | screen        | scalar | 0.073247 | 0.010763   | 6.81x   | -85.31%        |
| 1024x1024 | float32 | 4        | 0.50    | screen        | sse42  | 0.073247 | 0.002961   | 24.74x  | -95.96%        |
| 1024x1024 | float32 | 4        | 0.50    | screen        | avx2   | 0.073247 | 0.003084   | 23.75x  | -95.79%        |
| 1024x1024 | float32 | 4        | 0.50    | dodge         | scalar | 0.073465 | 0.012168   | 6.04x   | -83.44%        |
| 1024x1024 | float32 | 4        | 0.50    | dodge         | sse42  | 0.073465 | 0.003503   | 20.97x  | -95.23%        |
| 1024x1024 | float32 | 4        | 0.50    | dodge         | avx2   | 0.073465 | 0.003065   | 23.97x  | -95.83%        |
| 1024x1024 | float32 | 4        | 0.50    | addition      | scalar | 0.070711 | 0.021524   | 3.29x   | -69.56%        |
| 1024x1024 | float32 | 4        | 0.50    | addition      | sse42  | 0.070711 | 0.003230   | 21.89x  | -95.43%        |
| 1024x1024 | float32 | 4        | 0.50    | addition      | avx2   | 0.070711 | 0.003196   | 22.13x  | -95.48%        |
| 1024x1024 | float32 | 4        | 0.50    | darken_only   | scalar | 0.070094 | 0.012382   | 5.66x   | -82.34%        |
| 1024x1024 | float32 | 4        | 0.50    | darken_only   | sse42  | 0.070094 | 0.003118   | 22.48x  | -95.55%        |
| 1024x1024 | float32 | 4        | 0.50    | darken_only   | avx2   | 0.070094 | 0.003098   | 22.63x  | -95.58%        |
| 1024x1024 | float32 | 4        | 0.50    | multiply      | scalar | 0.071957 | 0.010373   | 6.94x   | -85.59%        |
| 1024x1024 | float32 | 4        | 0.50    | multiply      | sse42  | 0.071957 | 0.003409   | 21.11x  | -95.26%        |
| 1024x1024 | float32 | 4        | 0.50    | multiply      | avx2   | 0.071957 | 0.003166   | 22.73x  | -95.60%        |
| 1024x1024 | float32 | 4        | 0.50    | hard_light    | scalar | 0.105849 | 0.029988   | 3.53x   | -71.67%        |
| 1024x1024 | float32 | 4        | 0.50    | hard_light    | sse42  | 0.105849 | 0.003628   | 29.17x  | -96.57%        |
| 1024x1024 | float32 | 4        | 0.50    | hard_light    | avx2   | 0.105849 | 0.003139   | 33.72x  | -97.03%        |
| 1024x1024 | float32 | 4        | 0.50    | difference    | scalar | 0.101612 | 0.010435   | 9.74x   | -89.73%        |
| 1024x1024 | float32 | 4        | 0.50    | difference    | sse42  | 0.101612 | 0.002947   | 34.48x  | -97.10%        |
| 1024x1024 | float32 | 4        | 0.50    | difference    | avx2   | 0.101612 | 0.003239   | 31.38x  | -96.81%        |
| 1024x1024 | float32 | 4        | 0.50    | subtract      | scalar | 0.070172 | 0.013966   | 5.02x   | -80.10%        |
| 1024x1024 | float32 | 4        | 0.50    | subtract      | sse42  | 0.070172 | 0.003253   | 21.57x  | -95.36%        |
| 1024x1024 | float32 | 4        | 0.50    | subtract      | avx2   | 0.070172 | 0.003146   | 22.30x  | -95.52%        |
| 1024x1024 | float32 | 4        | 0.50    | grain_extract | scalar | 0.073062 | 0.017063   | 4.28x   | -76.65%        |
| 1024x1024 | float32 | 4        | 0.50    | grain_extract | sse42  | 0.073062 | 0.003076   | 23.75x  | -95.79%        |
| 1024x1024 | float32 | 4        | 0.50    | grain_extract | avx2   | 0.073062 | 0.003149   | 23.21x  | -95.69%        |
| 1024x1024 | float32 | 4        | 0.50    | grain_merge   | scalar | 0.072788 | 0.017026   | 4.28x   | -76.61%        |
| 1024x1024 | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.072788 | 0.003053   | 23.85x  | -95.81%        |
| 1024x1024 | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.072788 | 0.003087   | 23.58x  | -95.76%        |
| 1024x1024 | float32 | 4        | 0.50    | divide        | scalar | 0.074411 | 0.011527   | 6.46x   | -84.51%        |
| 1024x1024 | float32 | 4        | 0.50    | divide        | sse42  | 0.074411 | 0.003182   | 23.39x  | -95.72%        |
| 1024x1024 | float32 | 4        | 0.50    | divide        | avx2   | 0.074411 | 0.003106   | 23.96x  | -95.83%        |
| 1024x1024 | float32 | 4        | 0.50    | overlay       | scalar | 0.099707 | 0.027888   | 3.58x   | -72.03%        |
| 1024x1024 | float32 | 4        | 0.50    | overlay       | sse42  | 0.099707 | 0.003340   | 29.85x  | -96.65%        |
| 1024x1024 | float32 | 4        | 0.50    | overlay       | avx2   | 0.099707 | 0.003180   | 31.36x  | -96.81%        |
| 2048x2048 | uint8   | 3        | 0.50    | normal        | scalar | 0.374464 | 0.103645   | 3.61x   | -72.32%        |
| 2048x2048 | uint8   | 3        | 0.50    | normal        | sse42  | 0.374464 | 0.044552   | 8.41x   | -88.10%        |
| 2048x2048 | uint8   | 3        | 0.50    | normal        | avx2   | 0.374464 | 0.045008   | 8.32x   | -87.98%        |
| 2048x2048 | uint8   | 3        | 0.50    | soft_light    | scalar | 0.483461 | 0.115179   | 4.20x   | -76.18%        |
| 2048x2048 | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.483461 | 0.055359   | 8.73x   | -88.55%        |
| 2048x2048 | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.483461 | 0.051581   | 9.37x   | -89.33%        |
| 2048x2048 | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.369076 | 0.123138   | 3.00x   | -66.64%        |
| 2048x2048 | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.369076 | 0.050094   | 7.37x   | -86.43%        |
| 2048x2048 | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.369076 | 0.048164   | 7.66x   | -86.95%        |
| 2048x2048 | uint8   | 3        | 0.50    | screen        | scalar | 0.387794 | 0.111794   | 3.47x   | -71.17%        |
| 2048x2048 | uint8   | 3        | 0.50    | screen        | sse42  | 0.387794 | 0.051754   | 7.49x   | -86.65%        |
| 2048x2048 | uint8   | 3        | 0.50    | screen        | avx2   | 0.387794 | 0.049450   | 7.84x   | -87.25%        |
| 2048x2048 | uint8   | 3        | 0.50    | dodge         | scalar | 0.392020 | 0.115456   | 3.40x   | -70.55%        |
| 2048x2048 | uint8   | 3        | 0.50    | dodge         | sse42  | 0.392020 | 0.056414   | 6.95x   | -85.61%        |
| 2048x2048 | uint8   | 3        | 0.50    | dodge         | avx2   | 0.392020 | 0.052008   | 7.54x   | -86.73%        |
| 2048x2048 | uint8   | 3        | 0.50    | addition      | scalar | 0.378307 | 0.160187   | 2.36x   | -57.66%        |
| 2048x2048 | uint8   | 3        | 0.50    | addition      | sse42  | 0.378307 | 0.051064   | 7.41x   | -86.50%        |
| 2048x2048 | uint8   | 3        | 0.50    | addition      | avx2   | 0.378307 | 0.048815   | 7.75x   | -87.10%        |
| 2048x2048 | uint8   | 3        | 0.50    | darken_only   | scalar | 0.366131 | 0.123695   | 2.96x   | -66.22%        |
| 2048x2048 | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.366131 | 0.050151   | 7.30x   | -86.30%        |
| 2048x2048 | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.366131 | 0.048195   | 7.60x   | -86.84%        |
| 2048x2048 | uint8   | 3        | 0.50    | multiply      | scalar | 0.379532 | 0.112395   | 3.38x   | -70.39%        |
| 2048x2048 | uint8   | 3        | 0.50    | multiply      | sse42  | 0.379532 | 0.050151   | 7.57x   | -86.79%        |
| 2048x2048 | uint8   | 3        | 0.50    | multiply      | avx2   | 0.379532 | 0.048644   | 7.80x   | -87.18%        |
| 2048x2048 | uint8   | 3        | 0.50    | hard_light    | scalar | 0.532761 | 0.189780   | 2.81x   | -64.38%        |
| 2048x2048 | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.532761 | 0.056670   | 9.40x   | -89.36%        |
| 2048x2048 | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.532761 | 0.052141   | 10.22x  | -90.21%        |
| 2048x2048 | uint8   | 3        | 0.50    | difference    | scalar | 0.486546 | 0.112062   | 4.34x   | -76.97%        |
| 2048x2048 | uint8   | 3        | 0.50    | difference    | sse42  | 0.486546 | 0.050259   | 9.68x   | -89.67%        |
| 2048x2048 | uint8   | 3        | 0.50    | difference    | avx2   | 0.486546 | 0.048119   | 10.11x  | -90.11%        |
| 2048x2048 | uint8   | 3        | 0.50    | subtract      | scalar | 0.375578 | 0.103654   | 3.62x   | -72.40%        |
| 2048x2048 | uint8   | 3        | 0.50    | subtract      | sse42  | 0.375578 | 0.054861   | 6.85x   | -85.39%        |
| 2048x2048 | uint8   | 3        | 0.50    | subtract      | avx2   | 0.375578 | 0.050844   | 7.39x   | -86.46%        |
| 2048x2048 | uint8   | 3        | 0.50    | grain_extract | scalar | 0.386779 | 0.135699   | 2.85x   | -64.92%        |
| 2048x2048 | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.386779 | 0.055061   | 7.02x   | -85.76%        |
| 2048x2048 | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.386779 | 0.050881   | 7.60x   | -86.84%        |
| 2048x2048 | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.384837 | 0.135041   | 2.85x   | -64.91%        |
| 2048x2048 | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.384837 | 0.054198   | 7.10x   | -85.92%        |
| 2048x2048 | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.384837 | 0.050985   | 7.55x   | -86.75%        |
| 2048x2048 | uint8   | 3        | 0.50    | divide        | scalar | 0.393058 | 0.112230   | 3.50x   | -71.45%        |
| 2048x2048 | uint8   | 3        | 0.50    | divide        | sse42  | 0.393058 | 0.054405   | 7.22x   | -86.16%        |
| 2048x2048 | uint8   | 3        | 0.50    | divide        | avx2   | 0.393058 | 0.050570   | 7.77x   | -87.13%        |
| 2048x2048 | uint8   | 3        | 0.50    | overlay       | scalar | 0.490372 | 0.183278   | 2.68x   | -62.62%        |
| 2048x2048 | uint8   | 3        | 0.50    | overlay       | sse42  | 0.490372 | 0.054802   | 8.95x   | -88.82%        |
| 2048x2048 | uint8   | 3        | 0.50    | overlay       | avx2   | 0.490372 | 0.051032   | 9.61x   | -89.59%        |
| 2048x2048 | uint8   | 4        | 0.50    | normal        | scalar | 0.270092 | 0.083275   | 3.24x   | -69.17%        |
| 2048x2048 | uint8   | 4        | 0.50    | normal        | sse42  | 0.270092 | 0.011070   | 24.40x  | -95.90%        |
| 2048x2048 | uint8   | 4        | 0.50    | normal        | avx2   | 0.270092 | 0.010054   | 26.86x  | -96.28%        |
| 2048x2048 | uint8   | 4        | 0.50    | soft_light    | scalar | 0.379618 | 0.103523   | 3.67x   | -72.73%        |
| 2048x2048 | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.379618 | 0.014022   | 27.07x  | -96.31%        |
| 2048x2048 | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.379618 | 0.012421   | 30.56x  | -96.73%        |
| 2048x2048 | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.264956 | 0.109360   | 2.42x   | -58.73%        |
| 2048x2048 | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.264956 | 0.011998   | 22.08x  | -95.47%        |
| 2048x2048 | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.264956 | 0.011967   | 22.14x  | -95.48%        |
| 2048x2048 | uint8   | 4        | 0.50    | screen        | scalar | 0.284002 | 0.098384   | 2.89x   | -65.36%        |
| 2048x2048 | uint8   | 4        | 0.50    | screen        | sse42  | 0.284002 | 0.013233   | 21.46x  | -95.34%        |
| 2048x2048 | uint8   | 4        | 0.50    | screen        | avx2   | 0.284002 | 0.012438   | 22.83x  | -95.62%        |
| 2048x2048 | uint8   | 4        | 0.50    | dodge         | scalar | 0.286230 | 0.102536   | 2.79x   | -64.18%        |
| 2048x2048 | uint8   | 4        | 0.50    | dodge         | sse42  | 0.286230 | 0.014821   | 19.31x  | -94.82%        |
| 2048x2048 | uint8   | 4        | 0.50    | dodge         | avx2   | 0.286230 | 0.012204   | 23.45x  | -95.74%        |
| 2048x2048 | uint8   | 4        | 0.50    | addition      | scalar | 0.273076 | 0.123245   | 2.22x   | -54.87%        |
| 2048x2048 | uint8   | 4        | 0.50    | addition      | sse42  | 0.273076 | 0.016178   | 16.88x  | -94.08%        |
| 2048x2048 | uint8   | 4        | 0.50    | addition      | avx2   | 0.273076 | 0.012549   | 21.76x  | -95.40%        |
| 2048x2048 | uint8   | 4        | 0.50    | darken_only   | scalar | 0.261902 | 0.109916   | 2.38x   | -58.03%        |
| 2048x2048 | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.261902 | 0.012068   | 21.70x  | -95.39%        |
| 2048x2048 | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.261902 | 0.012016   | 21.80x  | -95.41%        |
| 2048x2048 | uint8   | 4        | 0.50    | multiply      | scalar | 0.272909 | 0.099279   | 2.75x   | -63.62%        |
| 2048x2048 | uint8   | 4        | 0.50    | multiply      | sse42  | 0.272909 | 0.012472   | 21.88x  | -95.43%        |
| 2048x2048 | uint8   | 4        | 0.50    | multiply      | avx2   | 0.272909 | 0.011845   | 23.04x  | -95.66%        |
| 2048x2048 | uint8   | 4        | 0.50    | hard_light    | scalar | 0.428528 | 0.164495   | 2.61x   | -61.61%        |
| 2048x2048 | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.428528 | 0.015101   | 28.38x  | -96.48%        |
| 2048x2048 | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.428528 | 0.012414   | 34.52x  | -97.10%        |
| 2048x2048 | uint8   | 4        | 0.50    | difference    | scalar | 0.384338 | 0.098096   | 3.92x   | -74.48%        |
| 2048x2048 | uint8   | 4        | 0.50    | difference    | sse42  | 0.384338 | 0.012252   | 31.37x  | -96.81%        |
| 2048x2048 | uint8   | 4        | 0.50    | difference    | avx2   | 0.384338 | 0.012060   | 31.87x  | -96.86%        |
| 2048x2048 | uint8   | 4        | 0.50    | subtract      | scalar | 0.272340 | 0.094237   | 2.89x   | -65.40%        |
| 2048x2048 | uint8   | 4        | 0.50    | subtract      | sse42  | 0.272340 | 0.016731   | 16.28x  | -93.86%        |
| 2048x2048 | uint8   | 4        | 0.50    | subtract      | avx2   | 0.272340 | 0.012587   | 21.64x  | -95.38%        |
| 2048x2048 | uint8   | 4        | 0.50    | grain_extract | scalar | 0.283471 | 0.120733   | 2.35x   | -57.41%        |
| 2048x2048 | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.283471 | 0.014587   | 19.43x  | -94.85%        |
| 2048x2048 | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.283471 | 0.012207   | 23.22x  | -95.69%        |
| 2048x2048 | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.281448 | 0.119447   | 2.36x   | -57.56%        |
| 2048x2048 | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.281448 | 0.013379   | 21.04x  | -95.25%        |
| 2048x2048 | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.281448 | 0.012191   | 23.09x  | -95.67%        |
| 2048x2048 | uint8   | 4        | 0.50    | divide        | scalar | 0.289081 | 0.100308   | 2.88x   | -65.30%        |
| 2048x2048 | uint8   | 4        | 0.50    | divide        | sse42  | 0.289081 | 0.013670   | 21.15x  | -95.27%        |
| 2048x2048 | uint8   | 4        | 0.50    | divide        | avx2   | 0.289081 | 0.012018   | 24.05x  | -95.84%        |
| 2048x2048 | uint8   | 4        | 0.50    | overlay       | scalar | 0.391074 | 0.158915   | 2.46x   | -59.36%        |
| 2048x2048 | uint8   | 4        | 0.50    | overlay       | sse42  | 0.391074 | 0.014163   | 27.61x  | -96.38%        |
| 2048x2048 | uint8   | 4        | 0.50    | overlay       | avx2   | 0.391074 | 0.012246   | 31.93x  | -96.87%        |
| 2048x2048 | float32 | 3        | 0.50    | normal        | scalar | 0.321316 | 0.036645   | 8.77x   | -88.60%        |
| 2048x2048 | float32 | 3        | 0.50    | normal        | sse42  | 0.321316 | 0.019276   | 16.67x  | -94.00%        |
| 2048x2048 | float32 | 3        | 0.50    | normal        | avx2   | 0.321316 | 0.014435   | 22.26x  | -95.51%        |
| 2048x2048 | float32 | 3        | 0.50    | soft_light    | scalar | 0.429202 | 0.044338   | 9.68x   | -89.67%        |
| 2048x2048 | float32 | 3        | 0.50    | soft_light    | sse42  | 0.429202 | 0.021857   | 19.64x  | -94.91%        |
| 2048x2048 | float32 | 3        | 0.50    | soft_light    | avx2   | 0.429202 | 0.017080   | 25.13x  | -96.02%        |
| 2048x2048 | float32 | 3        | 0.50    | lighten_only  | scalar | 0.314519 | 0.052067   | 6.04x   | -83.45%        |
| 2048x2048 | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.314519 | 0.018553   | 16.95x  | -94.10%        |
| 2048x2048 | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.314519 | 0.016470   | 19.10x  | -94.76%        |
| 2048x2048 | float32 | 3        | 0.50    | screen        | scalar | 0.336441 | 0.040346   | 8.34x   | -88.01%        |
| 2048x2048 | float32 | 3        | 0.50    | screen        | sse42  | 0.336441 | 0.020338   | 16.54x  | -93.95%        |
| 2048x2048 | float32 | 3        | 0.50    | screen        | avx2   | 0.336441 | 0.016899   | 19.91x  | -94.98%        |
| 2048x2048 | float32 | 3        | 0.50    | dodge         | scalar | 0.336910 | 0.045444   | 7.41x   | -86.51%        |
| 2048x2048 | float32 | 3        | 0.50    | dodge         | sse42  | 0.336910 | 0.022459   | 15.00x  | -93.33%        |
| 2048x2048 | float32 | 3        | 0.50    | dodge         | avx2   | 0.336910 | 0.016955   | 19.87x  | -94.97%        |
| 2048x2048 | float32 | 3        | 0.50    | addition      | scalar | 0.325902 | 0.104359   | 3.12x   | -67.98%        |
| 2048x2048 | float32 | 3        | 0.50    | addition      | sse42  | 0.325902 | 0.020338   | 16.02x  | -93.76%        |
| 2048x2048 | float32 | 3        | 0.50    | addition      | avx2   | 0.325902 | 0.017048   | 19.12x  | -94.77%        |
| 2048x2048 | float32 | 3        | 0.50    | darken_only   | scalar | 0.315598 | 0.051633   | 6.11x   | -83.64%        |
| 2048x2048 | float32 | 3        | 0.50    | darken_only   | sse42  | 0.315598 | 0.018989   | 16.62x  | -93.98%        |
| 2048x2048 | float32 | 3        | 0.50    | darken_only   | avx2   | 0.315598 | 0.016517   | 19.11x  | -94.77%        |
| 2048x2048 | float32 | 3        | 0.50    | multiply      | scalar | 0.326323 | 0.039715   | 8.22x   | -87.83%        |
| 2048x2048 | float32 | 3        | 0.50    | multiply      | sse42  | 0.326323 | 0.018845   | 17.32x  | -94.23%        |
| 2048x2048 | float32 | 3        | 0.50    | multiply      | avx2   | 0.326323 | 0.016419   | 19.88x  | -94.97%        |
| 2048x2048 | float32 | 3        | 0.50    | hard_light    | scalar | 0.479711 | 0.118394   | 4.05x   | -75.32%        |
| 2048x2048 | float32 | 3        | 0.50    | hard_light    | sse42  | 0.479711 | 0.022660   | 21.17x  | -95.28%        |
| 2048x2048 | float32 | 3        | 0.50    | hard_light    | avx2   | 0.479711 | 0.017210   | 27.87x  | -96.41%        |
| 2048x2048 | float32 | 3        | 0.50    | difference    | scalar | 0.448123 | 0.040332   | 11.11x  | -91.00%        |
| 2048x2048 | float32 | 3        | 0.50    | difference    | sse42  | 0.448123 | 0.020253   | 22.13x  | -95.48%        |
| 2048x2048 | float32 | 3        | 0.50    | difference    | avx2   | 0.448123 | 0.017314   | 25.88x  | -96.14%        |
| 2048x2048 | float32 | 3        | 0.50    | subtract      | scalar | 0.330480 | 0.049264   | 6.71x   | -85.09%        |
| 2048x2048 | float32 | 3        | 0.50    | subtract      | sse42  | 0.330480 | 0.020582   | 16.06x  | -93.77%        |
| 2048x2048 | float32 | 3        | 0.50    | subtract      | avx2   | 0.330480 | 0.017168   | 19.25x  | -94.81%        |
| 2048x2048 | float32 | 3        | 0.50    | grain_extract | scalar | 0.337144 | 0.069314   | 4.86x   | -79.44%        |
| 2048x2048 | float32 | 3        | 0.50    | grain_extract | sse42  | 0.337144 | 0.020634   | 16.34x  | -93.88%        |
| 2048x2048 | float32 | 3        | 0.50    | grain_extract | avx2   | 0.337144 | 0.017011   | 19.82x  | -94.95%        |
| 2048x2048 | float32 | 3        | 0.50    | grain_merge   | scalar | 0.334497 | 0.069459   | 4.82x   | -79.23%        |
| 2048x2048 | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.334497 | 0.020719   | 16.14x  | -93.81%        |
| 2048x2048 | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.334497 | 0.016864   | 19.84x  | -94.96%        |
| 2048x2048 | float32 | 3        | 0.50    | divide        | scalar | 0.340278 | 0.044082   | 7.72x   | -87.05%        |
| 2048x2048 | float32 | 3        | 0.50    | divide        | sse42  | 0.340278 | 0.021898   | 15.54x  | -93.56%        |
| 2048x2048 | float32 | 3        | 0.50    | divide        | avx2   | 0.340278 | 0.016871   | 20.17x  | -95.04%        |
| 2048x2048 | float32 | 3        | 0.50    | overlay       | scalar | 0.444791 | 0.111071   | 4.00x   | -75.03%        |
| 2048x2048 | float32 | 3        | 0.50    | overlay       | sse42  | 0.444791 | 0.021504   | 20.68x  | -95.17%        |
| 2048x2048 | float32 | 3        | 0.50    | overlay       | avx2   | 0.444791 | 0.016844   | 26.41x  | -96.21%        |
| 2048x2048 | float32 | 4        | 0.50    | normal        | scalar | 0.251054 | 0.045493   | 5.52x   | -81.88%        |
| 2048x2048 | float32 | 4        | 0.50    | normal        | sse42  | 0.251054 | 0.015821   | 15.87x  | -93.70%        |
| 2048x2048 | float32 | 4        | 0.50    | normal        | avx2   | 0.251054 | 0.019886   | 12.62x  | -92.08%        |
| 2048x2048 | float32 | 4        | 0.50    | soft_light    | scalar | 0.359495 | 0.052759   | 6.81x   | -85.32%        |
| 2048x2048 | float32 | 4        | 0.50    | soft_light    | sse42  | 0.359495 | 0.018929   | 18.99x  | -94.73%        |
| 2048x2048 | float32 | 4        | 0.50    | soft_light    | avx2   | 0.359495 | 0.018609   | 19.32x  | -94.82%        |
| 2048x2048 | float32 | 4        | 0.50    | lighten_only  | scalar | 0.244256 | 0.056936   | 4.29x   | -76.69%        |
| 2048x2048 | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.244256 | 0.017613   | 13.87x  | -92.79%        |
| 2048x2048 | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.244256 | 0.017948   | 13.61x  | -92.65%        |
| 2048x2048 | float32 | 4        | 0.50    | screen        | scalar | 0.265482 | 0.049522   | 5.36x   | -81.35%        |
| 2048x2048 | float32 | 4        | 0.50    | screen        | sse42  | 0.265482 | 0.017429   | 15.23x  | -93.44%        |
| 2048x2048 | float32 | 4        | 0.50    | screen        | avx2   | 0.265482 | 0.017973   | 14.77x  | -93.23%        |
| 2048x2048 | float32 | 4        | 0.50    | dodge         | scalar | 0.266317 | 0.054908   | 4.85x   | -79.38%        |
| 2048x2048 | float32 | 4        | 0.50    | dodge         | sse42  | 0.266317 | 0.020365   | 13.08x  | -92.35%        |
| 2048x2048 | float32 | 4        | 0.50    | dodge         | avx2   | 0.266317 | 0.018294   | 14.56x  | -93.13%        |
| 2048x2048 | float32 | 4        | 0.50    | addition      | scalar | 0.253254 | 0.093490   | 2.71x   | -63.08%        |
| 2048x2048 | float32 | 4        | 0.50    | addition      | sse42  | 0.253254 | 0.018933   | 13.38x  | -92.52%        |
| 2048x2048 | float32 | 4        | 0.50    | addition      | avx2   | 0.253254 | 0.018850   | 13.43x  | -92.56%        |
| 2048x2048 | float32 | 4        | 0.50    | darken_only   | scalar | 0.243009 | 0.056450   | 4.30x   | -76.77%        |
| 2048x2048 | float32 | 4        | 0.50    | darken_only   | sse42  | 0.243009 | 0.017792   | 13.66x  | -92.68%        |
| 2048x2048 | float32 | 4        | 0.50    | darken_only   | avx2   | 0.243009 | 0.018091   | 13.43x  | -92.56%        |
| 2048x2048 | float32 | 4        | 0.50    | multiply      | scalar | 0.252708 | 0.047275   | 5.35x   | -81.29%        |
| 2048x2048 | float32 | 4        | 0.50    | multiply      | sse42  | 0.252708 | 0.018295   | 13.81x  | -92.76%        |
| 2048x2048 | float32 | 4        | 0.50    | multiply      | avx2   | 0.252708 | 0.017982   | 14.05x  | -92.88%        |
| 2048x2048 | float32 | 4        | 0.50    | hard_light    | scalar | 0.406807 | 0.127620   | 3.19x   | -68.63%        |
| 2048x2048 | float32 | 4        | 0.50    | hard_light    | sse42  | 0.406807 | 0.020327   | 20.01x  | -95.00%        |
| 2048x2048 | float32 | 4        | 0.50    | hard_light    | avx2   | 0.406807 | 0.018844   | 21.59x  | -95.37%        |
| 2048x2048 | float32 | 4        | 0.50    | difference    | scalar | 0.360045 | 0.048024   | 7.50x   | -86.66%        |
| 2048x2048 | float32 | 4        | 0.50    | difference    | sse42  | 0.360045 | 0.018832   | 19.12x  | -94.77%        |
| 2048x2048 | float32 | 4        | 0.50    | difference    | avx2   | 0.360045 | 0.018371   | 19.60x  | -94.90%        |
| 2048x2048 | float32 | 4        | 0.50    | subtract      | scalar | 0.253724 | 0.063271   | 4.01x   | -75.06%        |
| 2048x2048 | float32 | 4        | 0.50    | subtract      | sse42  | 0.253724 | 0.019295   | 13.15x  | -92.40%        |
| 2048x2048 | float32 | 4        | 0.50    | subtract      | avx2   | 0.253724 | 0.018646   | 13.61x  | -92.65%        |
| 2048x2048 | float32 | 4        | 0.50    | grain_extract | scalar | 0.261850 | 0.074524   | 3.51x   | -71.54%        |
| 2048x2048 | float32 | 4        | 0.50    | grain_extract | sse42  | 0.261850 | 0.018604   | 14.07x  | -92.90%        |
| 2048x2048 | float32 | 4        | 0.50    | grain_extract | avx2   | 0.261850 | 0.018078   | 14.48x  | -93.10%        |
| 2048x2048 | float32 | 4        | 0.50    | grain_merge   | scalar | 0.260619 | 0.074579   | 3.49x   | -71.38%        |
| 2048x2048 | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.260619 | 0.018503   | 14.09x  | -92.90%        |
| 2048x2048 | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.260619 | 0.018099   | 14.40x  | -93.06%        |
| 2048x2048 | float32 | 4        | 0.50    | divide        | scalar | 0.267110 | 0.051379   | 5.20x   | -80.76%        |
| 2048x2048 | float32 | 4        | 0.50    | divide        | sse42  | 0.267110 | 0.018851   | 14.17x  | -92.94%        |
| 2048x2048 | float32 | 4        | 0.50    | divide        | avx2   | 0.267110 | 0.018110   | 14.75x  | -93.22%        |
| 2048x2048 | float32 | 4        | 0.50    | overlay       | scalar | 0.371593 | 0.118348   | 3.14x   | -68.15%        |
| 2048x2048 | float32 | 4        | 0.50    | overlay       | sse42  | 0.371593 | 0.019552   | 19.01x  | -94.74%        |
| 2048x2048 | float32 | 4        | 0.50    | overlay       | avx2   | 0.371593 | 0.018667   | 19.91x  | -94.98%        |
| 1280x720  | uint8   | 3        | 0.50    | normal        | scalar | 0.080071 | 0.022203   | 3.61x   | -72.27%        |
| 1280x720  | uint8   | 3        | 0.50    | normal        | sse42  | 0.080071 | 0.009594   | 8.35x   | -88.02%        |
| 1280x720  | uint8   | 3        | 0.50    | normal        | avx2   | 0.080071 | 0.009693   | 8.26x   | -87.89%        |
| 1280x720  | uint8   | 3        | 0.50    | soft_light    | scalar | 0.107784 | 0.024711   | 4.36x   | -77.07%        |
| 1280x720  | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.107784 | 0.011889   | 9.07x   | -88.97%        |
| 1280x720  | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.107784 | 0.011171   | 9.65x   | -89.64%        |
| 1280x720  | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.084644 | 0.026467   | 3.20x   | -68.73%        |
| 1280x720  | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.084644 | 0.010815   | 7.83x   | -87.22%        |
| 1280x720  | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.084644 | 0.010427   | 8.12x   | -87.68%        |
| 1280x720  | uint8   | 3        | 0.50    | screen        | scalar | 0.087560 | 0.024071   | 3.64x   | -72.51%        |
| 1280x720  | uint8   | 3        | 0.50    | screen        | sse42  | 0.087560 | 0.011115   | 7.88x   | -87.31%        |
| 1280x720  | uint8   | 3        | 0.50    | screen        | avx2   | 0.087560 | 0.010817   | 8.09x   | -87.65%        |
| 1280x720  | uint8   | 3        | 0.50    | dodge         | scalar | 0.087554 | 0.025050   | 3.50x   | -71.39%        |
| 1280x720  | uint8   | 3        | 0.50    | dodge         | sse42  | 0.087554 | 0.012218   | 7.17x   | -86.05%        |
| 1280x720  | uint8   | 3        | 0.50    | dodge         | avx2   | 0.087554 | 0.011217   | 7.81x   | -87.19%        |
| 1280x720  | uint8   | 3        | 0.50    | addition      | scalar | 0.082146 | 0.035215   | 2.33x   | -57.13%        |
| 1280x720  | uint8   | 3        | 0.50    | addition      | sse42  | 0.082146 | 0.011030   | 7.45x   | -86.57%        |
| 1280x720  | uint8   | 3        | 0.50    | addition      | avx2   | 0.082146 | 0.010510   | 7.82x   | -87.21%        |
| 1280x720  | uint8   | 3        | 0.50    | darken_only   | scalar | 0.085138 | 0.026807   | 3.18x   | -68.51%        |
| 1280x720  | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.085138 | 0.010804   | 7.88x   | -87.31%        |
| 1280x720  | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.085138 | 0.010434   | 8.16x   | -87.74%        |
| 1280x720  | uint8   | 3        | 0.50    | multiply      | scalar | 0.085644 | 0.023979   | 3.57x   | -72.00%        |
| 1280x720  | uint8   | 3        | 0.50    | multiply      | sse42  | 0.085644 | 0.010806   | 7.93x   | -87.38%        |
| 1280x720  | uint8   | 3        | 0.50    | multiply      | avx2   | 0.085644 | 0.010675   | 8.02x   | -87.54%        |
| 1280x720  | uint8   | 3        | 0.50    | hard_light    | scalar | 0.114641 | 0.041178   | 2.78x   | -64.08%        |
| 1280x720  | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.114641 | 0.012265   | 9.35x   | -89.30%        |
| 1280x720  | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.114641 | 0.011194   | 10.24x  | -90.24%        |
| 1280x720  | uint8   | 3        | 0.50    | difference    | scalar | 0.112069 | 0.024398   | 4.59x   | -78.23%        |
| 1280x720  | uint8   | 3        | 0.50    | difference    | sse42  | 0.112069 | 0.010812   | 10.36x  | -90.35%        |
| 1280x720  | uint8   | 3        | 0.50    | difference    | avx2   | 0.112069 | 0.010580   | 10.59x  | -90.56%        |
| 1280x720  | uint8   | 3        | 0.50    | subtract      | scalar | 0.081920 | 0.022187   | 3.69x   | -72.92%        |
| 1280x720  | uint8   | 3        | 0.50    | subtract      | sse42  | 0.081920 | 0.011849   | 6.91x   | -85.54%        |
| 1280x720  | uint8   | 3        | 0.50    | subtract      | avx2   | 0.081920 | 0.011008   | 7.44x   | -86.56%        |
| 1280x720  | uint8   | 3        | 0.50    | grain_extract | scalar | 0.086326 | 0.029630   | 2.91x   | -65.68%        |
| 1280x720  | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.086326 | 0.011737   | 7.35x   | -86.40%        |
| 1280x720  | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.086326 | 0.010989   | 7.86x   | -87.27%        |
| 1280x720  | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.087316 | 0.029334   | 2.98x   | -66.40%        |
| 1280x720  | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.087316 | 0.011820   | 7.39x   | -86.46%        |
| 1280x720  | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.087316 | 0.010981   | 7.95x   | -87.42%        |
| 1280x720  | uint8   | 3        | 0.50    | divide        | scalar | 0.088661 | 0.024478   | 3.62x   | -72.39%        |
| 1280x720  | uint8   | 3        | 0.50    | divide        | sse42  | 0.088661 | 0.011979   | 7.40x   | -86.49%        |
| 1280x720  | uint8   | 3        | 0.50    | divide        | avx2   | 0.088661 | 0.011273   | 7.86x   | -87.28%        |
| 1280x720  | uint8   | 3        | 0.50    | overlay       | scalar | 0.108731 | 0.040331   | 2.70x   | -62.91%        |
| 1280x720  | uint8   | 3        | 0.50    | overlay       | sse42  | 0.108731 | 0.012291   | 8.85x   | -88.70%        |
| 1280x720  | uint8   | 3        | 0.50    | overlay       | avx2   | 0.108731 | 0.011166   | 9.74x   | -89.73%        |
| 1280x720  | uint8   | 4        | 0.50    | normal        | scalar | 0.060620 | 0.018112   | 3.35x   | -70.12%        |
| 1280x720  | uint8   | 4        | 0.50    | normal        | sse42  | 0.060620 | 0.002405   | 25.20x  | -96.03%        |
| 1280x720  | uint8   | 4        | 0.50    | normal        | avx2   | 0.060620 | 0.002196   | 27.60x  | -96.38%        |
| 1280x720  | uint8   | 4        | 0.50    | soft_light    | scalar | 0.094735 | 0.022692   | 4.17x   | -76.05%        |
| 1280x720  | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.094735 | 0.003046   | 31.10x  | -96.78%        |
| 1280x720  | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.094735 | 0.002724   | 34.78x  | -97.12%        |
| 1280x720  | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.071775 | 0.024017   | 2.99x   | -66.54%        |
| 1280x720  | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.071775 | 0.002705   | 26.53x  | -96.23%        |
| 1280x720  | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.071775 | 0.002658   | 27.00x  | -96.30%        |
| 1280x720  | uint8   | 4        | 0.50    | screen        | scalar | 0.074851 | 0.021669   | 3.45x   | -71.05%        |
| 1280x720  | uint8   | 4        | 0.50    | screen        | sse42  | 0.074851 | 0.002902   | 25.79x  | -96.12%        |
| 1280x720  | uint8   | 4        | 0.50    | screen        | avx2   | 0.074851 | 0.002718   | 27.54x  | -96.37%        |
| 1280x720  | uint8   | 4        | 0.50    | dodge         | scalar | 0.074690 | 0.022718   | 3.29x   | -69.58%        |
| 1280x720  | uint8   | 4        | 0.50    | dodge         | sse42  | 0.074690 | 0.003259   | 22.92x  | -95.64%        |
| 1280x720  | uint8   | 4        | 0.50    | dodge         | avx2   | 0.074690 | 0.002682   | 27.85x  | -96.41%        |
| 1280x720  | uint8   | 4        | 0.50    | addition      | scalar | 0.070497 | 0.027039   | 2.61x   | -61.64%        |
| 1280x720  | uint8   | 4        | 0.50    | addition      | sse42  | 0.070497 | 0.003533   | 19.95x  | -94.99%        |
| 1280x720  | uint8   | 4        | 0.50    | addition      | avx2   | 0.070497 | 0.002804   | 25.15x  | -96.02%        |
| 1280x720  | uint8   | 4        | 0.50    | darken_only   | scalar | 0.071545 | 0.024190   | 2.96x   | -66.19%        |
| 1280x720  | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.071545 | 0.002647   | 27.03x  | -96.30%        |
| 1280x720  | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.071545 | 0.002648   | 27.02x  | -96.30%        |
| 1280x720  | uint8   | 4        | 0.50    | multiply      | scalar | 0.072742 | 0.021803   | 3.34x   | -70.03%        |
| 1280x720  | uint8   | 4        | 0.50    | multiply      | sse42  | 0.072742 | 0.002736   | 26.58x  | -96.24%        |
| 1280x720  | uint8   | 4        | 0.50    | multiply      | avx2   | 0.072742 | 0.002619   | 27.77x  | -96.40%        |
| 1280x720  | uint8   | 4        | 0.50    | hard_light    | scalar | 0.102061 | 0.036052   | 2.83x   | -64.68%        |
| 1280x720  | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.102061 | 0.003300   | 30.93x  | -96.77%        |
| 1280x720  | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.102061 | 0.002735   | 37.32x  | -97.32%        |
| 1280x720  | uint8   | 4        | 0.50    | difference    | scalar | 0.099504 | 0.021466   | 4.64x   | -78.43%        |
| 1280x720  | uint8   | 4        | 0.50    | difference    | sse42  | 0.099504 | 0.002698   | 36.88x  | -97.29%        |
| 1280x720  | uint8   | 4        | 0.50    | difference    | avx2   | 0.099504 | 0.002642   | 37.66x  | -97.34%        |
| 1280x720  | uint8   | 4        | 0.50    | subtract      | scalar | 0.070561 | 0.020939   | 3.37x   | -70.32%        |
| 1280x720  | uint8   | 4        | 0.50    | subtract      | sse42  | 0.070561 | 0.003773   | 18.70x  | -94.65%        |
| 1280x720  | uint8   | 4        | 0.50    | subtract      | avx2   | 0.070561 | 0.002766   | 25.51x  | -96.08%        |
| 1280x720  | uint8   | 4        | 0.50    | grain_extract | scalar | 0.077527 | 0.026171   | 2.96x   | -66.24%        |
| 1280x720  | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.077527 | 0.002997   | 25.87x  | -96.13%        |
| 1280x720  | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.077527 | 0.002672   | 29.02x  | -96.55%        |
| 1280x720  | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.073560 | 0.026189   | 2.81x   | -64.40%        |
| 1280x720  | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.073560 | 0.002929   | 25.11x  | -96.02%        |
| 1280x720  | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.073560 | 0.002697   | 27.27x  | -96.33%        |
| 1280x720  | uint8   | 4        | 0.50    | divide        | scalar | 0.075317 | 0.022110   | 3.41x   | -70.64%        |
| 1280x720  | uint8   | 4        | 0.50    | divide        | sse42  | 0.075317 | 0.003000   | 25.10x  | -96.02%        |
| 1280x720  | uint8   | 4        | 0.50    | divide        | avx2   | 0.075317 | 0.002643   | 28.50x  | -96.49%        |
| 1280x720  | uint8   | 4        | 0.50    | overlay       | scalar | 0.096431 | 0.034995   | 2.76x   | -63.71%        |
| 1280x720  | uint8   | 4        | 0.50    | overlay       | sse42  | 0.096431 | 0.003136   | 30.75x  | -96.75%        |
| 1280x720  | uint8   | 4        | 0.50    | overlay       | avx2   | 0.096431 | 0.002704   | 35.66x  | -97.20%        |
| 1280x720  | float32 | 3        | 0.50    | normal        | scalar | 0.069425 | 0.006986   | 9.94x   | -89.94%        |
| 1280x720  | float32 | 3        | 0.50    | normal        | sse42  | 0.069425 | 0.003117   | 22.28x  | -95.51%        |
| 1280x720  | float32 | 3        | 0.50    | normal        | avx2   | 0.069425 | 0.002109   | 32.92x  | -96.96%        |
| 1280x720  | float32 | 3        | 0.50    | soft_light    | scalar | 0.099034 | 0.008611   | 11.50x  | -91.30%        |
| 1280x720  | float32 | 3        | 0.50    | soft_light    | sse42  | 0.099034 | 0.003609   | 27.44x  | -96.36%        |
| 1280x720  | float32 | 3        | 0.50    | soft_light    | avx2   | 0.099034 | 0.002588   | 38.27x  | -97.39%        |
| 1280x720  | float32 | 3        | 0.50    | lighten_only  | scalar | 0.076219 | 0.010349   | 7.37x   | -86.42%        |
| 1280x720  | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.076219 | 0.002992   | 25.48x  | -96.07%        |
| 1280x720  | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.076219 | 0.002483   | 30.69x  | -96.74%        |
| 1280x720  | float32 | 3        | 0.50    | screen        | scalar | 0.079757 | 0.007915   | 10.08x  | -90.08%        |
| 1280x720  | float32 | 3        | 0.50    | screen        | sse42  | 0.079757 | 0.003314   | 24.07x  | -95.85%        |
| 1280x720  | float32 | 3        | 0.50    | screen        | avx2   | 0.079757 | 0.002587   | 30.84x  | -96.76%        |
| 1280x720  | float32 | 3        | 0.50    | dodge         | scalar | 0.079434 | 0.008935   | 8.89x   | -88.75%        |
| 1280x720  | float32 | 3        | 0.50    | dodge         | sse42  | 0.079434 | 0.003829   | 20.74x  | -95.18%        |
| 1280x720  | float32 | 3        | 0.50    | dodge         | avx2   | 0.079434 | 0.002695   | 29.47x  | -96.61%        |
| 1280x720  | float32 | 3        | 0.50    | addition      | scalar | 0.075138 | 0.021973   | 3.42x   | -70.76%        |
| 1280x720  | float32 | 3        | 0.50    | addition      | sse42  | 0.075138 | 0.003200   | 23.48x  | -95.74%        |
| 1280x720  | float32 | 3        | 0.50    | addition      | avx2   | 0.075138 | 0.002603   | 28.87x  | -96.54%        |
| 1280x720  | float32 | 3        | 0.50    | darken_only   | scalar | 0.075713 | 0.010376   | 7.30x   | -86.30%        |
| 1280x720  | float32 | 3        | 0.50    | darken_only   | sse42  | 0.075713 | 0.002974   | 25.46x  | -96.07%        |
| 1280x720  | float32 | 3        | 0.50    | darken_only   | avx2   | 0.075713 | 0.002512   | 30.14x  | -96.68%        |
| 1280x720  | float32 | 3        | 0.50    | multiply      | scalar | 0.077318 | 0.007687   | 10.06x  | -90.06%        |
| 1280x720  | float32 | 3        | 0.50    | multiply      | sse42  | 0.077318 | 0.003087   | 25.05x  | -96.01%        |
| 1280x720  | float32 | 3        | 0.50    | multiply      | avx2   | 0.077318 | 0.002474   | 31.26x  | -96.80%        |
| 1280x720  | float32 | 3        | 0.50    | hard_light    | scalar | 0.106360 | 0.025090   | 4.24x   | -76.41%        |
| 1280x720  | float32 | 3        | 0.50    | hard_light    | sse42  | 0.106360 | 0.003859   | 27.56x  | -96.37%        |
| 1280x720  | float32 | 3        | 0.50    | hard_light    | avx2   | 0.106360 | 0.002638   | 40.31x  | -97.52%        |
| 1280x720  | float32 | 3        | 0.50    | difference    | scalar | 0.107376 | 0.007636   | 14.06x  | -92.89%        |
| 1280x720  | float32 | 3        | 0.50    | difference    | sse42  | 0.107376 | 0.003177   | 33.80x  | -97.04%        |
| 1280x720  | float32 | 3        | 0.50    | difference    | avx2   | 0.107376 | 0.002629   | 40.84x  | -97.55%        |
| 1280x720  | float32 | 3        | 0.50    | subtract      | scalar | 0.075215 | 0.009711   | 7.75x   | -87.09%        |
| 1280x720  | float32 | 3        | 0.50    | subtract      | sse42  | 0.075215 | 0.003354   | 22.43x  | -95.54%        |
| 1280x720  | float32 | 3        | 0.50    | subtract      | avx2   | 0.075215 | 0.002634   | 28.56x  | -96.50%        |
| 1280x720  | float32 | 3        | 0.50    | grain_extract | scalar | 0.078281 | 0.014250   | 5.49x   | -81.80%        |
| 1280x720  | float32 | 3        | 0.50    | grain_extract | sse42  | 0.078281 | 0.003380   | 23.16x  | -95.68%        |
| 1280x720  | float32 | 3        | 0.50    | grain_extract | avx2   | 0.078281 | 0.002587   | 30.26x  | -96.70%        |
| 1280x720  | float32 | 3        | 0.50    | grain_merge   | scalar | 0.079160 | 0.014203   | 5.57x   | -82.06%        |
| 1280x720  | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.079160 | 0.003341   | 23.69x  | -95.78%        |
| 1280x720  | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.079160 | 0.002648   | 29.89x  | -96.65%        |
| 1280x720  | float32 | 3        | 0.50    | divide        | scalar | 0.080059 | 0.008487   | 9.43x   | -89.40%        |
| 1280x720  | float32 | 3        | 0.50    | divide        | sse42  | 0.080059 | 0.003652   | 21.92x  | -95.44%        |
| 1280x720  | float32 | 3        | 0.50    | divide        | avx2   | 0.080059 | 0.002572   | 31.12x  | -96.79%        |
| 1280x720  | float32 | 3        | 0.50    | overlay       | scalar | 0.102808 | 0.023120   | 4.45x   | -77.51%        |
| 1280x720  | float32 | 3        | 0.50    | overlay       | sse42  | 0.102808 | 0.003546   | 28.99x  | -96.55%        |
| 1280x720  | float32 | 3        | 0.50    | overlay       | avx2   | 0.102808 | 0.002545   | 40.39x  | -97.52%        |
| 1280x720  | float32 | 4        | 0.50    | normal        | scalar | 0.053552 | 0.008554   | 6.26x   | -84.03%        |
| 1280x720  | float32 | 4        | 0.50    | normal        | sse42  | 0.053552 | 0.002480   | 21.60x  | -95.37%        |
| 1280x720  | float32 | 4        | 0.50    | normal        | avx2   | 0.053552 | 0.002320   | 23.08x  | -95.67%        |
| 1280x720  | float32 | 4        | 0.50    | soft_light    | scalar | 0.086830 | 0.010181   | 8.53x   | -88.27%        |
| 1280x720  | float32 | 4        | 0.50    | soft_light    | sse42  | 0.086830 | 0.002731   | 31.79x  | -96.85%        |
| 1280x720  | float32 | 4        | 0.50    | soft_light    | avx2   | 0.086830 | 0.002604   | 33.35x  | -97.00%        |
| 1280x720  | float32 | 4        | 0.50    | lighten_only  | scalar | 0.063381 | 0.010671   | 5.94x   | -83.16%        |
| 1280x720  | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.063381 | 0.002535   | 25.01x  | -96.00%        |
| 1280x720  | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.063381 | 0.002641   | 24.00x  | -95.83%        |
| 1280x720  | float32 | 4        | 0.50    | screen        | scalar | 0.065418 | 0.009376   | 6.98x   | -85.67%        |
| 1280x720  | float32 | 4        | 0.50    | screen        | sse42  | 0.065418 | 0.002774   | 23.58x  | -95.76%        |
| 1280x720  | float32 | 4        | 0.50    | screen        | avx2   | 0.065418 | 0.002590   | 25.26x  | -96.04%        |
| 1280x720  | float32 | 4        | 0.50    | dodge         | scalar | 0.066256 | 0.010685   | 6.20x   | -83.87%        |
| 1280x720  | float32 | 4        | 0.50    | dodge         | sse42  | 0.066256 | 0.003119   | 21.24x  | -95.29%        |
| 1280x720  | float32 | 4        | 0.50    | dodge         | avx2   | 0.066256 | 0.002652   | 24.98x  | -96.00%        |
| 1280x720  | float32 | 4        | 0.50    | addition      | scalar | 0.062488 | 0.018887   | 3.31x   | -69.78%        |
| 1280x720  | float32 | 4        | 0.50    | addition      | sse42  | 0.062488 | 0.002760   | 22.64x  | -95.58%        |
| 1280x720  | float32 | 4        | 0.50    | addition      | avx2   | 0.062488 | 0.002752   | 22.71x  | -95.60%        |
| 1280x720  | float32 | 4        | 0.50    | darken_only   | scalar | 0.063886 | 0.010755   | 5.94x   | -83.16%        |
| 1280x720  | float32 | 4        | 0.50    | darken_only   | sse42  | 0.063886 | 0.002540   | 25.15x  | -96.02%        |
| 1280x720  | float32 | 4        | 0.50    | darken_only   | avx2   | 0.063886 | 0.002654   | 24.07x  | -95.85%        |
| 1280x720  | float32 | 4        | 0.50    | multiply      | scalar | 0.064628 | 0.008938   | 7.23x   | -86.17%        |
| 1280x720  | float32 | 4        | 0.50    | multiply      | sse42  | 0.064628 | 0.002480   | 26.06x  | -96.16%        |
| 1280x720  | float32 | 4        | 0.50    | multiply      | avx2   | 0.064628 | 0.002697   | 23.96x  | -95.83%        |
| 1280x720  | float32 | 4        | 0.50    | hard_light    | scalar | 0.094296 | 0.025976   | 3.63x   | -72.45%        |
| 1280x720  | float32 | 4        | 0.50    | hard_light    | sse42  | 0.094296 | 0.003240   | 29.10x  | -96.56%        |
| 1280x720  | float32 | 4        | 0.50    | hard_light    | avx2   | 0.094296 | 0.002662   | 35.42x  | -97.18%        |
| 1280x720  | float32 | 4        | 0.50    | difference    | scalar | 0.089460 | 0.009272   | 9.65x   | -89.64%        |
| 1280x720  | float32 | 4        | 0.50    | difference    | sse42  | 0.089460 | 0.002693   | 33.22x  | -96.99%        |
| 1280x720  | float32 | 4        | 0.50    | difference    | avx2   | 0.089460 | 0.002657   | 33.67x  | -97.03%        |
| 1280x720  | float32 | 4        | 0.50    | subtract      | scalar | 0.062571 | 0.012158   | 5.15x   | -80.57%        |
| 1280x720  | float32 | 4        | 0.50    | subtract      | sse42  | 0.062571 | 0.002819   | 22.19x  | -95.49%        |
| 1280x720  | float32 | 4        | 0.50    | subtract      | avx2   | 0.062571 | 0.002690   | 23.26x  | -95.70%        |
| 1280x720  | float32 | 4        | 0.50    | grain_extract | scalar | 0.065961 | 0.014832   | 4.45x   | -77.51%        |
| 1280x720  | float32 | 4        | 0.50    | grain_extract | sse42  | 0.065961 | 0.002743   | 24.05x  | -95.84%        |
| 1280x720  | float32 | 4        | 0.50    | grain_extract | avx2   | 0.065961 | 0.002657   | 24.82x  | -95.97%        |
| 1280x720  | float32 | 4        | 0.50    | grain_merge   | scalar | 0.065400 | 0.014782   | 4.42x   | -77.40%        |
| 1280x720  | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.065400 | 0.002769   | 23.62x  | -95.77%        |
| 1280x720  | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.065400 | 0.002657   | 24.61x  | -95.94%        |
| 1280x720  | float32 | 4        | 0.50    | divide        | scalar | 0.067018 | 0.010000   | 6.70x   | -85.08%        |
| 1280x720  | float32 | 4        | 0.50    | divide        | sse42  | 0.067018 | 0.002592   | 25.85x  | -96.13%        |
| 1280x720  | float32 | 4        | 0.50    | divide        | avx2   | 0.067018 | 0.002624   | 25.54x  | -96.09%        |
| 1280x720  | float32 | 4        | 0.50    | overlay       | scalar | 0.088559 | 0.024248   | 3.65x   | -72.62%        |
| 1280x720  | float32 | 4        | 0.50    | overlay       | sse42  | 0.088559 | 0.002877   | 30.78x  | -96.75%        |
| 1280x720  | float32 | 4        | 0.50    | overlay       | avx2   | 0.088559 | 0.002663   | 33.26x  | -96.99%        |
| 1920x1080 | uint8   | 3        | 0.50    | normal        | scalar | 0.178329 | 0.051089   | 3.49x   | -71.35%        |
| 1920x1080 | uint8   | 3        | 0.50    | normal        | sse42  | 0.178329 | 0.021518   | 8.29x   | -87.93%        |
| 1920x1080 | uint8   | 3        | 0.50    | normal        | avx2   | 0.178329 | 0.021954   | 8.12x   | -87.69%        |
| 1920x1080 | uint8   | 3        | 0.50    | soft_light    | scalar | 0.231888 | 0.055632   | 4.17x   | -76.01%        |
| 1920x1080 | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.231888 | 0.026752   | 8.67x   | -88.46%        |
| 1920x1080 | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.231888 | 0.025119   | 9.23x   | -89.17%        |
| 1920x1080 | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.183124 | 0.059979   | 3.05x   | -67.25%        |
| 1920x1080 | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.183124 | 0.024337   | 7.52x   | -86.71%        |
| 1920x1080 | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.183124 | 0.023727   | 7.72x   | -87.04%        |
| 1920x1080 | uint8   | 3        | 0.50    | screen        | scalar | 0.188155 | 0.054202   | 3.47x   | -71.19%        |
| 1920x1080 | uint8   | 3        | 0.50    | screen        | sse42  | 0.188155 | 0.025134   | 7.49x   | -86.64%        |
| 1920x1080 | uint8   | 3        | 0.50    | screen        | avx2   | 0.188155 | 0.024074   | 7.82x   | -87.21%        |
| 1920x1080 | uint8   | 3        | 0.50    | dodge         | scalar | 0.187504 | 0.055930   | 3.35x   | -70.17%        |
| 1920x1080 | uint8   | 3        | 0.50    | dodge         | sse42  | 0.187504 | 0.027428   | 6.84x   | -85.37%        |
| 1920x1080 | uint8   | 3        | 0.50    | dodge         | avx2   | 0.187504 | 0.025177   | 7.45x   | -86.57%        |
| 1920x1080 | uint8   | 3        | 0.50    | addition      | scalar | 0.183592 | 0.078733   | 2.33x   | -57.11%        |
| 1920x1080 | uint8   | 3        | 0.50    | addition      | sse42  | 0.183592 | 0.024795   | 7.40x   | -86.49%        |
| 1920x1080 | uint8   | 3        | 0.50    | addition      | avx2   | 0.183592 | 0.023970   | 7.66x   | -86.94%        |
| 1920x1080 | uint8   | 3        | 0.50    | darken_only   | scalar | 0.182550 | 0.060096   | 3.04x   | -67.08%        |
| 1920x1080 | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.182550 | 0.024335   | 7.50x   | -86.67%        |
| 1920x1080 | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.182550 | 0.023663   | 7.71x   | -87.04%        |
| 1920x1080 | uint8   | 3        | 0.50    | multiply      | scalar | 0.184255 | 0.054078   | 3.41x   | -70.65%        |
| 1920x1080 | uint8   | 3        | 0.50    | multiply      | sse42  | 0.184255 | 0.024528   | 7.51x   | -86.69%        |
| 1920x1080 | uint8   | 3        | 0.50    | multiply      | avx2   | 0.184255 | 0.023789   | 7.75x   | -87.09%        |
| 1920x1080 | uint8   | 3        | 0.50    | hard_light    | scalar | 0.248438 | 0.092767   | 2.68x   | -62.66%        |
| 1920x1080 | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.248438 | 0.027670   | 8.98x   | -88.86%        |
| 1920x1080 | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.248438 | 0.025266   | 9.83x   | -89.83%        |
| 1920x1080 | uint8   | 3        | 0.50    | difference    | scalar | 0.239445 | 0.054576   | 4.39x   | -77.21%        |
| 1920x1080 | uint8   | 3        | 0.50    | difference    | sse42  | 0.239445 | 0.024341   | 9.84x   | -89.83%        |
| 1920x1080 | uint8   | 3        | 0.50    | difference    | avx2   | 0.239445 | 0.023582   | 10.15x  | -90.15%        |
| 1920x1080 | uint8   | 3        | 0.50    | subtract      | scalar | 0.183817 | 0.049855   | 3.69x   | -72.88%        |
| 1920x1080 | uint8   | 3        | 0.50    | subtract      | sse42  | 0.183817 | 0.026967   | 6.82x   | -85.33%        |
| 1920x1080 | uint8   | 3        | 0.50    | subtract      | avx2   | 0.183817 | 0.024730   | 7.43x   | -86.55%        |
| 1920x1080 | uint8   | 3        | 0.50    | grain_extract | scalar | 0.186623 | 0.065803   | 2.84x   | -64.74%        |
| 1920x1080 | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.186623 | 0.026441   | 7.06x   | -85.83%        |
| 1920x1080 | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.186623 | 0.024719   | 7.55x   | -86.75%        |
| 1920x1080 | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.186932 | 0.065918   | 2.84x   | -64.74%        |
| 1920x1080 | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.186932 | 0.026523   | 7.05x   | -85.81%        |
| 1920x1080 | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.186932 | 0.024638   | 7.59x   | -86.82%        |
| 1920x1080 | uint8   | 3        | 0.50    | divide        | scalar | 0.189024 | 0.054808   | 3.45x   | -71.00%        |
| 1920x1080 | uint8   | 3        | 0.50    | divide        | sse42  | 0.189024 | 0.026883   | 7.03x   | -85.78%        |
| 1920x1080 | uint8   | 3        | 0.50    | divide        | avx2   | 0.189024 | 0.024999   | 7.56x   | -86.77%        |
| 1920x1080 | uint8   | 3        | 0.50    | overlay       | scalar | 0.236708 | 0.090381   | 2.62x   | -61.82%        |
| 1920x1080 | uint8   | 3        | 0.50    | overlay       | sse42  | 0.236708 | 0.027276   | 8.68x   | -88.48%        |
| 1920x1080 | uint8   | 3        | 0.50    | overlay       | avx2   | 0.236708 | 0.024988   | 9.47x   | -89.44%        |
| 1920x1080 | uint8   | 4        | 0.50    | normal        | scalar | 0.128106 | 0.040784   | 3.14x   | -68.16%        |
| 1920x1080 | uint8   | 4        | 0.50    | normal        | sse42  | 0.128106 | 0.005405   | 23.70x  | -95.78%        |
| 1920x1080 | uint8   | 4        | 0.50    | normal        | avx2   | 0.128106 | 0.004929   | 25.99x  | -96.15%        |
| 1920x1080 | uint8   | 4        | 0.50    | soft_light    | scalar | 0.186097 | 0.051095   | 3.64x   | -72.54%        |
| 1920x1080 | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.186097 | 0.006841   | 27.20x  | -96.32%        |
| 1920x1080 | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.186097 | 0.006095   | 30.53x  | -96.72%        |
| 1920x1080 | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.135359 | 0.054362   | 2.49x   | -59.84%        |
| 1920x1080 | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.135359 | 0.005917   | 22.88x  | -95.63%        |
| 1920x1080 | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.135359 | 0.005913   | 22.89x  | -95.63%        |
| 1920x1080 | uint8   | 4        | 0.50    | screen        | scalar | 0.143102 | 0.048798   | 2.93x   | -65.90%        |
| 1920x1080 | uint8   | 4        | 0.50    | screen        | sse42  | 0.143102 | 0.006545   | 21.87x  | -95.43%        |
| 1920x1080 | uint8   | 4        | 0.50    | screen        | avx2   | 0.143102 | 0.006082   | 23.53x  | -95.75%        |
| 1920x1080 | uint8   | 4        | 0.50    | dodge         | scalar | 0.141218 | 0.050756   | 2.78x   | -64.06%        |
| 1920x1080 | uint8   | 4        | 0.50    | dodge         | sse42  | 0.141218 | 0.007383   | 19.13x  | -94.77%        |
| 1920x1080 | uint8   | 4        | 0.50    | dodge         | avx2   | 0.141218 | 0.006040   | 23.38x  | -95.72%        |
| 1920x1080 | uint8   | 4        | 0.50    | addition      | scalar | 0.137256 | 0.062144   | 2.21x   | -54.72%        |
| 1920x1080 | uint8   | 4        | 0.50    | addition      | sse42  | 0.137256 | 0.007937   | 17.29x  | -94.22%        |
| 1920x1080 | uint8   | 4        | 0.50    | addition      | avx2   | 0.137256 | 0.006272   | 21.89x  | -95.43%        |
| 1920x1080 | uint8   | 4        | 0.50    | darken_only   | scalar | 0.137162 | 0.054749   | 2.51x   | -60.08%        |
| 1920x1080 | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.137162 | 0.005932   | 23.12x  | -95.68%        |
| 1920x1080 | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.137162 | 0.005977   | 22.95x  | -95.64%        |
| 1920x1080 | uint8   | 4        | 0.50    | multiply      | scalar | 0.137705 | 0.049048   | 2.81x   | -64.38%        |
| 1920x1080 | uint8   | 4        | 0.50    | multiply      | sse42  | 0.137705 | 0.006189   | 22.25x  | -95.51%        |
| 1920x1080 | uint8   | 4        | 0.50    | multiply      | avx2   | 0.137705 | 0.005921   | 23.26x  | -95.70%        |
| 1920x1080 | uint8   | 4        | 0.50    | hard_light    | scalar | 0.201249 | 0.081502   | 2.47x   | -59.50%        |
| 1920x1080 | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.201249 | 0.007465   | 26.96x  | -96.29%        |
| 1920x1080 | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.201249 | 0.006134   | 32.81x  | -96.95%        |
| 1920x1080 | uint8   | 4        | 0.50    | difference    | scalar | 0.193495 | 0.048440   | 3.99x   | -74.97%        |
| 1920x1080 | uint8   | 4        | 0.50    | difference    | sse42  | 0.193495 | 0.006059   | 31.93x  | -96.87%        |
| 1920x1080 | uint8   | 4        | 0.50    | difference    | avx2   | 0.193495 | 0.005944   | 32.55x  | -96.93%        |
| 1920x1080 | uint8   | 4        | 0.50    | subtract      | scalar | 0.137287 | 0.046880   | 2.93x   | -65.85%        |
| 1920x1080 | uint8   | 4        | 0.50    | subtract      | sse42  | 0.137287 | 0.008283   | 16.58x  | -93.97%        |
| 1920x1080 | uint8   | 4        | 0.50    | subtract      | avx2   | 0.137287 | 0.006258   | 21.94x  | -95.44%        |
| 1920x1080 | uint8   | 4        | 0.50    | grain_extract | scalar | 0.140270 | 0.059258   | 2.37x   | -57.75%        |
| 1920x1080 | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.140270 | 0.006624   | 21.18x  | -95.28%        |
| 1920x1080 | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.140270 | 0.006020   | 23.30x  | -95.71%        |
| 1920x1080 | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.140673 | 0.059376   | 2.37x   | -57.79%        |
| 1920x1080 | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.140673 | 0.006577   | 21.39x  | -95.32%        |
| 1920x1080 | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.140673 | 0.006018   | 23.37x  | -95.72%        |
| 1920x1080 | uint8   | 4        | 0.50    | divide        | scalar | 0.142035 | 0.049674   | 2.86x   | -65.03%        |
| 1920x1080 | uint8   | 4        | 0.50    | divide        | sse42  | 0.142035 | 0.006778   | 20.96x  | -95.23%        |
| 1920x1080 | uint8   | 4        | 0.50    | divide        | avx2   | 0.142035 | 0.005916   | 24.01x  | -95.83%        |
| 1920x1080 | uint8   | 4        | 0.50    | overlay       | scalar | 0.190905 | 0.078591   | 2.43x   | -58.83%        |
| 1920x1080 | uint8   | 4        | 0.50    | overlay       | sse42  | 0.190905 | 0.007046   | 27.09x  | -96.31%        |
| 1920x1080 | uint8   | 4        | 0.50    | overlay       | avx2   | 0.190905 | 0.006110   | 31.24x  | -96.80%        |
| 1920x1080 | float32 | 3        | 0.50    | normal        | scalar | 0.152390 | 0.016562   | 9.20x   | -89.13%        |
| 1920x1080 | float32 | 3        | 0.50    | normal        | sse42  | 0.152390 | 0.007055   | 21.60x  | -95.37%        |
| 1920x1080 | float32 | 3        | 0.50    | normal        | avx2   | 0.152390 | 0.004865   | 31.32x  | -96.81%        |
| 1920x1080 | float32 | 3        | 0.50    | soft_light    | scalar | 0.211907 | 0.020213   | 10.48x  | -90.46%        |
| 1920x1080 | float32 | 3        | 0.50    | soft_light    | sse42  | 0.211907 | 0.008103   | 26.15x  | -96.18%        |
| 1920x1080 | float32 | 3        | 0.50    | soft_light    | avx2   | 0.211907 | 0.005889   | 35.98x  | -97.22%        |
| 1920x1080 | float32 | 3        | 0.50    | lighten_only  | scalar | 0.161018 | 0.023887   | 6.74x   | -85.17%        |
| 1920x1080 | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.161018 | 0.006812   | 23.64x  | -95.77%        |
| 1920x1080 | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.161018 | 0.005651   | 28.50x  | -96.49%        |
| 1920x1080 | float32 | 3        | 0.50    | screen        | scalar | 0.167185 | 0.018361   | 9.11x   | -89.02%        |
| 1920x1080 | float32 | 3        | 0.50    | screen        | sse42  | 0.167185 | 0.007532   | 22.20x  | -95.49%        |
| 1920x1080 | float32 | 3        | 0.50    | screen        | avx2   | 0.167185 | 0.005948   | 28.11x  | -96.44%        |
| 1920x1080 | float32 | 3        | 0.50    | dodge         | scalar | 0.170216 | 0.020697   | 8.22x   | -87.84%        |
| 1920x1080 | float32 | 3        | 0.50    | dodge         | sse42  | 0.170216 | 0.008523   | 19.97x  | -94.99%        |
| 1920x1080 | float32 | 3        | 0.50    | dodge         | avx2   | 0.170216 | 0.005903   | 28.83x  | -96.53%        |
| 1920x1080 | float32 | 3        | 0.50    | addition      | scalar | 0.159729 | 0.050342   | 3.17x   | -68.48%        |
| 1920x1080 | float32 | 3        | 0.50    | addition      | sse42  | 0.159729 | 0.007194   | 22.20x  | -95.50%        |
| 1920x1080 | float32 | 3        | 0.50    | addition      | avx2   | 0.159729 | 0.006030   | 26.49x  | -96.22%        |
| 1920x1080 | float32 | 3        | 0.50    | darken_only   | scalar | 0.161209 | 0.023882   | 6.75x   | -85.19%        |
| 1920x1080 | float32 | 3        | 0.50    | darken_only   | sse42  | 0.161209 | 0.006803   | 23.70x  | -95.78%        |
| 1920x1080 | float32 | 3        | 0.50    | darken_only   | avx2   | 0.161209 | 0.005755   | 28.01x  | -96.43%        |
| 1920x1080 | float32 | 3        | 0.50    | multiply      | scalar | 0.162961 | 0.017920   | 9.09x   | -89.00%        |
| 1920x1080 | float32 | 3        | 0.50    | multiply      | sse42  | 0.162961 | 0.006941   | 23.48x  | -95.74%        |
| 1920x1080 | float32 | 3        | 0.50    | multiply      | avx2   | 0.162961 | 0.005721   | 28.48x  | -96.49%        |
| 1920x1080 | float32 | 3        | 0.50    | hard_light    | scalar | 0.227699 | 0.057550   | 3.96x   | -74.73%        |
| 1920x1080 | float32 | 3        | 0.50    | hard_light    | sse42  | 0.227699 | 0.008580   | 26.54x  | -96.23%        |
| 1920x1080 | float32 | 3        | 0.50    | hard_light    | avx2   | 0.227699 | 0.005955   | 38.24x  | -97.38%        |
| 1920x1080 | float32 | 3        | 0.50    | difference    | scalar | 0.219765 | 0.018061   | 12.17x  | -91.78%        |
| 1920x1080 | float32 | 3        | 0.50    | difference    | sse42  | 0.219765 | 0.007136   | 30.80x  | -96.75%        |
| 1920x1080 | float32 | 3        | 0.50    | difference    | avx2   | 0.219765 | 0.005799   | 37.90x  | -97.36%        |
| 1920x1080 | float32 | 3        | 0.50    | subtract      | scalar | 0.160355 | 0.024139   | 6.64x   | -84.95%        |
| 1920x1080 | float32 | 3        | 0.50    | subtract      | sse42  | 0.160355 | 0.007465   | 21.48x  | -95.34%        |
| 1920x1080 | float32 | 3        | 0.50    | subtract      | avx2   | 0.160355 | 0.005891   | 27.22x  | -96.33%        |
| 1920x1080 | float32 | 3        | 0.50    | grain_extract | scalar | 0.164935 | 0.033032   | 4.99x   | -79.97%        |
| 1920x1080 | float32 | 3        | 0.50    | grain_extract | sse42  | 0.164935 | 0.007503   | 21.98x  | -95.45%        |
| 1920x1080 | float32 | 3        | 0.50    | grain_extract | avx2   | 0.164935 | 0.005854   | 28.18x  | -96.45%        |
| 1920x1080 | float32 | 3        | 0.50    | grain_merge   | scalar | 0.164264 | 0.032829   | 5.00x   | -80.01%        |
| 1920x1080 | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.164264 | 0.007499   | 21.91x  | -95.43%        |
| 1920x1080 | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.164264 | 0.005818   | 28.23x  | -96.46%        |
| 1920x1080 | float32 | 3        | 0.50    | divide        | scalar | 0.166134 | 0.020039   | 8.29x   | -87.94%        |
| 1920x1080 | float32 | 3        | 0.50    | divide        | sse42  | 0.166134 | 0.008222   | 20.20x  | -95.05%        |
| 1920x1080 | float32 | 3        | 0.50    | divide        | avx2   | 0.166134 | 0.005873   | 28.29x  | -96.47%        |
| 1920x1080 | float32 | 3        | 0.50    | overlay       | scalar | 0.219111 | 0.052779   | 4.15x   | -75.91%        |
| 1920x1080 | float32 | 3        | 0.50    | overlay       | sse42  | 0.219111 | 0.008042   | 27.24x  | -96.33%        |
| 1920x1080 | float32 | 3        | 0.50    | overlay       | avx2   | 0.219111 | 0.005851   | 37.45x  | -97.33%        |
| 1920x1080 | float32 | 4        | 0.50    | normal        | scalar | 0.118436 | 0.019197   | 6.17x   | -83.79%        |
| 1920x1080 | float32 | 4        | 0.50    | normal        | sse42  | 0.118436 | 0.005119   | 23.14x  | -95.68%        |
| 1920x1080 | float32 | 4        | 0.50    | normal        | avx2   | 0.118436 | 0.007531   | 15.73x  | -93.64%        |
| 1920x1080 | float32 | 4        | 0.50    | soft_light    | scalar | 0.176563 | 0.022513   | 7.84x   | -87.25%        |
| 1920x1080 | float32 | 4        | 0.50    | soft_light    | sse42  | 0.176563 | 0.005950   | 29.67x  | -96.63%        |
| 1920x1080 | float32 | 4        | 0.50    | soft_light    | avx2   | 0.176563 | 0.005934   | 29.76x  | -96.64%        |
| 1920x1080 | float32 | 4        | 0.50    | lighten_only  | scalar | 0.125354 | 0.024120   | 5.20x   | -80.76%        |
| 1920x1080 | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.125354 | 0.005427   | 23.10x  | -95.67%        |
| 1920x1080 | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.125354 | 0.005834   | 21.49x  | -95.35%        |
| 1920x1080 | float32 | 4        | 0.50    | screen        | scalar | 0.132034 | 0.021015   | 6.28x   | -84.08%        |
| 1920x1080 | float32 | 4        | 0.50    | screen        | sse42  | 0.132034 | 0.005674   | 23.27x  | -95.70%        |
| 1920x1080 | float32 | 4        | 0.50    | screen        | avx2   | 0.132034 | 0.006032   | 21.89x  | -95.43%        |
| 1920x1080 | float32 | 4        | 0.50    | dodge         | scalar | 0.132411 | 0.023661   | 5.60x   | -82.13%        |
| 1920x1080 | float32 | 4        | 0.50    | dodge         | sse42  | 0.132411 | 0.006839   | 19.36x  | -94.83%        |
| 1920x1080 | float32 | 4        | 0.50    | dodge         | avx2   | 0.132411 | 0.006003   | 22.06x  | -95.47%        |
| 1920x1080 | float32 | 4        | 0.50    | addition      | scalar | 0.128232 | 0.042175   | 3.04x   | -67.11%        |
| 1920x1080 | float32 | 4        | 0.50    | addition      | sse42  | 0.128232 | 0.006078   | 21.10x  | -95.26%        |
| 1920x1080 | float32 | 4        | 0.50    | addition      | avx2   | 0.128232 | 0.006165   | 20.80x  | -95.19%        |
| 1920x1080 | float32 | 4        | 0.50    | darken_only   | scalar | 0.126858 | 0.024016   | 5.28x   | -81.07%        |
| 1920x1080 | float32 | 4        | 0.50    | darken_only   | sse42  | 0.126858 | 0.005461   | 23.23x  | -95.70%        |
| 1920x1080 | float32 | 4        | 0.50    | darken_only   | avx2   | 0.126858 | 0.005987   | 21.19x  | -95.28%        |
| 1920x1080 | float32 | 4        | 0.50    | multiply      | scalar | 0.127328 | 0.020449   | 6.23x   | -83.94%        |
| 1920x1080 | float32 | 4        | 0.50    | multiply      | sse42  | 0.127328 | 0.005670   | 22.46x  | -95.55%        |
| 1920x1080 | float32 | 4        | 0.50    | multiply      | avx2   | 0.127328 | 0.005976   | 21.31x  | -95.31%        |
| 1920x1080 | float32 | 4        | 0.50    | hard_light    | scalar | 0.191745 | 0.058264   | 3.29x   | -69.61%        |
| 1920x1080 | float32 | 4        | 0.50    | hard_light    | sse42  | 0.191745 | 0.006904   | 27.77x  | -96.40%        |
| 1920x1080 | float32 | 4        | 0.50    | hard_light    | avx2   | 0.191745 | 0.005993   | 31.99x  | -96.87%        |
| 1920x1080 | float32 | 4        | 0.50    | difference    | scalar | 0.185137 | 0.020425   | 9.06x   | -88.97%        |
| 1920x1080 | float32 | 4        | 0.50    | difference    | sse42  | 0.185137 | 0.005655   | 32.74x  | -96.95%        |
| 1920x1080 | float32 | 4        | 0.50    | difference    | avx2   | 0.185137 | 0.006018   | 30.76x  | -96.75%        |
| 1920x1080 | float32 | 4        | 0.50    | subtract      | scalar | 0.127840 | 0.027229   | 4.69x   | -78.70%        |
| 1920x1080 | float32 | 4        | 0.50    | subtract      | sse42  | 0.127840 | 0.006281   | 20.36x  | -95.09%        |
| 1920x1080 | float32 | 4        | 0.50    | subtract      | avx2   | 0.127840 | 0.006074   | 21.05x  | -95.25%        |
| 1920x1080 | float32 | 4        | 0.50    | grain_extract | scalar | 0.133941 | 0.033724   | 3.97x   | -74.82%        |
| 1920x1080 | float32 | 4        | 0.50    | grain_extract | sse42  | 0.133941 | 0.005660   | 23.67x  | -95.77%        |
| 1920x1080 | float32 | 4        | 0.50    | grain_extract | avx2   | 0.133941 | 0.005889   | 22.74x  | -95.60%        |
| 1920x1080 | float32 | 4        | 0.50    | grain_merge   | scalar | 0.130575 | 0.033254   | 3.93x   | -74.53%        |
| 1920x1080 | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.130575 | 0.005567   | 23.46x  | -95.74%        |
| 1920x1080 | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.130575 | 0.005942   | 21.97x  | -95.45%        |
| 1920x1080 | float32 | 4        | 0.50    | divide        | scalar | 0.133021 | 0.022286   | 5.97x   | -83.25%        |
| 1920x1080 | float32 | 4        | 0.50    | divide        | sse42  | 0.133021 | 0.005650   | 23.54x  | -95.75%        |
| 1920x1080 | float32 | 4        | 0.50    | divide        | avx2   | 0.133021 | 0.005934   | 22.42x  | -95.54%        |
| 1920x1080 | float32 | 4        | 0.50    | overlay       | scalar | 0.182389 | 0.054219   | 3.36x   | -70.27%        |
| 1920x1080 | float32 | 4        | 0.50    | overlay       | sse42  | 0.182389 | 0.006018   | 30.31x  | -96.70%        |
| 1920x1080 | float32 | 4        | 0.50    | overlay       | avx2   | 0.182389 | 0.005850   | 31.18x  | -96.79%        |
| 2560x1440 | uint8   | 3        | 0.50    | normal        | scalar | 0.313608 | 0.089129   | 3.52x   | -71.58%        |
| 2560x1440 | uint8   | 3        | 0.50    | normal        | sse42  | 0.313608 | 0.038308   | 8.19x   | -87.78%        |
| 2560x1440 | uint8   | 3        | 0.50    | normal        | avx2   | 0.313608 | 0.038868   | 8.07x   | -87.61%        |
| 2560x1440 | uint8   | 3        | 0.50    | soft_light    | scalar | 0.416662 | 0.098922   | 4.21x   | -76.26%        |
| 2560x1440 | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.416662 | 0.047836   | 8.71x   | -88.52%        |
| 2560x1440 | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.416662 | 0.044841   | 9.29x   | -89.24%        |
| 2560x1440 | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.316356 | 0.105798   | 2.99x   | -66.56%        |
| 2560x1440 | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.316356 | 0.043586   | 7.26x   | -86.22%        |
| 2560x1440 | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.316356 | 0.042074   | 7.52x   | -86.70%        |
| 2560x1440 | uint8   | 3        | 0.50    | screen        | scalar | 0.330699 | 0.096528   | 3.43x   | -70.81%        |
| 2560x1440 | uint8   | 3        | 0.50    | screen        | sse42  | 0.330699 | 0.044501   | 7.43x   | -86.54%        |
| 2560x1440 | uint8   | 3        | 0.50    | screen        | avx2   | 0.330699 | 0.042667   | 7.75x   | -87.10%        |
| 2560x1440 | uint8   | 3        | 0.50    | dodge         | scalar | 0.332803 | 0.099483   | 3.35x   | -70.11%        |
| 2560x1440 | uint8   | 3        | 0.50    | dodge         | sse42  | 0.332803 | 0.048854   | 6.81x   | -85.32%        |
| 2560x1440 | uint8   | 3        | 0.50    | dodge         | avx2   | 0.332803 | 0.044766   | 7.43x   | -86.55%        |
| 2560x1440 | uint8   | 3        | 0.50    | addition      | scalar | 0.323630 | 0.139858   | 2.31x   | -56.78%        |
| 2560x1440 | uint8   | 3        | 0.50    | addition      | sse42  | 0.323630 | 0.044392   | 7.29x   | -86.28%        |
| 2560x1440 | uint8   | 3        | 0.50    | addition      | avx2   | 0.323630 | 0.042118   | 7.68x   | -86.99%        |
| 2560x1440 | uint8   | 3        | 0.50    | darken_only   | scalar | 0.314227 | 0.106816   | 2.94x   | -66.01%        |
| 2560x1440 | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.314227 | 0.043456   | 7.23x   | -86.17%        |
| 2560x1440 | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.314227 | 0.041678   | 7.54x   | -86.74%        |
| 2560x1440 | uint8   | 3        | 0.50    | multiply      | scalar | 0.321665 | 0.095942   | 3.35x   | -70.17%        |
| 2560x1440 | uint8   | 3        | 0.50    | multiply      | sse42  | 0.321665 | 0.043439   | 7.40x   | -86.50%        |
| 2560x1440 | uint8   | 3        | 0.50    | multiply      | avx2   | 0.321665 | 0.042692   | 7.53x   | -86.73%        |
| 2560x1440 | uint8   | 3        | 0.50    | hard_light    | scalar | 0.456902 | 0.164503   | 2.78x   | -64.00%        |
| 2560x1440 | uint8   | 3        | 0.50    | hard_light    | sse42  | 0.456902 | 0.049085   | 9.31x   | -89.26%        |
| 2560x1440 | uint8   | 3        | 0.50    | hard_light    | avx2   | 0.456902 | 0.044839   | 10.19x  | -90.19%        |
| 2560x1440 | uint8   | 3        | 0.50    | difference    | scalar | 0.415910 | 0.096615   | 4.30x   | -76.77%        |
| 2560x1440 | uint8   | 3        | 0.50    | difference    | sse42  | 0.415910 | 0.043322   | 9.60x   | -89.58%        |
| 2560x1440 | uint8   | 3        | 0.50    | difference    | avx2   | 0.415910 | 0.041992   | 9.90x   | -89.90%        |
| 2560x1440 | uint8   | 3        | 0.50    | subtract      | scalar | 0.320067 | 0.088393   | 3.62x   | -72.38%        |
| 2560x1440 | uint8   | 3        | 0.50    | subtract      | sse42  | 0.320067 | 0.047635   | 6.72x   | -85.12%        |
| 2560x1440 | uint8   | 3        | 0.50    | subtract      | avx2   | 0.320067 | 0.043891   | 7.29x   | -86.29%        |
| 2560x1440 | uint8   | 3        | 0.50    | grain_extract | scalar | 0.330550 | 0.116991   | 2.83x   | -64.61%        |
| 2560x1440 | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.330550 | 0.047054   | 7.02x   | -85.76%        |
| 2560x1440 | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.330550 | 0.043705   | 7.56x   | -86.78%        |
| 2560x1440 | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.328237 | 0.116566   | 2.82x   | -64.49%        |
| 2560x1440 | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.328237 | 0.047157   | 6.96x   | -85.63%        |
| 2560x1440 | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.328237 | 0.044248   | 7.42x   | -86.52%        |
| 2560x1440 | uint8   | 3        | 0.50    | divide        | scalar | 0.333987 | 0.098319   | 3.40x   | -70.56%        |
| 2560x1440 | uint8   | 3        | 0.50    | divide        | sse42  | 0.333987 | 0.047626   | 7.01x   | -85.74%        |
| 2560x1440 | uint8   | 3        | 0.50    | divide        | avx2   | 0.333987 | 0.045206   | 7.39x   | -86.46%        |
| 2560x1440 | uint8   | 3        | 0.50    | overlay       | scalar | 0.427780 | 0.160792   | 2.66x   | -62.41%        |
| 2560x1440 | uint8   | 3        | 0.50    | overlay       | sse42  | 0.427780 | 0.048677   | 8.79x   | -88.62%        |
| 2560x1440 | uint8   | 3        | 0.50    | overlay       | avx2   | 0.427780 | 0.044500   | 9.61x   | -89.60%        |
| 2560x1440 | uint8   | 4        | 0.50    | normal        | scalar | 0.235295 | 0.074157   | 3.17x   | -68.48%        |
| 2560x1440 | uint8   | 4        | 0.50    | normal        | sse42  | 0.235295 | 0.009799   | 24.01x  | -95.84%        |
| 2560x1440 | uint8   | 4        | 0.50    | normal        | avx2   | 0.235295 | 0.008777   | 26.81x  | -96.27%        |
| 2560x1440 | uint8   | 4        | 0.50    | soft_light    | scalar | 0.333531 | 0.090960   | 3.67x   | -72.73%        |
| 2560x1440 | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.333531 | 0.012266   | 27.19x  | -96.32%        |
| 2560x1440 | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.333531 | 0.010974   | 30.39x  | -96.71%        |
| 2560x1440 | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.232726 | 0.096565   | 2.41x   | -58.51%        |
| 2560x1440 | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.232726 | 0.010550   | 22.06x  | -95.47%        |
| 2560x1440 | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.232726 | 0.010559   | 22.04x  | -95.46%        |
| 2560x1440 | uint8   | 4        | 0.50    | screen        | scalar | 0.247209 | 0.086611   | 2.85x   | -64.96%        |
| 2560x1440 | uint8   | 4        | 0.50    | screen        | sse42  | 0.247209 | 0.011694   | 21.14x  | -95.27%        |
| 2560x1440 | uint8   | 4        | 0.50    | screen        | avx2   | 0.247209 | 0.010911   | 22.66x  | -95.59%        |
| 2560x1440 | uint8   | 4        | 0.50    | dodge         | scalar | 0.250397 | 0.090592   | 2.76x   | -63.82%        |
| 2560x1440 | uint8   | 4        | 0.50    | dodge         | sse42  | 0.250397 | 0.013206   | 18.96x  | -94.73%        |
| 2560x1440 | uint8   | 4        | 0.50    | dodge         | avx2   | 0.250397 | 0.010763   | 23.27x  | -95.70%        |
| 2560x1440 | uint8   | 4        | 0.50    | addition      | scalar | 0.239138 | 0.108756   | 2.20x   | -54.52%        |
| 2560x1440 | uint8   | 4        | 0.50    | addition      | sse42  | 0.239138 | 0.014134   | 16.92x  | -94.09%        |
| 2560x1440 | uint8   | 4        | 0.50    | addition      | avx2   | 0.239138 | 0.011240   | 21.27x  | -95.30%        |
| 2560x1440 | uint8   | 4        | 0.50    | darken_only   | scalar | 0.232203 | 0.098629   | 2.35x   | -57.52%        |
| 2560x1440 | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.232203 | 0.010633   | 21.84x  | -95.42%        |
| 2560x1440 | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.232203 | 0.010616   | 21.87x  | -95.43%        |
| 2560x1440 | uint8   | 4        | 0.50    | multiply      | scalar | 0.237409 | 0.087414   | 2.72x   | -63.18%        |
| 2560x1440 | uint8   | 4        | 0.50    | multiply      | sse42  | 0.237409 | 0.010938   | 21.70x  | -95.39%        |
| 2560x1440 | uint8   | 4        | 0.50    | multiply      | avx2   | 0.237409 | 0.010468   | 22.68x  | -95.59%        |
| 2560x1440 | uint8   | 4        | 0.50    | hard_light    | scalar | 0.376283 | 0.143979   | 2.61x   | -61.74%        |
| 2560x1440 | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.376283 | 0.013224   | 28.46x  | -96.49%        |
| 2560x1440 | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.376283 | 0.010923   | 34.45x  | -97.10%        |
| 2560x1440 | uint8   | 4        | 0.50    | difference    | scalar | 0.336579 | 0.086170   | 3.91x   | -74.40%        |
| 2560x1440 | uint8   | 4        | 0.50    | difference    | sse42  | 0.336579 | 0.011267   | 29.87x  | -96.65%        |
| 2560x1440 | uint8   | 4        | 0.50    | difference    | avx2   | 0.336579 | 0.010675   | 31.53x  | -96.83%        |
| 2560x1440 | uint8   | 4        | 0.50    | subtract      | scalar | 0.238894 | 0.083403   | 2.86x   | -65.09%        |
| 2560x1440 | uint8   | 4        | 0.50    | subtract      | sse42  | 0.238894 | 0.014654   | 16.30x  | -93.87%        |
| 2560x1440 | uint8   | 4        | 0.50    | subtract      | avx2   | 0.238894 | 0.011126   | 21.47x  | -95.34%        |
| 2560x1440 | uint8   | 4        | 0.50    | grain_extract | scalar | 0.246146 | 0.105096   | 2.34x   | -57.30%        |
| 2560x1440 | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.246146 | 0.011754   | 20.94x  | -95.22%        |
| 2560x1440 | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.246146 | 0.010791   | 22.81x  | -95.62%        |
| 2560x1440 | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.245970 | 0.105346   | 2.33x   | -57.17%        |
| 2560x1440 | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.245970 | 0.011782   | 20.88x  | -95.21%        |
| 2560x1440 | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.245970 | 0.010655   | 23.09x  | -95.67%        |
| 2560x1440 | uint8   | 4        | 0.50    | divide        | scalar | 0.252140 | 0.088711   | 2.84x   | -64.82%        |
| 2560x1440 | uint8   | 4        | 0.50    | divide        | sse42  | 0.252140 | 0.012109   | 20.82x  | -95.20%        |
| 2560x1440 | uint8   | 4        | 0.50    | divide        | avx2   | 0.252140 | 0.010599   | 23.79x  | -95.80%        |
| 2560x1440 | uint8   | 4        | 0.50    | overlay       | scalar | 0.348320 | 0.140391   | 2.48x   | -59.69%        |
| 2560x1440 | uint8   | 4        | 0.50    | overlay       | sse42  | 0.348320 | 0.012579   | 27.69x  | -96.39%        |
| 2560x1440 | uint8   | 4        | 0.50    | overlay       | avx2   | 0.348320 | 0.010836   | 32.15x  | -96.89%        |
| 2560x1440 | float32 | 3        | 0.50    | normal        | scalar | 0.280736 | 0.027865   | 10.07x  | -90.07%        |
| 2560x1440 | float32 | 3        | 0.50    | normal        | sse42  | 0.280736 | 0.012485   | 22.49x  | -95.55%        |
| 2560x1440 | float32 | 3        | 0.50    | normal        | avx2   | 0.280736 | 0.008320   | 33.74x  | -97.04%        |
| 2560x1440 | float32 | 3        | 0.50    | soft_light    | scalar | 0.375551 | 0.034381   | 10.92x  | -90.85%        |
| 2560x1440 | float32 | 3        | 0.50    | soft_light    | sse42  | 0.375551 | 0.014349   | 26.17x  | -96.18%        |
| 2560x1440 | float32 | 3        | 0.50    | soft_light    | avx2   | 0.375551 | 0.010380   | 36.18x  | -97.24%        |
| 2560x1440 | float32 | 3        | 0.50    | lighten_only  | scalar | 0.276306 | 0.040972   | 6.74x   | -85.17%        |
| 2560x1440 | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.276306 | 0.011946   | 23.13x  | -95.68%        |
| 2560x1440 | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.276306 | 0.010100   | 27.36x  | -96.34%        |
| 2560x1440 | float32 | 3        | 0.50    | screen        | scalar | 0.296159 | 0.031076   | 9.53x   | -89.51%        |
| 2560x1440 | float32 | 3        | 0.50    | screen        | sse42  | 0.296159 | 0.013230   | 22.39x  | -95.53%        |
| 2560x1440 | float32 | 3        | 0.50    | screen        | avx2   | 0.296159 | 0.010403   | 28.47x  | -96.49%        |
| 2560x1440 | float32 | 3        | 0.50    | dodge         | scalar | 0.294769 | 0.035928   | 8.20x   | -87.81%        |
| 2560x1440 | float32 | 3        | 0.50    | dodge         | sse42  | 0.294769 | 0.015222   | 19.36x  | -94.84%        |
| 2560x1440 | float32 | 3        | 0.50    | dodge         | avx2   | 0.294769 | 0.010749   | 27.42x  | -96.35%        |
| 2560x1440 | float32 | 3        | 0.50    | addition      | scalar | 0.284695 | 0.087974   | 3.24x   | -69.10%        |
| 2560x1440 | float32 | 3        | 0.50    | addition      | sse42  | 0.284695 | 0.012782   | 22.27x  | -95.51%        |
| 2560x1440 | float32 | 3        | 0.50    | addition      | avx2   | 0.284695 | 0.010569   | 26.94x  | -96.29%        |
| 2560x1440 | float32 | 3        | 0.50    | darken_only   | scalar | 0.276454 | 0.041037   | 6.74x   | -85.16%        |
| 2560x1440 | float32 | 3        | 0.50    | darken_only   | sse42  | 0.276454 | 0.011985   | 23.07x  | -95.66%        |
| 2560x1440 | float32 | 3        | 0.50    | darken_only   | avx2   | 0.276454 | 0.010124   | 27.31x  | -96.34%        |
| 2560x1440 | float32 | 3        | 0.50    | multiply      | scalar | 0.284074 | 0.030556   | 9.30x   | -89.24%        |
| 2560x1440 | float32 | 3        | 0.50    | multiply      | sse42  | 0.284074 | 0.012331   | 23.04x  | -95.66%        |
| 2560x1440 | float32 | 3        | 0.50    | multiply      | avx2   | 0.284074 | 0.010088   | 28.16x  | -96.45%        |
| 2560x1440 | float32 | 3        | 0.50    | hard_light    | scalar | 0.423613 | 0.100167   | 4.23x   | -76.35%        |
| 2560x1440 | float32 | 3        | 0.50    | hard_light    | sse42  | 0.423613 | 0.015197   | 27.88x  | -96.41%        |
| 2560x1440 | float32 | 3        | 0.50    | hard_light    | avx2   | 0.423613 | 0.010559   | 40.12x  | -97.51%        |
| 2560x1440 | float32 | 3        | 0.50    | difference    | scalar | 0.379177 | 0.030822   | 12.30x  | -91.87%        |
| 2560x1440 | float32 | 3        | 0.50    | difference    | sse42  | 0.379177 | 0.012534   | 30.25x  | -96.69%        |
| 2560x1440 | float32 | 3        | 0.50    | difference    | avx2   | 0.379177 | 0.010298   | 36.82x  | -97.28%        |
| 2560x1440 | float32 | 3        | 0.50    | subtract      | scalar | 0.284956 | 0.038763   | 7.35x   | -86.40%        |
| 2560x1440 | float32 | 3        | 0.50    | subtract      | sse42  | 0.284956 | 0.013250   | 21.51x  | -95.35%        |
| 2560x1440 | float32 | 3        | 0.50    | subtract      | avx2   | 0.284956 | 0.010532   | 27.06x  | -96.30%        |
| 2560x1440 | float32 | 3        | 0.50    | grain_extract | scalar | 0.292747 | 0.056742   | 5.16x   | -80.62%        |
| 2560x1440 | float32 | 3        | 0.50    | grain_extract | sse42  | 0.292747 | 0.013300   | 22.01x  | -95.46%        |
| 2560x1440 | float32 | 3        | 0.50    | grain_extract | avx2   | 0.292747 | 0.010460   | 27.99x  | -96.43%        |
| 2560x1440 | float32 | 3        | 0.50    | grain_merge   | scalar | 0.290353 | 0.056622   | 5.13x   | -80.50%        |
| 2560x1440 | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.290353 | 0.013257   | 21.90x  | -95.43%        |
| 2560x1440 | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.290353 | 0.010385   | 27.96x  | -96.42%        |
| 2560x1440 | float32 | 3        | 0.50    | divide        | scalar | 0.299406 | 0.034359   | 8.71x   | -88.52%        |
| 2560x1440 | float32 | 3        | 0.50    | divide        | sse42  | 0.299406 | 0.014718   | 20.34x  | -95.08%        |
| 2560x1440 | float32 | 3        | 0.50    | divide        | avx2   | 0.299406 | 0.010469   | 28.60x  | -96.50%        |
| 2560x1440 | float32 | 3        | 0.50    | overlay       | scalar | 0.392206 | 0.091805   | 4.27x   | -76.59%        |
| 2560x1440 | float32 | 3        | 0.50    | overlay       | sse42  | 0.392206 | 0.014359   | 27.31x  | -96.34%        |
| 2560x1440 | float32 | 3        | 0.50    | overlay       | avx2   | 0.392206 | 0.010407   | 37.69x  | -97.35%        |
| 2560x1440 | float32 | 4        | 0.50    | normal        | scalar | 0.224276 | 0.040151   | 5.59x   | -82.10%        |
| 2560x1440 | float32 | 4        | 0.50    | normal        | sse42  | 0.224276 | 0.014580   | 15.38x  | -93.50%        |
| 2560x1440 | float32 | 4        | 0.50    | normal        | avx2   | 0.224276 | 0.015730   | 14.26x  | -92.99%        |
| 2560x1440 | float32 | 4        | 0.50    | soft_light    | scalar | 0.322657 | 0.048230   | 6.69x   | -85.05%        |
| 2560x1440 | float32 | 4        | 0.50    | soft_light    | sse42  | 0.322657 | 0.016963   | 19.02x  | -94.74%        |
| 2560x1440 | float32 | 4        | 0.50    | soft_light    | avx2   | 0.322657 | 0.016261   | 19.84x  | -94.96%        |
| 2560x1440 | float32 | 4        | 0.50    | lighten_only  | scalar | 0.221026 | 0.049422   | 4.47x   | -77.64%        |
| 2560x1440 | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.221026 | 0.015336   | 14.41x  | -93.06%        |
| 2560x1440 | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.221026 | 0.016106   | 13.72x  | -92.71%        |
| 2560x1440 | float32 | 4        | 0.50    | screen        | scalar | 0.237798 | 0.044213   | 5.38x   | -81.41%        |
| 2560x1440 | float32 | 4        | 0.50    | screen        | sse42  | 0.237798 | 0.015688   | 15.16x  | -93.40%        |
| 2560x1440 | float32 | 4        | 0.50    | screen        | avx2   | 0.237798 | 0.015928   | 14.93x  | -93.30%        |
| 2560x1440 | float32 | 4        | 0.50    | dodge         | scalar | 0.268932 | 0.049702   | 5.41x   | -81.52%        |
| 2560x1440 | float32 | 4        | 0.50    | dodge         | sse42  | 0.268932 | 0.018350   | 14.66x  | -93.18%        |
| 2560x1440 | float32 | 4        | 0.50    | dodge         | avx2   | 0.268932 | 0.017033   | 15.79x  | -93.67%        |
| 2560x1440 | float32 | 4        | 0.50    | addition      | scalar | 0.237844 | 0.081400   | 2.92x   | -65.78%        |
| 2560x1440 | float32 | 4        | 0.50    | addition      | sse42  | 0.237844 | 0.019054   | 12.48x  | -91.99%        |
| 2560x1440 | float32 | 4        | 0.50    | addition      | avx2   | 0.237844 | 0.021809   | 10.91x  | -90.83%        |
| 2560x1440 | float32 | 4        | 0.50    | darken_only   | scalar | 0.226804 | 0.051275   | 4.42x   | -77.39%        |
| 2560x1440 | float32 | 4        | 0.50    | darken_only   | sse42  | 0.226804 | 0.016168   | 14.03x  | -92.87%        |
| 2560x1440 | float32 | 4        | 0.50    | darken_only   | avx2   | 0.226804 | 0.016084   | 14.10x  | -92.91%        |
| 2560x1440 | float32 | 4        | 0.50    | multiply      | scalar | 0.229089 | 0.042009   | 5.45x   | -81.66%        |
| 2560x1440 | float32 | 4        | 0.50    | multiply      | sse42  | 0.229089 | 0.015354   | 14.92x  | -93.30%        |
| 2560x1440 | float32 | 4        | 0.50    | multiply      | avx2   | 0.229089 | 0.015995   | 14.32x  | -93.02%        |
| 2560x1440 | float32 | 4        | 0.50    | hard_light    | scalar | 0.363251 | 0.110490   | 3.29x   | -69.58%        |
| 2560x1440 | float32 | 4        | 0.50    | hard_light    | sse42  | 0.363251 | 0.018170   | 19.99x  | -95.00%        |
| 2560x1440 | float32 | 4        | 0.50    | hard_light    | avx2   | 0.363251 | 0.015943   | 22.78x  | -95.61%        |
| 2560x1440 | float32 | 4        | 0.50    | difference    | scalar | 0.325529 | 0.042336   | 7.69x   | -86.99%        |
| 2560x1440 | float32 | 4        | 0.50    | difference    | sse42  | 0.325529 | 0.015541   | 20.95x  | -95.23%        |
| 2560x1440 | float32 | 4        | 0.50    | difference    | avx2   | 0.325529 | 0.016062   | 20.27x  | -95.07%        |
| 2560x1440 | float32 | 4        | 0.50    | subtract      | scalar | 0.224633 | 0.054275   | 4.14x   | -75.84%        |
| 2560x1440 | float32 | 4        | 0.50    | subtract      | sse42  | 0.224633 | 0.016516   | 13.60x  | -92.65%        |
| 2560x1440 | float32 | 4        | 0.50    | subtract      | avx2   | 0.224633 | 0.016502   | 13.61x  | -92.65%        |
| 2560x1440 | float32 | 4        | 0.50    | grain_extract | scalar | 0.236799 | 0.065266   | 3.63x   | -72.44%        |
| 2560x1440 | float32 | 4        | 0.50    | grain_extract | sse42  | 0.236799 | 0.016232   | 14.59x  | -93.15%        |
| 2560x1440 | float32 | 4        | 0.50    | grain_extract | avx2   | 0.236799 | 0.016093   | 14.71x  | -93.20%        |
| 2560x1440 | float32 | 4        | 0.50    | grain_merge   | scalar | 0.235364 | 0.065202   | 3.61x   | -72.30%        |
| 2560x1440 | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.235364 | 0.015711   | 14.98x  | -93.32%        |
| 2560x1440 | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.235364 | 0.015914   | 14.79x  | -93.24%        |
| 2560x1440 | float32 | 4        | 0.50    | divide        | scalar | 0.240367 | 0.045754   | 5.25x   | -80.96%        |
| 2560x1440 | float32 | 4        | 0.50    | divide        | sse42  | 0.240367 | 0.015866   | 15.15x  | -93.40%        |
| 2560x1440 | float32 | 4        | 0.50    | divide        | avx2   | 0.240367 | 0.016496   | 14.57x  | -93.14%        |
| 2560x1440 | float32 | 4        | 0.50    | overlay       | scalar | 0.335596 | 0.102591   | 3.27x   | -69.43%        |
| 2560x1440 | float32 | 4        | 0.50    | overlay       | sse42  | 0.335596 | 0.016241   | 20.66x  | -95.16%        |
| 2560x1440 | float32 | 4        | 0.50    | overlay       | avx2   | 0.335596 | 0.016202   | 20.71x  | -95.17%        |
| 3840x2160 | uint8   | 3        | 0.50    | normal        | scalar | 0.709083 | 0.203875   | 3.48x   | -71.25%        |
| 3840x2160 | uint8   | 3        | 0.50    | normal        | sse42  | 0.709083 | 0.086418   | 8.21x   | -87.81%        |
| 3840x2160 | uint8   | 3        | 0.50    | normal        | avx2   | 0.709083 | 0.087814   | 8.07x   | -87.62%        |
| 3840x2160 | uint8   | 3        | 0.50    | soft_light    | scalar | 0.941015 | 0.225982   | 4.16x   | -75.99%        |
| 3840x2160 | uint8   | 3        | 0.50    | soft_light    | sse42  | 0.941015 | 0.107721   | 8.74x   | -88.55%        |
| 3840x2160 | uint8   | 3        | 0.50    | soft_light    | avx2   | 0.941015 | 0.100447   | 9.37x   | -89.33%        |
| 3840x2160 | uint8   | 3        | 0.50    | lighten_only  | scalar | 0.697293 | 0.241476   | 2.89x   | -65.37%        |
| 3840x2160 | uint8   | 3        | 0.50    | lighten_only  | sse42  | 0.697293 | 0.097573   | 7.15x   | -86.01%        |
| 3840x2160 | uint8   | 3        | 0.50    | lighten_only  | avx2   | 0.697293 | 0.094437   | 7.38x   | -86.46%        |
| 3840x2160 | uint8   | 3        | 0.50    | screen        | scalar | 0.732962 | 0.219430   | 3.34x   | -70.06%        |
| 3840x2160 | uint8   | 3        | 0.50    | screen        | sse42  | 0.732962 | 0.100952   | 7.26x   | -86.23%        |
| 3840x2160 | uint8   | 3        | 0.50    | screen        | avx2   | 0.732962 | 0.096547   | 7.59x   | -86.83%        |
| 3840x2160 | uint8   | 3        | 0.50    | dodge         | scalar | 0.734328 | 0.226346   | 3.24x   | -69.18%        |
| 3840x2160 | uint8   | 3        | 0.50    | dodge         | sse42  | 0.734328 | 0.110054   | 6.67x   | -85.01%        |
| 3840x2160 | uint8   | 3        | 0.50    | dodge         | avx2   | 0.734328 | 0.101827   | 7.21x   | -86.13%        |
| 3840x2160 | uint8   | 3        | 0.50    | addition      | scalar | 0.713763 | 0.316895   | 2.25x   | -55.60%        |
| 3840x2160 | uint8   | 3        | 0.50    | addition      | sse42  | 0.713763 | 0.099800   | 7.15x   | -86.02%        |
| 3840x2160 | uint8   | 3        | 0.50    | addition      | avx2   | 0.713763 | 0.095190   | 7.50x   | -86.66%        |
| 3840x2160 | uint8   | 3        | 0.50    | darken_only   | scalar | 0.698627 | 0.243302   | 2.87x   | -65.17%        |
| 3840x2160 | uint8   | 3        | 0.50    | darken_only   | sse42  | 0.698627 | 0.097579   | 7.16x   | -86.03%        |
| 3840x2160 | uint8   | 3        | 0.50    | darken_only   | avx2   | 0.698627 | 0.094407   | 7.40x   | -86.49%        |
| 3840x2160 | uint8   | 3        | 0.50    | multiply      | scalar | 0.714200 | 0.219599   | 3.25x   | -69.25%        |
| 3840x2160 | uint8   | 3        | 0.50    | multiply      | sse42  | 0.714200 | 0.098472   | 7.25x   | -86.21%        |
| 3840x2160 | uint8   | 3        | 0.50    | multiply      | avx2   | 0.714200 | 0.095518   | 7.48x   | -86.63%        |
| 3840x2160 | uint8   | 3        | 0.50    | hard_light    | scalar | 1.010458 | 0.372801   | 2.71x   | -63.11%        |
| 3840x2160 | uint8   | 3        | 0.50    | hard_light    | sse42  | 1.010458 | 0.110697   | 9.13x   | -89.04%        |
| 3840x2160 | uint8   | 3        | 0.50    | hard_light    | avx2   | 1.010458 | 0.101248   | 9.98x   | -89.98%        |
| 3840x2160 | uint8   | 3        | 0.50    | difference    | scalar | 0.943463 | 0.219043   | 4.31x   | -76.78%        |
| 3840x2160 | uint8   | 3        | 0.50    | difference    | sse42  | 0.943463 | 0.097685   | 9.66x   | -89.65%        |
| 3840x2160 | uint8   | 3        | 0.50    | difference    | avx2   | 0.943463 | 0.094348   | 10.00x  | -90.00%        |
| 3840x2160 | uint8   | 3        | 0.50    | subtract      | scalar | 0.713700 | 0.201307   | 3.55x   | -71.79%        |
| 3840x2160 | uint8   | 3        | 0.50    | subtract      | sse42  | 0.713700 | 0.106519   | 6.70x   | -85.08%        |
| 3840x2160 | uint8   | 3        | 0.50    | subtract      | avx2   | 0.713700 | 0.098488   | 7.25x   | -86.20%        |
| 3840x2160 | uint8   | 3        | 0.50    | grain_extract | scalar | 0.731294 | 0.264525   | 2.76x   | -63.83%        |
| 3840x2160 | uint8   | 3        | 0.50    | grain_extract | sse42  | 0.731294 | 0.105547   | 6.93x   | -85.57%        |
| 3840x2160 | uint8   | 3        | 0.50    | grain_extract | avx2   | 0.731294 | 0.098679   | 7.41x   | -86.51%        |
| 3840x2160 | uint8   | 3        | 0.50    | grain_merge   | scalar | 0.731032 | 0.265399   | 2.75x   | -63.70%        |
| 3840x2160 | uint8   | 3        | 0.50    | grain_merge   | sse42  | 0.731032 | 0.106474   | 6.87x   | -85.44%        |
| 3840x2160 | uint8   | 3        | 0.50    | grain_merge   | avx2   | 0.731032 | 0.100975   | 7.24x   | -86.19%        |
| 3840x2160 | uint8   | 3        | 0.50    | divide        | scalar | 0.745601 | 0.222013   | 3.36x   | -70.22%        |
| 3840x2160 | uint8   | 3        | 0.50    | divide        | sse42  | 0.745601 | 0.111496   | 6.69x   | -85.05%        |
| 3840x2160 | uint8   | 3        | 0.50    | divide        | avx2   | 0.745601 | 0.100567   | 7.41x   | -86.51%        |
| 3840x2160 | uint8   | 3        | 0.50    | overlay       | scalar | 0.947464 | 0.365498   | 2.59x   | -61.42%        |
| 3840x2160 | uint8   | 3        | 0.50    | overlay       | sse42  | 0.947464 | 0.108108   | 8.76x   | -88.59%        |
| 3840x2160 | uint8   | 3        | 0.50    | overlay       | avx2   | 0.947464 | 0.100328   | 9.44x   | -89.41%        |
| 3840x2160 | uint8   | 4        | 0.50    | normal        | scalar | 0.519559 | 0.164467   | 3.16x   | -68.34%        |
| 3840x2160 | uint8   | 4        | 0.50    | normal        | sse42  | 0.519559 | 0.021531   | 24.13x  | -95.86%        |
| 3840x2160 | uint8   | 4        | 0.50    | normal        | avx2   | 0.519559 | 0.019689   | 26.39x  | -96.21%        |
| 3840x2160 | uint8   | 4        | 0.50    | soft_light    | scalar | 0.729378 | 0.206068   | 3.54x   | -71.75%        |
| 3840x2160 | uint8   | 4        | 0.50    | soft_light    | sse42  | 0.729378 | 0.027568   | 26.46x  | -96.22%        |
| 3840x2160 | uint8   | 4        | 0.50    | soft_light    | avx2   | 0.729378 | 0.024460   | 29.82x  | -96.65%        |
| 3840x2160 | uint8   | 4        | 0.50    | lighten_only  | scalar | 0.513208 | 0.217009   | 2.36x   | -57.72%        |
| 3840x2160 | uint8   | 4        | 0.50    | lighten_only  | sse42  | 0.513208 | 0.023744   | 21.61x  | -95.37%        |
| 3840x2160 | uint8   | 4        | 0.50    | lighten_only  | avx2   | 0.513208 | 0.023745   | 21.61x  | -95.37%        |
| 3840x2160 | uint8   | 4        | 0.50    | screen        | scalar | 0.550269 | 0.196555   | 2.80x   | -64.28%        |
| 3840x2160 | uint8   | 4        | 0.50    | screen        | sse42  | 0.550269 | 0.025998   | 21.17x  | -95.28%        |
| 3840x2160 | uint8   | 4        | 0.50    | screen        | avx2   | 0.550269 | 0.024480   | 22.48x  | -95.55%        |
| 3840x2160 | uint8   | 4        | 0.50    | dodge         | scalar | 0.550685 | 0.203741   | 2.70x   | -63.00%        |
| 3840x2160 | uint8   | 4        | 0.50    | dodge         | sse42  | 0.550685 | 0.029214   | 18.85x  | -94.69%        |
| 3840x2160 | uint8   | 4        | 0.50    | dodge         | avx2   | 0.550685 | 0.024014   | 22.93x  | -95.64%        |
| 3840x2160 | uint8   | 4        | 0.50    | addition      | scalar | 0.526750 | 0.245244   | 2.15x   | -53.44%        |
| 3840x2160 | uint8   | 4        | 0.50    | addition      | sse42  | 0.526750 | 0.031871   | 16.53x  | -93.95%        |
| 3840x2160 | uint8   | 4        | 0.50    | addition      | avx2   | 0.526750 | 0.024888   | 21.16x  | -95.28%        |
| 3840x2160 | uint8   | 4        | 0.50    | darken_only   | scalar | 0.516854 | 0.219431   | 2.36x   | -57.54%        |
| 3840x2160 | uint8   | 4        | 0.50    | darken_only   | sse42  | 0.516854 | 0.023679   | 21.83x  | -95.42%        |
| 3840x2160 | uint8   | 4        | 0.50    | darken_only   | avx2   | 0.516854 | 0.023747   | 21.77x  | -95.41%        |
| 3840x2160 | uint8   | 4        | 0.50    | multiply      | scalar | 0.527983 | 0.197600   | 2.67x   | -62.57%        |
| 3840x2160 | uint8   | 4        | 0.50    | multiply      | sse42  | 0.527983 | 0.024473   | 21.57x  | -95.36%        |
| 3840x2160 | uint8   | 4        | 0.50    | multiply      | avx2   | 0.527983 | 0.023500   | 22.47x  | -95.55%        |
| 3840x2160 | uint8   | 4        | 0.50    | hard_light    | scalar | 0.824746 | 0.325212   | 2.54x   | -60.57%        |
| 3840x2160 | uint8   | 4        | 0.50    | hard_light    | sse42  | 0.824746 | 0.029619   | 27.84x  | -96.41%        |
| 3840x2160 | uint8   | 4        | 0.50    | hard_light    | avx2   | 0.824746 | 0.024442   | 33.74x  | -97.04%        |
| 3840x2160 | uint8   | 4        | 0.50    | difference    | scalar | 0.743171 | 0.196072   | 3.79x   | -73.62%        |
| 3840x2160 | uint8   | 4        | 0.50    | difference    | sse42  | 0.743171 | 0.024231   | 30.67x  | -96.74%        |
| 3840x2160 | uint8   | 4        | 0.50    | difference    | avx2   | 0.743171 | 0.023756   | 31.28x  | -96.80%        |
| 3840x2160 | uint8   | 4        | 0.50    | subtract      | scalar | 0.525804 | 0.188805   | 2.78x   | -64.09%        |
| 3840x2160 | uint8   | 4        | 0.50    | subtract      | sse42  | 0.525804 | 0.033139   | 15.87x  | -93.70%        |
| 3840x2160 | uint8   | 4        | 0.50    | subtract      | avx2   | 0.525804 | 0.024922   | 21.10x  | -95.26%        |
| 3840x2160 | uint8   | 4        | 0.50    | grain_extract | scalar | 0.543327 | 0.237283   | 2.29x   | -56.33%        |
| 3840x2160 | uint8   | 4        | 0.50    | grain_extract | sse42  | 0.543327 | 0.026458   | 20.54x  | -95.13%        |
| 3840x2160 | uint8   | 4        | 0.50    | grain_extract | avx2   | 0.543327 | 0.024059   | 22.58x  | -95.57%        |
| 3840x2160 | uint8   | 4        | 0.50    | grain_merge   | scalar | 0.543895 | 0.237303   | 2.29x   | -56.37%        |
| 3840x2160 | uint8   | 4        | 0.50    | grain_merge   | sse42  | 0.543895 | 0.026359   | 20.63x  | -95.15%        |
| 3840x2160 | uint8   | 4        | 0.50    | grain_merge   | avx2   | 0.543895 | 0.024036   | 22.63x  | -95.58%        |
| 3840x2160 | uint8   | 4        | 0.50    | divide        | scalar | 0.555767 | 0.199311   | 2.79x   | -64.14%        |
| 3840x2160 | uint8   | 4        | 0.50    | divide        | sse42  | 0.555767 | 0.027009   | 20.58x  | -95.14%        |
| 3840x2160 | uint8   | 4        | 0.50    | divide        | avx2   | 0.555767 | 0.023702   | 23.45x  | -95.74%        |
| 3840x2160 | uint8   | 4        | 0.50    | overlay       | scalar | 0.761599 | 0.316015   | 2.41x   | -58.51%        |
| 3840x2160 | uint8   | 4        | 0.50    | overlay       | sse42  | 0.761599 | 0.028343   | 26.87x  | -96.28%        |
| 3840x2160 | uint8   | 4        | 0.50    | overlay       | avx2   | 0.761599 | 0.024316   | 31.32x  | -96.81%        |
| 3840x2160 | float32 | 3        | 0.50    | normal        | scalar | 0.612312 | 0.070997   | 8.62x   | -88.41%        |
| 3840x2160 | float32 | 3        | 0.50    | normal        | sse42  | 0.612312 | 0.036957   | 16.57x  | -93.96%        |
| 3840x2160 | float32 | 3        | 0.50    | normal        | avx2   | 0.612312 | 0.027122   | 22.58x  | -95.57%        |
| 3840x2160 | float32 | 3        | 0.50    | soft_light    | scalar | 0.817177 | 0.086589   | 9.44x   | -89.40%        |
| 3840x2160 | float32 | 3        | 0.50    | soft_light    | sse42  | 0.817177 | 0.041461   | 19.71x  | -94.93%        |
| 3840x2160 | float32 | 3        | 0.50    | soft_light    | avx2   | 0.817177 | 0.032176   | 25.40x  | -96.06%        |
| 3840x2160 | float32 | 3        | 0.50    | lighten_only  | scalar | 0.599570 | 0.101667   | 5.90x   | -83.04%        |
| 3840x2160 | float32 | 3        | 0.50    | lighten_only  | sse42  | 0.599570 | 0.036166   | 16.58x  | -93.97%        |
| 3840x2160 | float32 | 3        | 0.50    | lighten_only  | avx2   | 0.599570 | 0.031126   | 19.26x  | -94.81%        |
| 3840x2160 | float32 | 3        | 0.50    | screen        | scalar | 0.636353 | 0.079052   | 8.05x   | -87.58%        |
| 3840x2160 | float32 | 3        | 0.50    | screen        | sse42  | 0.636353 | 0.038370   | 16.58x  | -93.97%        |
| 3840x2160 | float32 | 3        | 0.50    | screen        | avx2   | 0.636353 | 0.032137   | 19.80x  | -94.95%        |
| 3840x2160 | float32 | 3        | 0.50    | dodge         | scalar | 0.639829 | 0.088934   | 7.19x   | -86.10%        |
| 3840x2160 | float32 | 3        | 0.50    | dodge         | sse42  | 0.639829 | 0.042996   | 14.88x  | -93.28%        |
| 3840x2160 | float32 | 3        | 0.50    | dodge         | avx2   | 0.639829 | 0.032196   | 19.87x  | -94.97%        |
| 3840x2160 | float32 | 3        | 0.50    | addition      | scalar | 0.618517 | 0.205120   | 3.02x   | -66.84%        |
| 3840x2160 | float32 | 3        | 0.50    | addition      | sse42  | 0.618517 | 0.037745   | 16.39x  | -93.90%        |
| 3840x2160 | float32 | 3        | 0.50    | addition      | avx2   | 0.618517 | 0.032546   | 19.00x  | -94.74%        |
| 3840x2160 | float32 | 3        | 0.50    | darken_only   | scalar | 0.599404 | 0.100409   | 5.97x   | -83.25%        |
| 3840x2160 | float32 | 3        | 0.50    | darken_only   | sse42  | 0.599404 | 0.035412   | 16.93x  | -94.09%        |
| 3840x2160 | float32 | 3        | 0.50    | darken_only   | avx2   | 0.599404 | 0.030989   | 19.34x  | -94.83%        |
| 3840x2160 | float32 | 3        | 0.50    | multiply      | scalar | 0.615415 | 0.077085   | 7.98x   | -87.47%        |
| 3840x2160 | float32 | 3        | 0.50    | multiply      | sse42  | 0.615415 | 0.036077   | 17.06x  | -94.14%        |
| 3840x2160 | float32 | 3        | 0.50    | multiply      | avx2   | 0.615415 | 0.030643   | 20.08x  | -95.02%        |
| 3840x2160 | float32 | 3        | 0.50    | hard_light    | scalar | 0.916496 | 0.232678   | 3.94x   | -74.61%        |
| 3840x2160 | float32 | 3        | 0.50    | hard_light    | sse42  | 0.916496 | 0.042771   | 21.43x  | -95.33%        |
| 3840x2160 | float32 | 3        | 0.50    | hard_light    | avx2   | 0.916496 | 0.032003   | 28.64x  | -96.51%        |
| 3840x2160 | float32 | 3        | 0.50    | difference    | scalar | 0.829396 | 0.076793   | 10.80x  | -90.74%        |
| 3840x2160 | float32 | 3        | 0.50    | difference    | sse42  | 0.829396 | 0.036806   | 22.53x  | -95.56%        |
| 3840x2160 | float32 | 3        | 0.50    | difference    | avx2   | 0.829396 | 0.031384   | 26.43x  | -96.22%        |
| 3840x2160 | float32 | 3        | 0.50    | subtract      | scalar | 0.616042 | 0.094606   | 6.51x   | -84.64%        |
| 3840x2160 | float32 | 3        | 0.50    | subtract      | sse42  | 0.616042 | 0.038332   | 16.07x  | -93.78%        |
| 3840x2160 | float32 | 3        | 0.50    | subtract      | avx2   | 0.616042 | 0.031889   | 19.32x  | -94.82%        |
| 3840x2160 | float32 | 3        | 0.50    | grain_extract | scalar | 0.629526 | 0.134487   | 4.68x   | -78.64%        |
| 3840x2160 | float32 | 3        | 0.50    | grain_extract | sse42  | 0.629526 | 0.038556   | 16.33x  | -93.88%        |
| 3840x2160 | float32 | 3        | 0.50    | grain_extract | avx2   | 0.629526 | 0.031934   | 19.71x  | -94.93%        |
| 3840x2160 | float32 | 3        | 0.50    | grain_merge   | scalar | 0.633060 | 0.135176   | 4.68x   | -78.65%        |
| 3840x2160 | float32 | 3        | 0.50    | grain_merge   | sse42  | 0.633060 | 0.038696   | 16.36x  | -93.89%        |
| 3840x2160 | float32 | 3        | 0.50    | grain_merge   | avx2   | 0.633060 | 0.031747   | 19.94x  | -94.99%        |
| 3840x2160 | float32 | 3        | 0.50    | divide        | scalar | 0.644497 | 0.085483   | 7.54x   | -86.74%        |
| 3840x2160 | float32 | 3        | 0.50    | divide        | sse42  | 0.644497 | 0.041092   | 15.68x  | -93.62%        |
| 3840x2160 | float32 | 3        | 0.50    | divide        | avx2   | 0.644497 | 0.031351   | 20.56x  | -95.14%        |
| 3840x2160 | float32 | 3        | 0.50    | overlay       | scalar | 0.840030 | 0.212885   | 3.95x   | -74.66%        |
| 3840x2160 | float32 | 3        | 0.50    | overlay       | sse42  | 0.840030 | 0.040582   | 20.70x  | -95.17%        |
| 3840x2160 | float32 | 3        | 0.50    | overlay       | avx2   | 0.840030 | 0.031497   | 26.67x  | -96.25%        |
| 3840x2160 | float32 | 4        | 0.50    | normal        | scalar | 0.479357 | 0.087061   | 5.51x   | -81.84%        |
| 3840x2160 | float32 | 4        | 0.50    | normal        | sse42  | 0.479357 | 0.030277   | 15.83x  | -93.68%        |
| 3840x2160 | float32 | 4        | 0.50    | normal        | avx2   | 0.479357 | 0.040416   | 11.86x  | -91.57%        |
| 3840x2160 | float32 | 4        | 0.50    | soft_light    | scalar | 0.684756 | 0.100070   | 6.84x   | -85.39%        |
| 3840x2160 | float32 | 4        | 0.50    | soft_light    | sse42  | 0.684756 | 0.033373   | 20.52x  | -95.13%        |
| 3840x2160 | float32 | 4        | 0.50    | soft_light    | avx2   | 0.684756 | 0.034363   | 19.93x  | -94.98%        |
| 3840x2160 | float32 | 4        | 0.50    | lighten_only  | scalar | 0.465821 | 0.106568   | 4.37x   | -77.12%        |
| 3840x2160 | float32 | 4        | 0.50    | lighten_only  | sse42  | 0.465821 | 0.032561   | 14.31x  | -93.01%        |
| 3840x2160 | float32 | 4        | 0.50    | lighten_only  | avx2   | 0.465821 | 0.033476   | 13.92x  | -92.81%        |
| 3840x2160 | float32 | 4        | 0.50    | screen        | scalar | 0.500780 | 0.094903   | 5.28x   | -81.05%        |
| 3840x2160 | float32 | 4        | 0.50    | screen        | sse42  | 0.500780 | 0.033208   | 15.08x  | -93.37%        |
| 3840x2160 | float32 | 4        | 0.50    | screen        | avx2   | 0.500780 | 0.033644   | 14.88x  | -93.28%        |
| 3840x2160 | float32 | 4        | 0.50    | dodge         | scalar | 0.504651 | 0.104703   | 4.82x   | -79.25%        |
| 3840x2160 | float32 | 4        | 0.50    | dodge         | sse42  | 0.504651 | 0.037417   | 13.49x  | -92.59%        |
| 3840x2160 | float32 | 4        | 0.50    | dodge         | avx2   | 0.504651 | 0.033843   | 14.91x  | -93.29%        |
| 3840x2160 | float32 | 4        | 0.50    | addition      | scalar | 0.487424 | 0.178975   | 2.72x   | -63.28%        |
| 3840x2160 | float32 | 4        | 0.50    | addition      | sse42  | 0.487424 | 0.034920   | 13.96x  | -92.84%        |
| 3840x2160 | float32 | 4        | 0.50    | addition      | avx2   | 0.487424 | 0.035007   | 13.92x  | -92.82%        |
| 3840x2160 | float32 | 4        | 0.50    | darken_only   | scalar | 0.468246 | 0.107703   | 4.35x   | -77.00%        |
| 3840x2160 | float32 | 4        | 0.50    | darken_only   | sse42  | 0.468246 | 0.033651   | 13.91x  | -92.81%        |
| 3840x2160 | float32 | 4        | 0.50    | darken_only   | avx2   | 0.468246 | 0.033861   | 13.83x  | -92.77%        |
| 3840x2160 | float32 | 4        | 0.50    | multiply      | scalar | 0.483024 | 0.091757   | 5.26x   | -81.00%        |
| 3840x2160 | float32 | 4        | 0.50    | multiply      | sse42  | 0.483024 | 0.031892   | 15.15x  | -93.40%        |
| 3840x2160 | float32 | 4        | 0.50    | multiply      | avx2   | 0.483024 | 0.033760   | 14.31x  | -93.01%        |
| 3840x2160 | float32 | 4        | 0.50    | hard_light    | scalar | 0.786308 | 0.242696   | 3.24x   | -69.13%        |
| 3840x2160 | float32 | 4        | 0.50    | hard_light    | sse42  | 0.786308 | 0.038761   | 20.29x  | -95.07%        |
| 3840x2160 | float32 | 4        | 0.50    | hard_light    | avx2   | 0.786308 | 0.033834   | 23.24x  | -95.70%        |
| 3840x2160 | float32 | 4        | 0.50    | difference    | scalar | 0.696396 | 0.092195   | 7.55x   | -86.76%        |
| 3840x2160 | float32 | 4        | 0.50    | difference    | sse42  | 0.696396 | 0.033089   | 21.05x  | -95.25%        |
| 3840x2160 | float32 | 4        | 0.50    | difference    | avx2   | 0.696396 | 0.034164   | 20.38x  | -95.09%        |
| 3840x2160 | float32 | 4        | 0.50    | subtract      | scalar | 0.481969 | 0.118633   | 4.06x   | -75.39%        |
| 3840x2160 | float32 | 4        | 0.50    | subtract      | sse42  | 0.481969 | 0.035183   | 13.70x  | -92.70%        |
| 3840x2160 | float32 | 4        | 0.50    | subtract      | avx2   | 0.481969 | 0.034637   | 13.91x  | -92.81%        |
| 3840x2160 | float32 | 4        | 0.50    | grain_extract | scalar | 0.498023 | 0.142622   | 3.49x   | -71.36%        |
| 3840x2160 | float32 | 4        | 0.50    | grain_extract | sse42  | 0.498023 | 0.032448   | 15.35x  | -93.48%        |
| 3840x2160 | float32 | 4        | 0.50    | grain_extract | avx2   | 0.498023 | 0.033433   | 14.90x  | -93.29%        |
| 3840x2160 | float32 | 4        | 0.50    | grain_merge   | scalar | 0.497692 | 0.142563   | 3.49x   | -71.36%        |
| 3840x2160 | float32 | 4        | 0.50    | grain_merge   | sse42  | 0.497692 | 0.032725   | 15.21x  | -93.42%        |
| 3840x2160 | float32 | 4        | 0.50    | grain_merge   | avx2   | 0.497692 | 0.033476   | 14.87x  | -93.27%        |
| 3840x2160 | float32 | 4        | 0.50    | divide        | scalar | 0.511828 | 0.100229   | 5.11x   | -80.42%        |
| 3840x2160 | float32 | 4        | 0.50    | divide        | sse42  | 0.511828 | 0.033761   | 15.16x  | -93.40%        |
| 3840x2160 | float32 | 4        | 0.50    | divide        | avx2   | 0.511828 | 0.033745   | 15.17x  | -93.41%        |
| 3840x2160 | float32 | 4        | 0.50    | overlay       | scalar | 0.713519 | 0.226674   | 3.15x   | -68.23%        |
| 3840x2160 | float32 | 4        | 0.50    | overlay       | sse42  | 0.713519 | 0.034133   | 20.90x  | -95.22%        |
| 3840x2160 | float32 | 4        | 0.50    | overlay       | avx2   | 0.713519 | 0.034005   | 20.98x  | -95.23%        |
</details>
<!-- PERF_RESULTS_END -->
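
The last two columns of each row can be recomputed from the two timing columns. The sketch below does this for two rows copied from the table above. It assumes the seventh column is the reference time and the eighth is the time for the listed variant; the header row is not repeated here, so those meanings and the `derive` helper are illustrative, not part of the package.

```python
# Two rows copied verbatim from the results table.
ROWS = """\
| 3840x2160 | float32 | 4        | 0.50    | hard_light    | scalar | 0.786308 | 0.242696   | 3.24x   | -69.13%        |
| 3840x2160 | float32 | 4        | 0.50    | hard_light    | avx2   | 0.786308 | 0.033834   | 23.24x  | -95.70%        |
"""


def derive(row: str):
    """Recompute speedup and percent change for one table row.

    Assumption: cells[6] is the reference time and cells[7] is the
    time for the listed variant, both in the same unit.
    """
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    reference_time = float(cells[6])
    variant_time = float(cells[7])
    speedup = reference_time / variant_time
    change_pct = (variant_time / reference_time - 1.0) * 100.0
    return cells[4], cells[5], round(speedup, 2), round(change_pct, 2)


results = [derive(r) for r in ROWS.splitlines()]
for mode, variant, speedup, change in results:
    print(f"{mode:<12} {variant:<6} {speedup:.2f}x {change:+.2f}%")
```

Both recomputed values match the table to two decimal places, so a speedup of `Nx` is the same information as the percent change, expressed as a ratio instead of a reduction.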
