refract v0.3.2

gemma-4-26B-A4B-it-Q8_0.gguf

Quantization audit · ctk=q8_0,ctv=turbo4 against fp16 reference
report #0430-1623
generated 2026-04-30 16:23:41
scoring 0–100, higher is better
Composite score
29.12 / 100 · Fail

A · GTM      17.32   Fail
B · KLD@D    17.59   Fail
C · R-NIAH  100.00   Excellent
D · PLAD     78.40   Degraded
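
The composite is numerically consistent with the harmonic mean of the four axis scores, which would explain why two failing axes pull the total down to 29.12 despite a perfect R-NIAH score. This is an observation from the numbers in this report, not a documented formula. The sketch below checks it against the unrounded scores from the raw JSON.

```python
from statistics import harmonic_mean

# Unrounded axis scores copied from "composite_detail" in this report's raw JSON.
axis_scores = {
    "gtm": 17.31690622861054,
    "kld": 17.585948209856365,
    "rniah": 100.0,
    "plad": 78.40093473438733,
}

# Assumption: the composite is a harmonic mean of the axes. The report does not
# state this; it is inferred because the numbers agree.
values = list(axis_scores.values())
composite = harmonic_mean(values)
arithmetic = sum(values) / len(values)

print(f"harmonic mean:   {composite:.2f}")  # the report shows 29.12
print(f"arithmetic mean: {arithmetic:.2f}")
```

A harmonic mean is dominated by its smallest terms, so under this reading a single strong axis cannot offset a failing one.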

Diagnosis

2 findings
Material quality loss. Treat as broken.
Composite of 29.12 falls in the Fail band. Surfaces below threshold: Trajectory fail, KLD fail, PLAD degraded.

01  Decode distribution drift detected: the candidate generates different tokens than fp16 on short-context prompts.

02  Brittleness to small input changes: candidate's output drifts more than fp16's under typo / casing / punctuation perturbations.
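
Finding 01 concerns token-level divergence under greedy decoding. As a rough illustration of what the GTM fields in the raw JSON (`full_match_rate`, `median_first_divergence`, `mean_prefix_agreement_length`) could measure, the sketch below compares reference and candidate token sequences by their common prefix. The function names and toy token IDs are hypothetical, and none of this is taken from refract's source.

```python
from statistics import mean, median

def prefix_agreement(ref_tokens, cand_tokens):
    """Length of the shared prefix (equals the index of the first divergence)."""
    n = 0
    for r, c in zip(ref_tokens, cand_tokens):
        if r != c:
            break
        n += 1
    return n

def summarize(pairs):
    """Aggregate prefix agreement over (reference, candidate) token lists."""
    prefixes = [prefix_agreement(r, c) for r, c in pairs]
    return {
        "full_match_rate": mean(1.0 if r == c else 0.0 for r, c in pairs),
        # For a full match this is simply the sequence length.
        "median_first_divergence": median(prefixes),
        "mean_prefix_agreement_length": mean(prefixes),
    }

# Toy token IDs, invented for illustration only.
pairs = [
    ([5, 8, 13, 21], [5, 8, 13, 21]),  # identical trajectories
    ([5, 8, 13, 21], [5, 8, 99, 21]),  # diverges at index 2
    ([5, 8, 13, 21], [7, 8, 13, 21]),  # diverges at index 0
]
stats = summarize(pairs)
print(stats)
```

In this report the median first divergence is 2 tokens, the mean agreed prefix is about 8.4 of 50 generated tokens, and only 10% of prompts match in full. The GTM score of 17.32 also equals 100 × `mean_prefix_agreement_length` / `mean_cand_length` (8.43 / 48.7) to the precision shown, though the report does not say that is how it is computed.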

Per-axis breakdown

4 axes
axis        metric                              score   band
A · GTM     Greedy Trajectory Match             17.32   Fail
            Token-level agreement with the fp16 reference (greedy decode).
B · KLD@D   KL Divergence at the Decoder        17.59   Fail
            Distribution-level divergence from the fp16 reference (corpus KLD).
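
The KLD@D axis reports a corpus-level KL divergence between the reference and candidate next-token distributions. The sketch below computes KL(P || Q) for a single position and maps a mean KLD to a 0–100 score with 100·exp(−KLD). That mapping is an inference: it reproduces this report's 17.59 from `mean_kld` = 1.73807, but the report does not document it. The toy distributions are invented.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions over the same vocab."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def kld_to_score(mean_kld):
    """Assumed mapping from mean KLD to a 0-100 score (inferred, not documented)."""
    return 100.0 * math.exp(-mean_kld)

# Toy next-token distributions over a 3-token vocabulary (illustrative only).
reference = [0.7, 0.2, 0.1]
candidate = [0.5, 0.3, 0.2]
toy_kld = kl_divergence(reference, candidate)
print(f"toy KLD: {toy_kld:.4f} nats")

# mean_kld taken from this report's raw JSON.
report_score = kld_to_score(1.73807)
print(f"score for mean_kld=1.73807: {report_score:.2f}")  # the report shows 17.59
```

Under this mapping a score of 100 means identical distributions (KLD = 0), and the score decays exponentially as divergence grows.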
C · R-NIAH  Retrieval Needle-In-A-Haystack     100.00   Excellent
            Long-context retrieval quality vs the reference (NIAH at multiple lengths).

R-NIAH per-cell cand / base retrieval rate · green = match · red = candidate worse · n/a = base also fails at this cell
length \ position   0.10          0.50          0.90
4096                1.00 / 1.00   1.00 / 1.00   1.00 / 1.00
8192                1.00 / 1.00   1.00 / 1.00   1.00 / 1.00
16384               n/a           n/a           n/a
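
The per-cell table marks a cell n/a when the reference itself fails to retrieve the needle, so those cells carry no information about the candidate. One plausible way to turn the cells into a score is to average candidate degradation over only the cells where the reference succeeds. The sketch below does that with the nine cells from this report. The scoring rule and the function name are assumptions, not refract's documented method.

```python
# (length, position, base_acc, cand_acc) for the nine cells in this report.
cells = [
    (4096, 0.1, 1.0, 1.0), (4096, 0.5, 1.0, 1.0), (4096, 0.9, 1.0, 1.0),
    (8192, 0.1, 1.0, 1.0), (8192, 0.5, 1.0, 1.0), (8192, 0.9, 1.0, 1.0),
    (16384, 0.1, 0.0, 0.0), (16384, 0.5, 0.0, 0.0), (16384, 0.9, 0.0, 0.0),
]

def rniah_score(cells):
    """Hypothetical rule: 100 * (1 - mean degradation) over informative cells."""
    informative = [(b, c) for _, _, b, c in cells if b > 0.0]
    if not informative:
        return None  # nothing to compare when the reference fails everywhere
    degradation = [max(0.0, b - c) for b, c in informative]
    return 100.0 * (1.0 - sum(degradation) / len(degradation))

n_informative = sum(1 for _, _, b, _ in cells if b > 0.0)
base_acc_avg = sum(b for _, _, b, _ in cells) / len(cells)

print(f"informative cells: {n_informative} of {len(cells)}")
print(f"score: {rniah_score(cells):.2f}")
print(f"base_acc_avg: {base_acc_avg:.4f}")
```

Only six of the nine cells are informative here, and each was run with a single trial (`n_trials` = 1), so the 100.00 reflects six single-shot retrievals at 4096 and 8192 tokens.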
D · PLAD    Perturbation-Locality Aware Drift   78.40   Degraded
            Robustness to small prompt changes vs the reference (typo / case / punct / paraphrase).

PLAD per-perturbation
perturbation   score     band       distribution
typo           83.66     Pass
case           68.77     Degraded
punct          81.84     Pass
paraphrase     skipped   N/A        didn't apply on these prompts
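
The perturbation families can be pictured as small, deterministic prompt edits that return nothing when they cannot apply. The notes in the raw JSON give one such case: a typo perturbation needs a word of at least 4 characters. The helpers below are hypothetical illustrations of that idea. The specific edits refract applies are not described in this report.

```python
import re

def perturb_typo(prompt):
    """Swap the 2nd and 3rd letters of the first word with >= 4 letters."""
    match = re.search(r"[A-Za-z]{4,}", prompt)
    if match is None:
        return None  # no word long enough: perturbation cannot apply
    w = match.group(0)
    swapped = w[0] + w[2] + w[1] + w[3:]
    if swapped == w:
        return None  # identical letters swapped: nothing would change
    return prompt[:match.start()] + swapped + prompt[match.end():]

def perturb_case(prompt):
    """Lower-case the prompt; not applicable if it is already lower-case."""
    lowered = prompt.lower()
    return None if lowered == prompt else lowered

def perturb_punct(prompt):
    """Drop trailing punctuation; not applicable if there is none."""
    stripped = prompt.rstrip(".?!")
    return None if stripped == prompt else stripped

prompt = "What is the capital of France?"
for name, fn in [("typo", perturb_typo), ("case", perturb_case), ("punct", perturb_punct)]:
    print(name, "->", fn(prompt))
```

Returning `None` for inapplicable cases lets a caller count skipped (prompt, perturbation) pairs separately from scored ones, in the spirit of the skip note in the report.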

Run details

model · hardware · environment

Model

name      gemma-4-26B-A4B-it-Q8_0.gguf
size      25.02 GB
format    gguf

Hardware

chip      Apple M5 Max
platform  macOS 26.4.1 arm64
ram       128.0 GB
arch      arm64
python    3.14.4

Environment

reference  ctk=f16,ctv=f16
candidate  ctk=q8_0,ctv=turbo4

Reproduce

command
$ python3 -m refract.cli score --model gemma-4-26B-A4B-it-Q8_0.gguf --reference ctk=f16,ctv=f16 --candidate ctk=q8_0,ctv=turbo4 --prompts refract/prompts/v0.1.jsonl --full --rniah-up-to 16384 --json-out report.json --html-out report.html
Raw JSON · machine-readable
{
  "schema": "refract.report.v0.3.1",
  "framework_version": "0.3.2",
  "environment": {},
  "repro_command": "",
  "timestamp": "2026-04-30T16:23:41",
  "score_direction": "higher_is_better",
  "score_range": [
    0,
    100
  ],
  "model": "/Users/tom/models/gemma-4-26B-A4B-it-Q8_0.gguf",
  "reference": "ctk=f16,ctv=f16",
  "candidate": "ctk=q8_0,ctv=turbo4",
  "composite": 29.119378931499842,
  "band": "FAIL",
  "summary": "Material quality loss. Treat as broken.",
  "diagnosis": [
    "Decode distribution drift detected: the candidate generates different tokens than fp16 on short-context prompts.",
    "Brittleness to small input changes: candidate's output drifts more than fp16's under typo / casing / punctuation perturbations."
  ],
  "composite_detail": {
    "gtm_score": 17.31690622861054,
    "kld_score": 17.585948209856365,
    "rniah_score": 100.0,
    "plad_score": 78.40093473438733,
    "floor_score": null,
    "floor_ok": null,
    "floor_min": 99.5,
    "notes": []
  },
  "axes": {
    "gtm": {
      "score": 17.31690622861054,
      "full_match_rate": 0.1,
      "median_first_divergence": 2,
      "mean_prefix_agreement_length": 8.433333333333334,
      "mean_cand_length": 48.7,
      "mean_ref_length": 49.63333333333333,
      "n_prompts": 30,
      "n_tokens_each": 50,
      "per_prompt": [],
      "notes": [],
      "band": "FAIL",
      "description": "Token-level agreement with the fp16 reference."
    },
    "kld": {
      "score": 17.585948209856365,
      "mean_kld": 1.73807,
      "ppl": null,
      "rms_dp_pct": null,
      "same_topp_pct": null,
      "base_path": "",
      "chunks": 32,
      "ctx": 512,
      "is_self_reference": false,
      "corpus": null,
      "band": "FAIL",
      "description": "Distribution-level divergence from the fp16 reference."
    },
    "rniah": {
      "score": 100.0,
      "n_cells": 9,
      "cells": [
        {
          "length": 4096,
          "position": 0.1,
          "n_trials": 1,
          "base_acc": 1.0,
          "cand_acc": 1.0,
          "degradation": 0.0,
          "base_hits": 1,
          "cand_hits": 1
        },
        {
          "length": 4096,
          "position": 0.5,
          "n_trials": 1,
          "base_acc": 1.0,
          "cand_acc": 1.0,
          "degradation": 0.0,
          "base_hits": 1,
          "cand_hits": 1
        },
        {
          "length": 4096,
          "position": 0.9,
          "n_trials": 1,
          "base_acc": 1.0,
          "cand_acc": 1.0,
          "degradation": 0.0,
          "base_hits": 1,
          "cand_hits": 1
        },
        {
          "length": 8192,
          "position": 0.1,
          "n_trials": 1,
          "base_acc": 1.0,
          "cand_acc": 1.0,
          "degradation": 0.0,
          "base_hits": 1,
          "cand_hits": 1
        },
        {
          "length": 8192,
          "position": 0.5,
          "n_trials": 1,
          "base_acc": 1.0,
          "cand_acc": 1.0,
          "degradation": 0.0,
          "base_hits": 1,
          "cand_hits": 1
        },
        {
          "length": 8192,
          "position": 0.9,
          "n_trials": 1,
          "base_acc": 1.0,
          "cand_acc": 1.0,
          "degradation": 0.0,
          "base_hits": 1,
          "cand_hits": 1
        },
        {
          "length": 16384,
          "position": 0.1,
          "n_trials": 1,
          "base_acc": 0.0,
          "cand_acc": 0.0,
          "degradation": 0.0,
          "base_hits": 0,
          "cand_hits": 0
        },
        {
          "length": 16384,
          "position": 0.5,
          "n_trials": 1,
          "base_acc": 0.0,
          "cand_acc": 0.0,
          "degradation": 0.0,
          "base_hits": 0,
          "cand_hits": 0
        },
        {
          "length": 16384,
          "position": 0.9,
          "n_trials": 1,
          "base_acc": 0.0,
          "cand_acc": 0.0,
          "degradation": 0.0,
          "base_hits": 0,
          "cand_hits": 0
        }
      ],
      "skipped_cells": [],
      "needle": "Note: APRICOT-7-BLUE is the rare paint color featured in this article.",
      "password_keyword": "APRICOT-7-BLUE",
      "notes": [],
      "confidence": "ok",
      "base_acc_avg": 0.6666666666666666,
      "band": "EXCELLENT",
      "description": "Long-context retrieval quality vs the reference."
    },
    "plad": {
      "score": 78.40093473438733,
      "per_perturbation_score": {
        "typo": 83.66015410668061,
        "case": 68.766446516516,
        "punct": 81.84221977573544,
        "paraphrase": NaN
      },
      "per_prompt": [],
      "n_prompts": 30,
      "n_perturbations": 4,
      "notes": [
        "36 (prompt, perturbation) pairs were skipped (perturbation could not apply, e.g. no \u22654-char word for typo)."
      ],
      "skipped_perturbations": [
        "paraphrase"
      ],
      "confidence": "partial",
      "band": "DEGRADED",
      "description": "Robustness to small prompt changes vs the reference."
    }
  },
  "extras": {}
}
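
The raw JSON above contains a bare `NaN` for the skipped paraphrase perturbation, which strict JSON parsers reject. Python's standard `json` module accepts it by default, so a consumer can load the report and filter out non-finite scores as sketched below. The embedded snippet is a trimmed copy of the `per_perturbation_score` object. Reading the full `report.json` is assumed to behave the same way.

```python
import json
import math

# Trimmed excerpt of the report's "plad" axis, including the bare NaN.
raw = """
{
  "per_perturbation_score": {
    "typo": 83.66015410668061,
    "case": 68.766446516516,
    "punct": 81.84221977573544,
    "paraphrase": NaN
  }
}
"""

scores = json.loads(raw)["per_perturbation_score"]  # NaN parses to float("nan")

# Keep only perturbations that produced a finite score.
finite = {name: s for name, s in scores.items() if math.isfinite(s)}
skipped = sorted(set(scores) - set(finite))

print("scored:", sorted(finite))
print("skipped:", skipped)

# A strict consumer can reject NaN/Infinity explicitly via parse_constant.
def reject_constant(token):
    raise ValueError(f"non-standard JSON constant: {token}")

try:
    json.loads(raw, parse_constant=reject_constant)
    strict_ok = True
except ValueError as exc:
    strict_ok = False
    print("strict parse failed:", exc)
```

Tools outside Python (for example `JSON.parse` in JavaScript) will fail on this payload unless the `NaN` is replaced with `null` first.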