## ELA Subject Evaluation Rules

## STEP 2.5: ELA STANDARD-SPECIFIC AUDITS (mandatory before Step 3 scoring)

When the targeted standard matches one of these patterns, you MUST include the named audit, completed in the exact format shown, in `internal_reasoning` for the cited metric. Skipping the audit is itself a failure of that metric: if the audit is missing, score the metric `0.0`. Fill the audit with real values from the content, not boilerplate.

**Trigger:** Determine which audits apply by reading the standard's *description* (not its ID). Audit A applies when the description requires integrating information across formats. Audit B applies when the description names providing or evaluating a concluding statement that follows from an argument. Audit C applies when the description names using affixes/roots as clues to word meaning. When a trigger matches, the audit is MANDATORY — emit it in the cited metric's `internal_reasoning` even if you initially believe the item passes.

### A. Integration-of-formats — curriculum_alignment

Audit format (one row per stimulus, INCLUDING every `image_url`):
```
stimuli_audit:
- passage: load_bearing=YES|NO, evidence="<one sentence>"
- table/data: load_bearing=YES|NO, evidence="..."
- image_url[i]: load_bearing=YES|NO, evidence="..."
cognitive_load_source: "<the single skill the question actually exercises>"
```
- ANY row with `load_bearing=NO` → `curriculum_alignment = 0.0`. "Illustrative / engaging / thematic / Mode C" is NOT a valid reason to mark an integration-standard stimulus load-bearing.
- If `cognitive_load_source` names a single-format skill (e.g., "percentage calculation on the table", "passage reading") → `curriculum_alignment = 0.0`. Integration must demand combining at least two formats to reach the answer.
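
As an illustration only, the two Audit A rules can be read as a simple override function. This is a minimal sketch, not part of the rubric: `StimulusRow`, `audit_a_override`, and `formats_required` are hypothetical names, and deciding how many formats the question truly requires remains the evaluator's judgment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StimulusRow:
    name: str            # "passage", "table/data", "image_url[0]", ...
    load_bearing: bool   # YES -> True, NO -> False
    evidence: str        # the one-sentence justification from the audit


def audit_a_override(rows: List[StimulusRow], formats_required: int) -> Optional[float]:
    """Return 0.0 when Audit A forces curriculum_alignment to fail, else None.

    formats_required is a hypothetical input: the number of distinct formats
    the evaluator judged necessary to reach the answer.
    """
    # Rule 1: any stimulus that is not load-bearing fails the metric.
    if any(not row.load_bearing for row in rows):
        return 0.0
    # Rule 2: a single-format skill is not integration.
    if formats_required < 2:
        return 0.0
    return None  # no forced zero; normal scoring continues


# Hypothetical item: the image is decorative, so the audit forces a zero.
rows = [
    StimulusRow("passage", True, "States the goal the answer must meet."),
    StimulusRow("table/data", True, "Supplies the counts compared in the answer."),
    StimulusRow("image_url[0]", False, "Shows the setting only."),
]
print(audit_a_override(rows, formats_required=2))  # 0.0
```

A return value of `None` means only that Audit A did not force a zero; it does not mean the metric passes.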

### B. Argument-conclusion — factual_accuracy

Audit format:
```
passage_points:
- P1: "<exact claim from passage>"
- P2: "..."
- P3: "..."
option_mapping (correct options only):
- Option <X>: claim_1 -> Pn or NONE, claim_2 -> Pn or NONE, ...
```
- A claim resolves to `NONE` only when mapping it to a passage point requires *inventing a new content or argumentation area* the passage never touched. Loose real-world inference into a domain the passage never argued (e.g., a passage that argued only about cost and supply being summarized with a claim about tourism or safety) IS `NONE`.
- A claim does **NOT** resolve to `NONE` when it is:
  - a **category label or synthesis term** that names what the passage already described in concrete terms (the passage names specific mechanisms and outcomes; the conclusion uses the standard higher-level word for that category);
  - a **routine concluding-statement closer** that the standard expects students to produce ("for these reasons…", "in the future…", "for everyone…", and similar genre-conventional summary moves), where the conclusion's force does not depend on a new factual claim outside the passage;
  - a **mild quantifier broadening** that names the same audience or referent at a slightly higher level (e.g., listing two stakeholder groups → "everyone"; "many places" → "places everywhere") without asserting a new fact.
  In each case, map the claim to the relevant `Pn` (synthesis-of) and proceed.
- ANY correct option with a `NONE` claim → `factual_accuracy = 0.0`. "Pedagogical synthesis" / "appropriate inference for hard items" do NOT rescue genuine domain shifts. They DO rescue category-label synthesis covered by the bullet above — that is the standard's intended skill, not a violation.
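
The final rule of Audit B can be sketched the same way. This is an illustrative sketch under assumptions: `audit_b_override` is a hypothetical helper, and each claim is assumed to have already been resolved by the evaluator to a passage point label or to `None` (standing for `NONE`). The judgment about whether a claim is a genuine domain shift or an allowed synthesis happens before this step.

```python
from typing import Dict, List, Optional


def audit_b_override(option_mapping: Dict[str, List[Optional[str]]]) -> Optional[float]:
    """Return 0.0 when any correct option carries a NONE claim, else None.

    option_mapping maps each correct option label to the passage point
    ("P1", "P2", ...) that each of its claims resolves to; None stands for NONE.
    """
    for claims in option_mapping.values():
        if any(target is None for target in claims):
            return 0.0
    return None  # no forced zero; normal scoring continues


# Hypothetical mappings for a single correct option "B".
grounded = {"B": ["P1", "P3"]}       # every claim maps to a passage point
domain_shift = {"B": ["P1", None]}   # second claim enters an area the passage never argued

print(audit_b_override(grounded))      # None
print(audit_b_override(domain_shift))  # 0.0
```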

### ELA G3–8 Stimulus Register (clarity_precision)

For ELA items targeting grades 3–8, the passage, transcript, or writing
samples that the student must read should sit within (or only modestly
above) the typical reading band for the target grade. If the stimulus
consistently uses college- or adult-professional register words that are
*not* curriculum terms the standard is teaching (e.g., business / civic /
formal-administrative vocabulary across multiple sentences) AND a
same-skill version of the stimulus could be written in grade-band
vocabulary without changing what the item assesses, flag →
`clarity_precision = 0.0` (vocabulary inappropriate for target grade).
This applies to *student-facing stimulus text*. Do NOT flag because a
single domain term appears once; the failure is a sustained register
mismatch across the passage. Do NOT flag a science or social-studies
term that the targeted standard itself names or that is part of the
skill being taught.

(See also the stacked-instruction discriminator in the base prompt's
Checklist C, which refines the shared verbose-stem rule for ELA G3–8.)

### C. Affix/root vocabulary (stimulus_quality, difficulty_alignment)

Audit format:
```
target_words:
- <word>: standard_pos=<noun|verb|adj|...>, used_pos=<...>, grade_band_fit=<at|just-above|clearly-above>
```
- ANY word with `used_pos` != `standard_pos` → `stimulus_quality = 0.0`. "Functional shift / acceptable in ELA" is NOT a valid override.
- TWO or more words with `grade_band_fit=clearly-above` → `difficulty_alignment = 0.0`. The "Hard" label does not excuse off-band vocabulary.
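
For illustration, the two Audit C thresholds could be expressed as follows. This is a sketch under assumptions: `TargetWord` and `audit_c_overrides` are hypothetical names, and the part-of-speech and grade-band values are taken as already filled in by the evaluator.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TargetWord:
    word: str
    standard_pos: str     # e.g. "noun", "verb", "adj"
    used_pos: str         # part of speech as the word is used in the item
    grade_band_fit: str   # "at", "just-above", or "clearly-above"


def audit_c_overrides(words: List[TargetWord]) -> Dict[str, float]:
    """Return the metrics that Audit C forces to 0.0 (empty dict if none)."""
    overrides: Dict[str, float] = {}
    # Any part-of-speech mismatch fails stimulus_quality.
    if any(w.used_pos != w.standard_pos for w in words):
        overrides["stimulus_quality"] = 0.0
    # Two or more clearly-above-band words fail difficulty_alignment.
    if sum(w.grade_band_fit == "clearly-above" for w in words) >= 2:
        overrides["difficulty_alignment"] = 0.0
    return overrides


# Hypothetical audit rows: one part-of-speech mismatch, one off-band word.
words = [
    TargetWord("unhappy", "adj", "adj", "at"),
    TargetWord("preview", "noun", "verb", "just-above"),
    TargetWord("circumnavigate", "verb", "verb", "clearly-above"),
]
print(audit_c_overrides(words))  # {'stimulus_quality': 0.0}
```

A single `clearly-above` word does not trigger the difficulty rule on its own; the threshold is two.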

---

## FORMATTING-CONVENTION QUESTIONS

Some questions test whether students can correctly apply typographic/punctuation conventions — italics, underlining, or quotation marks for titles of works. For these questions, **the formatting markup in the answer options IS the substance of the correct answer**, not decoration.

**How to identify a formatting-convention question:**
- The standard explicitly targets conventions like "use italics, underlining, or quotation marks to indicate titles of works" (e.g., CCSS.ELA-LITERACY.L.x.2.D)
- The question stem asks students to identify or apply correct formatting (e.g., "Which sentence uses the title correctly?", "Which revision follows the editor's note?", "Select the sentences that show the title of a work correctly.")

**MANDATORY three-way alignment check for formatting-convention questions:**

The question stem, the answer options (raw markup), and the `answer_explanation` must all independently agree. Do NOT use `answer_explanation` as your sole source of truth — the explanation may be absent, wrong, or misleading.

Verify all three of the following:

1. **Option correctness** — Based on the standard and stem alone (ignoring the explanation), determine what correct formatting looks like for each work type (book → `<i>` or `<u>`; poem/article/song/chapter → `"..."`). Read each option's raw text/markup and verify:
   - The designated correct answer option actually contains the required markup applied to the right word(s).
   - If the correct answer option's markup does NOT match what the standard requires → `factual_accuracy = 0.0`.

2. **Distractor validity** — Verify that none of the incorrect options accidentally also contains correct formatting, which would make it an additional correct answer. If a distractor is accidentally correct → `factual_accuracy = 0.0`.

3. **Explanation alignment** — Verify that `answer_explanation` (if present) agrees with your independent assessment from steps 1 and 2. A mismatch between the explanation and the actual option text → `factual_accuracy = 0.0`.

**If `answer_explanation` is absent**: Do NOT assume correctness. Perform steps 1 and 2 from the raw option text alone. A missing explanation is not itself a failure, but it removes the safety net — the options must stand on their own.
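
The three-way check above can be sketched in code. This is a simplified illustration, not a required implementation. It assumes options are raw strings using `<i>`/`<u>` tags and straight double quotes, treats `"book"` as the only italic/underline work type, and models the explanation as the set of option labels it endorses. All function and parameter names are hypothetical, and nested or combined markup is not handled.

```python
import re
from typing import Dict, Optional, Set


def title_markup_ok(option_text: str, title: str, work_type: str) -> bool:
    """Check whether the raw option text formats the title correctly."""
    t = re.escape(title)
    if work_type == "book":
        pattern = rf"<(i|u)>{t}</\1>"   # <i>Title</i> or <u>Title</u>
    else:
        pattern = rf'"{t}[.,!?]?"'      # "Title" with optional inner punctuation
    return re.search(pattern, option_text) is not None


def formatting_check(options: Dict[str, str], keyed: Set[str], title: str,
                     work_type: str,
                     explanation_endorses: Optional[Set[str]] = None) -> Optional[float]:
    """Return 0.0 when the three-way check forces factual_accuracy to fail, else None."""
    correct = {label for label, text in options.items()
               if title_markup_ok(text, title, work_type)}
    if keyed - correct:    # step 1: a keyed option lacks the required markup
        return 0.0
    if correct - keyed:    # step 2: a distractor is accidentally correct
        return 0.0
    # Step 3 applies only when an explanation is present.
    if explanation_endorses is not None and explanation_endorses != correct:
        return 0.0
    return None


# Hypothetical item about a made-up book title.
options = {
    "A": "We read The Lost Kite in class.",
    "B": "We read <i>The Lost Kite</i> in class.",
    "C": 'We read "The Lost Kite" in class.',
}
print(formatting_check(options, {"B"}, "The Lost Kite", "book"))  # None
print(formatting_check(options, {"C"}, "The Lost Kite", "book"))  # 0.0
```

Passing `explanation_endorses=None` models an absent explanation: steps 1 and 2 still run on the raw option text, and step 3 is skipped.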

**This does NOT change the general rendering assumption** — for non-formatting-convention questions, references to formatting in the stem are still assumed to render correctly and are never a clarity issue.

**FORMATTING-CONVENTION MISMATCH** (applies when the question tests formatting conventions):
- The designated correct answer option does not actually contain the required formatting markup in its raw text (e.g., answer key says C but option C's text has no `<i>`, `<u>`, or `"..."` on the title)
- A distractor option accidentally contains the correct formatting, making it also a valid answer
- The `answer_explanation` (when present) describes formatting that does not exist in the actual option text, or agrees with an option that is not correctly formatted
- **CRITICAL**: Always read each option's raw markup directly to verify. Do NOT infer option correctness from the explanation alone. If `answer_explanation` is absent, perform steps 1 and 2 from the raw option text only.

