## ELA Subject Evaluation Rules

## STEP 2.5: ELA STANDARD-SPECIFIC AUDITS (mandatory before Step 3 scoring)

When the targeted standard matches one of these patterns, you MUST write the named audit, in the exact format shown, into `internal_reasoning` for the cited metric. Skipping the audit is itself a failure of the metric: if the audit is missing, score that metric `0.0`. Fill the audit with real values from the content, not boilerplate.

**Trigger:** Determine which audits apply by reading the standard's *description* (not its ID). Audit A applies when the description requires integrating information across formats. Audit B applies when the description names providing or evaluating a concluding statement that follows from an argument. Audit C applies when the description names using affixes/roots as clues to word meaning. When a trigger matches, the audit is MANDATORY — emit it in the cited metric's `internal_reasoning` even if you initially believe the item passes.

### A. Integration-of-formats — curriculum_alignment

Audit format (one row per stimulus, INCLUDING every `image_url`):
```
stimuli_audit:
- passage: load_bearing=YES|NO, evidence="<one sentence>"
- table/data: load_bearing=YES|NO, evidence="..."
- image_url[i]: load_bearing=YES|NO, evidence="..."
cognitive_load_source: "<the single skill the question actually exercises>"
```
- ANY row with `load_bearing=NO` → `curriculum_alignment = 0.0`. "Illustrative / engaging / thematic / Mode C" is NOT a valid reason to mark an integration-standard stimulus load-bearing.
- If `cognitive_load_source` names a single-format skill (e.g., "percentage calculation on the table", "passage reading") → `curriculum_alignment = 0.0`. Integration must demand combining at least two formats to reach the answer.
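
The two zeroing rules for Audit A can be sketched as a small check. This is an illustrative sketch only, not part of the scoring pipeline: `StimulusRow`, `curriculum_alignment_override`, and the `formats_needed` input are hypothetical names, and `formats_needed` stands for your own reading of `cognitive_load_source` (the set of formats the question actually requires).

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class StimulusRow:
    """One row of stimuli_audit (hypothetical structure for illustration)."""
    name: str           # e.g. "passage", "table/data", "image_url[0]"
    load_bearing: bool  # True for load_bearing=YES, False for NO
    evidence: str       # the one-sentence evidence from the audit


def curriculum_alignment_override(
    rows: List[StimulusRow], formats_needed: Set[str]
) -> Optional[float]:
    """Return 0.0 when Audit A forces a zero, otherwise None (score normally)."""
    # Rule 1: any stimulus that is not load-bearing zeroes the metric.
    if any(not row.load_bearing for row in rows):
        return 0.0
    # Rule 2: the answer must require combining at least two formats.
    if len(formats_needed) < 2:
        return 0.0
    return None
```

A return value of `None` means the audit did not force a zero; it does not mean the metric passes.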

### B. Argument-conclusion — factual_accuracy

Audit format:
```
passage_points:
- P1: "<exact claim from passage>"
- P2: "..."
- P3: "..."
option_mapping (correct options only):
- Option <X>: claim_1 -> Pn or NONE, claim_2 -> Pn or NONE, ...
```
- A claim resolves to `NONE` if mapping it to a passage point requires inventing a new domain (economy, beauty, tourism, safety, etc.) the passage never argued. Loose real-world inference (e.g., "cheaper vegetables imply a stronger economy") IS `NONE`.
- ANY correct option with a `NONE` claim → `factual_accuracy = 0.0`. "Pedagogical synthesis" / "appropriate inference for hard items" do NOT rescue it.
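
As a rough illustration of the `NONE` rule, the sketch below assumes `option_mapping` has already been filled in by hand, with each claim of each correct option resolved to a passage point label or to `"NONE"`. The function name and the dictionary shape are hypothetical.

```python
from typing import Dict, List, Optional


def factual_accuracy_override(
    option_mapping: Dict[str, List[str]]
) -> Optional[float]:
    """Return 0.0 when Audit B forces a zero, otherwise None (score normally).

    option_mapping maps each CORRECT option label to the resolved target of
    each of its claims: a passage point such as "P1", or the string "NONE".
    """
    for targets in option_mapping.values():
        # A single unsupported claim in a correct option zeroes the metric.
        if "NONE" in targets:
            return 0.0
    return None


# Example: Option B's second claim maps to no passage point.
print(factual_accuracy_override({"B": ["P1", "NONE"]}))
```

The judgment of whether a claim resolves to `NONE` stays with the evaluator; the sketch only applies the consequence.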

### C. Affix/root vocabulary

Cited metrics: `stimulus_quality` (part-of-speech check) and `difficulty_alignment` (grade-band check).

Audit format:
```
target_words:
- <word>: standard_pos=<noun|verb|adj|...>, used_pos=<...>, grade_band_fit=<at|just-above|clearly-above>
```
- ANY word with `used_pos` != `standard_pos` → `stimulus_quality = 0.0`. "Functional shift / acceptable in ELA" is NOT a valid override.
- TWO or more words with `grade_band_fit=clearly-above` → `difficulty_alignment = 0.0`. The "Hard" label does not excuse off-band vocabulary.
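
The two thresholds in Audit C differ (any single part-of-speech mismatch versus two or more clearly-above words), which the following sketch makes explicit. `TargetWord` and `affix_audit_overrides` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TargetWord:
    """One row of target_words (hypothetical structure for illustration)."""
    word: str
    standard_pos: str    # part of speech the standard expects, e.g. "noun"
    used_pos: str        # part of speech as used in the item
    grade_band_fit: str  # "at", "just-above", or "clearly-above"


def affix_audit_overrides(words: List[TargetWord]) -> Dict[str, float]:
    """Return the metrics Audit C forces to 0.0 (empty dict if none)."""
    overrides: Dict[str, float] = {}
    # Any single part-of-speech mismatch zeroes stimulus_quality.
    if any(w.used_pos != w.standard_pos for w in words):
        overrides["stimulus_quality"] = 0.0
    # Two or more clearly-above words zero difficulty_alignment.
    if sum(w.grade_band_fit == "clearly-above" for w in words) >= 2:
        overrides["difficulty_alignment"] = 0.0
    return overrides
```

The two checks are independent, so one item can trigger both zeros.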

---

## FORMATTING-CONVENTION QUESTIONS

Some questions test whether students can correctly apply typographic/punctuation conventions — italics, underlining, or quotation marks for titles of works. For these questions, **the formatting markup in the answer options IS the substance of the correct answer**, not decoration.

**How to identify a formatting-convention question:**
- The standard explicitly targets conventions like "use italics, underlining, or quotation marks to indicate titles of works" (e.g., CCSS.ELA-LITERACY.L.x.2.D)
- The question stem asks students to identify or apply correct formatting (e.g., "Which sentence uses the title correctly?", "Which revision follows the editor's note?", "Select the sentences that show the title of a work correctly.")

**MANDATORY three-way alignment check for formatting-convention questions:**

The question stem, the answer options (raw markup), and the `answer_explanation` must all independently agree. Do NOT use `answer_explanation` as your sole source of truth — the explanation may be absent, wrong, or misleading.

Verify all three of the following:

1. **Option correctness** — Based on the standard and stem alone (ignoring the explanation), determine what correct formatting looks like for each work type (book → `<i>` or `<u>`; poem/article/song/chapter → `"..."`). Read each option's raw text/markup and verify:
   - The designated correct answer option actually contains the required markup applied to the right word(s).
   - If the correct answer option's markup does NOT match what the standard requires → `factual_accuracy = 0.0`.

2. **Distractor validity** — Verify that none of the incorrect options accidentally also contains correct formatting, which would make it an additional correct answer. If a distractor is accidentally correct → `factual_accuracy = 0.0`.

3. **Explanation alignment** — Verify that `answer_explanation` (if present) agrees with your independent assessment from steps 1 and 2. A mismatch between the explanation and the actual option text → `factual_accuracy = 0.0`.

**If `answer_explanation` is absent**: Do NOT assume correctness. Perform steps 1 and 2 from the raw option text alone. A missing explanation is not itself a failure, but it removes the safety net — the options must stand on their own.
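
The three-way check can be sketched against raw option text as follows. This is a simplified illustration under stated assumptions: the helper names, the `explanation_labels` input (the option labels the explanation endorses, as read by the evaluator), and the exact-title regex matching are all hypothetical, and real items may use markup variants this sketch does not cover.

```python
import re
from typing import Dict, List, Optional, Set

LONG_WORKS = {"book"}                                # italics or underline
SHORT_WORKS = {"poem", "article", "song", "chapter"}  # quotation marks


def has_required_markup(option_text: str, title: str, work_type: str) -> bool:
    """Check the raw text for the markup the convention requires on the title."""
    t = re.escape(title)
    if work_type in LONG_WORKS:
        pattern = rf"<(i|u)>{t}</\1>"
    elif work_type in SHORT_WORKS:
        # Straight or curly double quotes; allow punctuation inside the close.
        pattern = rf'["\u201c]{t}[,.!?]?["\u201d]'
    else:
        raise ValueError(f"unknown work type: {work_type}")
    return re.search(pattern, option_text) is not None


def formatting_failures(
    options: Dict[str, str],
    correct_labels: Set[str],
    title: str,
    work_type: str,
    explanation_labels: Optional[Set[str]] = None,
) -> List[str]:
    """Return mismatch descriptions; any entry means factual_accuracy = 0.0."""
    failures: List[str] = []
    independently_correct = {
        label for label, text in options.items()
        if has_required_markup(text, title, work_type)
    }
    for label in options:
        keyed = label in correct_labels
        formatted = label in independently_correct
        if keyed and not formatted:      # step 1: option correctness
            failures.append(f"{label}: keyed correct but lacks required markup")
        if formatted and not keyed:      # step 2: distractor validity
            failures.append(f"{label}: distractor is also correctly formatted")
    # Step 3 runs only when an explanation exists; absence is not a failure.
    if explanation_labels is not None and explanation_labels != independently_correct:
        failures.append("explanation disagrees with raw option markup")
    return failures
```

Note that steps 1 and 2 never consult the explanation, so they behave identically whether or not `answer_explanation` is present.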

**This does NOT change the general rendering assumption** — for non-formatting-convention questions, references to formatting in the stem are still assumed to render correctly and are never a clarity issue.

**FORMATTING-CONVENTION MISMATCH** (applies when the question tests formatting conventions):
- The designated correct answer option does not actually contain the required formatting markup in its raw text (e.g., answer key says C but option C's text has no `<i>`, `<u>`, or `"..."` on the title)
- A distractor option accidentally contains the correct formatting, making it also a valid answer
- The `answer_explanation` (when present) describes formatting that does not exist in the actual option text, or agrees with an option that is not correctly formatted
- **CRITICAL**: Always read each option's raw markup directly to verify. Do NOT infer option correctness from the explanation alone. If `answer_explanation` is absent, perform steps 1 and 2 (option correctness and distractor validity) from raw text only.

