# Pandoc-native conformance allowlist.
#
# One case ID per line. Each ID corresponds to the <NNNN> directory prefix under
# tests/fixtures/pandoc-conformance/corpus/<NNNN>-<section>-<slug>/.
#
# Append-only in spirit: only remove an entry if you have a concrete reason and
# a follow-up plan. Group entries under their section header. Verify each ID
# appears in the latest tests/pandoc/report.txt before adding.

# inline
1
2
3
4
5
6
7
8
9
10
11
12
25
190
191

# block
13
14
15
16
17
18
19
20
21
22
23
24
188
189
192
454
455

# imported (machine-imported via scripts/import-pandoc-conformance-from-parser-fixtures.sh)
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187

# html-block
193
194
195
196
197
198
199
200
201
202

# html-inline
203
204
205
206
207
208
209
210

# html-block (sectioning + verbatim — no markdown inside verbatim, simple cases)
211
212
213
214
215
216
217
218
219
220

# html-block (comments + processing instructions)
221
222
223
224
225
230
231

# html-inline (comments + processing instructions)
226
227
228
229
232

# html-block (declarations + CDATA — pandoc-markdown does not recognize these as raw HTML; types 4/5 gated CommonMark-only)
233
234
235
236
237
238

# html-inline (declarations + CDATA — pandoc-markdown does not recognize these as inline raw HTML)
239
240

# html-block (markdown_in_html_blocks — non-sectioning block tags split into per-tag RawBlocks with markdown-parsed content between them)
241
242
243
244
245
246
247
248
249
250

# html-block (nested div — depth-aware close scan, mirrors pandoc's htmlInBalanced)
251
252
253
254

# html-block (div recursive parse — inner heading auto-ids, ref-link defs, footnote defs resolve in inner RefsCtx instead of leaking outer)
255
256
257
258
259
260
261
262
263
264

# html-block (div recursive parse — outer ref-link defs, footnote defs, and heading-slug history inherited into the inner RefsCtx so cross-boundary uses resolve)
265
266
267

# html-block (multi-line div open tag — `<div\n  attrs\n>` with the closing `>` on a separate line; structural HTML_ATTRS exposure across lines so the salsa anchor walk picks up the id)
268

# html-block (div Plain/Para promotion — last block keeps `Para` only when `</div>` sits at column 0 on its own line; demotes to `Plain` when close is butted to content or sits on an indented line; mirrors pandoc's `markdown_in_html_blocks` recursive parse)
269
270
271
272

# html-block / html-inline (Pandoc-specific blockHtmlTags — pandoc-markdown's narrower block-tag set vs CommonMark §4.6 type-6: `dialog`, `legend`, `option`, etc. fall through to inline raw HTML, while `canvas`, `meta`, `hgroup`, `output` open HTML blocks)
273
274
275
276
277
278
279

# html-block (HTML5 sectioning + grouping — header, footer, main, details/summary, figure/figcaption, nav with markdown_in_html_blocks)
280
281
282
283
284
285
286

# html-block (eitherBlockOrInline tags — pandoc's `eitherBlockOrInline` set: at fresh-block positions `<iframe>`, `<button>`, `<video>`, `<del>`, etc. lift to RawBlock+Plain+RawBlock; inside an existing inline run they stay inline as RawInline. The projector tracks `inline_pending` to keep case 0248 and other nested cases inline.)
287
288
289
290
291
292
293
294
295
296
297

# html-block (eitherBlockOrInline void elements — pandoc's void `eitherBlockOrInline` subset: `<embed>`, `<area>`, `<source>`, `<track>` emit a single RawBlock at fresh-block positions; inside a running paragraph they stay inline as RawInline. The parser closes the block on the open-tag line (no closing tag to match), and the projector splits same-line shapes like `<embed src="x"> trailing` via the same `inline_pending` rule. Case 0305 tests `<source>` lifted as a separate RawBlock when nested inside a `<video>` matched-pair lift.)
298
299
300
301
302
303
304
305

# html-block (incomplete open tag — pandoc treats `<embed`, `<div`, `<table`, `<iframe` etc. with no `>` anywhere as paragraph text rather than RawBlock. Recognizing them as HTML blocks made the projector reparse the same bytes and infinite-recurse; the parser now rejects open tags that never close in `pandoc_html_open_tag_closes`.)
306
307
308
309

# html-block (multi-line void open tag — `<embed\n  src="x">`, `<area\n  href="x">`, `<source\n  src="x.mp3">`, `<track\n  src="t.vtt">` and three-line variants; the parser captures all open-tag lines in a single HTML_BLOCK_TAG so the projector emits one RawBlock matching pandoc-native instead of splitting the open tag across paragraphs. Generalized `find_multiline_open_end` over tag name; void path emits simple TEXT+NEWLINE per line — no HTML_ATTRS structural node is needed since the projector doesn't read attributes for void tags.)
310
311
312
313
314

# html-block (inline-block matched-pair lift is abandoned when the interior starts with a void block tag — `<video>\n<source>...\nfallback\n</video>` projects as separate RawBlocks for `<video>` and `<source>` plus a Para[fallback, SoftBreak, RawInline</video>] (per pandoc-native), instead of the prior lift that demoted to Plain[fallback] and emitted `</video>` as RawBlock. Closing forms of inline-block / void tags (`</video>`, `</button>`, `</embed>`, ...) now also start a single-line RawBlock at fresh-block positions, fixing standalone-close-tag recursion and matching pandoc's `htmlBlock` behavior. Cases 0316–0318 cover the standalone-close + open-without-close shapes.)
315
316
317
318

# html-block (strict-block + verbatim closing forms — standalone `</p>`, `</nav>`, `</section>`, `</pre>` etc. emit as single-line RawBlock under Pandoc, mirroring pandoc-native's `htmlBlock isBlockTag` accept-both-directions rule. Unlike inline-block / void closes, these CAN interrupt a running paragraph (no `cannot_interrupt` in the dispatcher); the projector's existing `is_pandoc_block_tag_name` arm in `split_html_block_by_tags` already splits trailing same-line text into a separate Para, so `</p> bar` projects as [RawBlock</p>, Para[bar]]. Para→Plain demotion when a strict-close interrupts a running paragraph (`foo\n</p>` → `[Plain[foo], RawBlock</p>]` per pandoc) is a separate gap shared with strict-block opens; not addressed here.)
319
320
321
322
323
324

# html-block (paragraph PARAGRAPH→PLAIN demotion when interrupted by HTML strict-block / verbatim — `foo\n<p>` and `foo\n</p>` and `foo\n<div>...</div>` and `foo\n<pre>...</pre>` all project as `[Plain[foo], …]` per pandoc, not `[Para[foo], …]`. Parser-side decision: at `BlockDetectionResult::YesCanInterrupt` for an HTML BlockTag under Pandoc dialect, retag the closing PARAGRAPH wrapper as PLAIN. Inline-block / void / comment paths take `cannot_interrupt` and stay inside Para. Replicates the 2026-05-10-reverted projector demotion as a parser decision.)
325
326
327
328
329

# html-block (`<style>` and processing instructions cannot interrupt a paragraph under Pandoc — `foo\n<style>x</style>` and `foo\n<?pi?>` stay as a single Para with the tag(s) as RawInline, matching pandoc-native. The real reason is pandoc's `isInlineTag` predicate (`Readers/HTML.hs`): `<style>` open and close are SPECIAL-CASED to always be inline (commit fixing issue #10643), and PIs match `T.take 1 name == "?"`. `<style>` and `<textarea>` are BOTH in pandoc's `blockHtmlTags`, but only `<style>` gets the `isInlineTag` override — that's why `<pre>`, `<script>` open, `<textarea>` interrupt while `<style>` does not. Parser-side: extend `cannot_interrupt` in `HtmlBlockParser::detect_prepared` to include `HtmlBlockType::ProcessingInstruction` and `BlockTag { tag_name == "style" }` under Pandoc dialect.)
330
331
332

# html-block (`</script>` close cannot interrupt a paragraph under Pandoc — `foo\n</script>\n` stays as `Para[foo, SoftBreak, RawInline</script>]` matching pandoc-native. Pandoc's `isInlineTag` SPECIAL-CASES `</script>` (close form) to always be inline, regardless of `<script>` open being a block start. Parser-side: added `is_closing: bool` field to `HtmlBlockType::BlockTag`; `cannot_interrupt` matches `is_closing && tag_name == "script"`.)
333

# citation (bracketed citation prefix may contain ordinary parens and `\@`-escaped at-signs without aborting citation detection; matches pandoc's NormalCitation with the prefix captured as ordinary inlines, e.g. `[see (Smith 1999) and @doe99]` and `[see \@ref(svm) and @bischl_applied_2024]`)
334

# html-block (`<script type="math/tex…">` open cannot interrupt a paragraph under Pandoc — `foo\n<script type="math/tex">x</script>\nbar\n` stays as a single `Para[foo, SoftBreak, RawInline<script…>, Str x, RawInline</script>, SoftBreak, bar]` matching pandoc-native. Pandoc's `isInlineTag` SPECIAL-CASES `<script>` open with a `type` attribute whose value has the `math/tex` prefix (case-insensitive, e.g. `math/tex`, `math/tex; mode=display`) to always be inline; every other `<script>` open is a `RawBlock` start (`<pre>`, `<textarea>`, `<script>` without math/tex DO interrupt). Parser-side: `cannot_interrupt` in `HtmlBlockParser::detect_prepared` consults `is_math_tex_script_open(ctx.content)`, which parses the open tag's attributes via `parse_html_tag_attributes` and matches `type` values that start with `math/tex` (case-insensitive prefix). At fresh-block positions (`<script type="math/tex">…</script>` alone, or after a blank line) the tag still lifts to `RawBlock` per pandoc-native; only the mid-paragraph case takes the inline path.)
335

# html-block (HTML blocks inside a blockquote — `> <div>...`, `> <p>...`, `> <pre>...`, `> <video>...` etc. project with the BLOCK_QUOTE wrapping a single Div/RawBlock(s) matching pandoc-native. The parser keeps BLOCK_QUOTE_MARKER + WHITESPACE prefix tokens inside the HTML_BLOCK_CONTENT and the close HTML_BLOCK_TAG (lossless CST). The projector strips those marker pairs via `collect_html_block_text_skip_bq_markers` before feeding the bytes to `try_div_html_block` / `split_html_block_by_tags` / `flush_html_block_text` / `extract_div_inner_and_butted` / `parse_pandoc_blocks`; without the strip the inner reparse re-recognizes the `> ` prefixes as a nested blockquote (or a verbatim RawBlock includes literal `>` characters). Applied uniformly in `html_div_block` and `emit_html_block`. Handles arbitrary nesting depth (`> > <div>`); div attributes on the open tag remain accessible via the existing `cst_div_open_tag_attr` walk because the open-tag's leading marker is consumed by the outer BLOCK_QUOTE wrapper, not by HTML_BLOCK_DIV / HTML_BLOCK.)
336
337
338
339
340
341

# html-block (empty `<div></div>` / `<div>\n</div>` / `<div>\n\n</div>` projects to `Div ("",[],[]) []` matching pandoc-native. Parser already lifts the shape into clean `HTML_BLOCK_TAG` + (optional `BLANK_LINE`s) + `HTML_BLOCK_TAG` children with no `HTML_BLOCK_CONTENT`; the projector's `div_has_structural_inner` predicate was relaxed from "requires a non-(TAG/BLANK_LINE/CONTENT) body child" to "no `HTML_BLOCK_CONTENT` children", which trusts the parser-side lift even when the body is empty or blank-only. Bq-wrapped divs still carry `HTML_BLOCK_CONTENT` for the bq-prefixed body lines and continue to fall through to the byte-reparse path.)
342
343
344

# html-block (Fix #4 strict-block body structural lift — `<section>foo\n\nbar\n</section>` and friends project as `RawBlock + Para + Plain + RawBlock` with the trailing paragraph demoted to Plain when no blank line precedes the close, matching pandoc's `markdown_in_html_blocks` adjacency rule. Parser now lifts the inner body of clean multi-line non-div Pandoc strict-block tags (`PANDOC_BLOCK_TAGS` minus verbatim / inline-block / void / `<div>`) into structural CST children (PARAGRAPH/PLAIN/HEADING/LIST/...) when (a) outside any blockquote, (b) open tag is on its own line with no trailing content after `>`, (c) close tag is on its own line with at most whitespace indent, (d) no multi-line open. Same-line / open-trailing / butted-close / BQ-wrapped shapes stay opaque and fall through to the projector byte walker. PARAGRAPH→PLAIN retag at parse time follows the `LastParaDemote::OnlyIfLast` rule — demote only when no trailing `BLANK_LINE` separates the last paragraph from the close tag. Projector adds `html_block_has_structural_lift` + `emit_html_block_structural` to walk the lifted children directly (open `HTML_BLOCK_TAG` → RawBlock, body children → collect_block, close `HTML_BLOCK_TAG` → RawBlock), avoiding the byte-reparse path that would re-disambiguate heading auto-ids against a fresh inner `RefsCtx`. `html_block_open_tag_is_clean` predicate loosened from "TEXT exactly equal to `>`" to "TEXT ending in `>`" so it accepts both `<div>`'s split-`>` emission and non-div's whole-line TEXT.)
345
346
347

# html-block (non-div strict-block butted-close / open-trailing / same-line shapes — `<form>\nfoo</form>`, `<form>foo\nbar\n</form>`, `<form>foo</form>` now lift to structural CST children matching pandoc's `RawBlock + Plain + RawBlock` projection. Parser extends Fix #4's clean-multi-line strict-block lift to the remaining shape variants: open-trailing captures bytes after `>` into `pre_content` via `emit_open_tag_tokens` (generalized from div's same helper); butted-close removes the leading-whitespace gate on `try_split_close_line` since `LastParaDemote::OnlyIfLast` already handles non-blank-terminated demotion; same-line reuses the existing div same-line lift path (`pre_content` splits at `</tag>`) gated on a new `same_line_strict_lift_safe` check that reuses the generalized `probe_same_line_lift(line, tag_name)`. CST gain: open tags split into `TEXT("<form") + (WHITESPACE + HTML_ATTRS{TEXT(attrs)})? + TEXT(">")` so `AttributeNode::cast(HTML_ATTRS)` finds non-div strict-block ids too. Multi-line open tags and bq-wrapped non-div shapes still fall through to opaque `HTML_BLOCK_CONTENT` + projector byte walker.)
348
349
350

# html-block (non-div strict-block multi-line open tag — `<form\n  id="x"\n  class="y">…</form>`, `<section\n  id="intro">…</section>` etc. now lift to structural CST children with `HTML_ATTRS` exposure on each attribute line, matching pandoc-native's `RawBlock + body + RawBlock` projection. Parser extends `find_multiline_open_end` recognition to Pandoc-lift-eligible strict-block tags (`is_pandoc_lift_eligible_strict_block_tag` predicate), reuses the generalized `emit_multiline_open_tag_with_attrs(tag_name)` (renamed from `emit_multiline_div_open_tag`) for both div and non-div strict-block multi-line opens. The `strict_block_lift` gate accepts `multiline_open_end.is_some()` directly because `find_multiline_open_end` already verified the open tag closes with a quote-aware `>` somewhere downstream. Salsa's existing `AttributeNode` descendants walk picks up multi-line `<section id="intro">` ids as anchor declarations. Projector adds `open_tag_raw_block_text` helper in `pandoc_ast.rs` to canonicalize the open-tag `RawBlock` text to pandoc's single-line `<tag attr1 attr2 ...>` form when structural `HTML_ATTRS` regions are present (multi-line collapses to one line; inter-attribute whitespace normalizes to a single space); without canonicalization the projector emitted literal `<form\n  id="x"\n  class="y">` bytes which diverged from pandoc-native's `<form id="x" class="y">` since `normalize_native` preserves whitespace inside string literals.)
351
352

# html-block (multi-line HTML open tag inside a blockquote — `> <div\n>   id="x">\n> body\n> </div>` and `> <section\n>   id="x">\n> body\n> </section>` now lift to structural CST children with `HTML_ATTRS` exposure on each attribute line. Parser extends `find_multiline_open_end` to accept `bq_depth > 0` by stripping `> ` markers per line via `strip_n_blockquote_markers` before the quote-aware byte scan. `emit_multiline_open_tag_with_attrs` and `emit_multiline_open_tag_simple` take a `bq_depth` parameter and re-inject `BLOCK_QUOTE_MARKER + WHITESPACE` prefix tokens between lines via `emit_bq_prefix_tokens` (skipping line 0, whose prefix is owned by the outer BLOCK_QUOTE). The bq lift gates (`bq_lift_tag`, `bq_clean_lift`) are widened to accept multi-line opens; `bq_clean_lift` checks the close line of the open (last open line) for trailing content, not just `first_inner`. Dispatcher's `HTML_BLOCK_DIV` retag gate adds `pandoc_html_open_tag_closes_cleanly` (a new helper mirroring `pandoc_html_open_tag_closes` plus a "tail must be whitespace" check) alongside the existing `probe_open_tag_line_has_close_gt` so single-line opens with trailing content keep retagging (parser captures the trailing bytes into `pre_content`) while multi-line opens with trailing content on the close line stay opaque (`div_has_structural_inner` would otherwise fail and `html_div_block` would trip its `debug_assert!`). Non-div tag emission gates on `bq_strict_attr_emit_tag_name`, which is already defined for single-line non-div bq cases and extends naturally to multi-line opens. Multi-line + trailing-on-close-line shapes remain opaque (pinned by parser fixture `html_block_div_blockquote_multiline_open_trailing_pandoc`).)
353
354

# html-block (multi-line HTML open tag with trailing content on the close-`>` line — `<div\n  id="x">trailing\nbody\n</div>` and its bq-wrapped variants now lift to structural CST children matching pandoc-native's `Div ( "x" , [] , [] ) [ Para [...] ]` projection. `emit_multiline_open_tag_with_attrs` gains `lift_trailing: bool` + `pre_content: &mut String` parameters: when set and the close-`>` line has trailing bytes, those bytes plus the trailing newline are pushed into `pre_content` instead of being emitted as `TEXT` inside `HTML_BLOCK_TAG`, so the open tag ends cleanly with `TEXT(">")` and `html_block_open_tag_is_clean` accepts. The dispatcher's `HTML_BLOCK_DIV` retag gate switches from `pandoc_html_open_tag_closes_cleanly` back to `pandoc_html_open_tag_closes` (incomplete opens still excluded since the helper requires a `>`); the `_cleanly` variant is removed. `bq_messy_lift_tag` drops the `multiline_open_end.is_none()` clause so multi-line opens with trailing or butted-close inside a blockquote route through the messy-lift path. Inside the bq path, `lift_trailing` is gated on `bq_messy_lift_tag == Some(name)` so clean multi-line opens (where trailing is empty anyway) skip the conditional capture and continue through `bq_clean_lift`. Non-div bq variant `0357-html-block-section-blockquote-multiline-open-trailing` projects as `RawBlock + Plain + RawBlock` via `OnlyIfLast` demotion; div variants demote per the `Never` (close-clean) / `SkipTrailingBlanks` (close-butted) rule.)
355
356
357

# html-block (standalone `</div>` close tag with no matched open — projects as a single RawBlock per pandoc-native, not as an empty Div. The `HTML_BLOCK_DIV` retag in `block_dispatcher.rs::HtmlBlockParser::parse_prepared` previously fired for any `tag_name == "div"` regardless of `is_closing`, because `pandoc_html_open_tag_closes` returned true for `</div>` (the line has a `>`). The retag is now gated on `is_closing: false` so close-form `</div>` keeps the opaque `HTML_BLOCK` wrapper. Symptom before fix: `panic at pandoc_ast.rs:1111` ("HTML_BLOCK_DIV without structural inner shape") on any input where `</div>` appeared without a matched block-level open — e.g. inside a list item whose paragraph already consumed the open `<div>` as `INLINE_HTML`, or a bare `</div>\n` at document start.)
358

# html-block (leading 1-3 space indent on a non-div HTML strict-block open tag — `  <article>\nbody\n</article>` and friends now project as `RawBlock "<article>" + Plain + RawBlock "</article>"` matching pandoc-native, with the leading indent stripped from the `RawBlock` text. The parser already captured the leading indent as a `WHITESPACE` token inside the open `HTML_BLOCK_TAG` (before the tag-name `TEXT`), and the body lift was already firing (`Plain` was correct). Projector's `open_tag_raw_block_text` now skips a leading `WHITESPACE` token before any tag-name text accumulates, matching pandoc's HTML block scanner which accepts ≤ 3 leading spaces on the open line but does not round-trip them into the `RawBlock` text. The same logic naturally applies to close tags when the parser emits leading `WHITESPACE` inside a close `HTML_BLOCK_TAG`.)
359

# html-block (indented close tag — `<article>\nbody\n   </article>` now projects as `RawBlock "<article>" + Plain + RawBlock "</article>"` matching pandoc-native. Before the fix, `try_split_close_line` returned the leading `   ` as body content, which the recursive body lift appended to `body\n` and parsed as `body\n   ` → `PARAGRAPH + BLANK_LINE`. The trailing `BLANK_LINE` blocked `LastParaDemote::OnlyIfLast` from firing so the body stayed `Para` instead of `Plain`. The strict-block lift path now treats whitespace-only leading as close-tag indentation: emits a `WHITESPACE` token inside the close `HTML_BLOCK_TAG` (which the projector strips via its existing WS-prefix skip) and passes an empty body leading to the recursive parse. The `<div>` shape was already correct here because its `SkipTrailingBlanks` demote policy walks past the trailing `BLANK_LINE`; the fix preserves that policy for div (keyed on `!leading.is_empty()` of the source) while routing the bytes correctly.)
360

# html-block (same-line HTML block as first content of a list item — `- <div id="x">body</div>`, `- <!-- comment -->`, `- <pre>foo</pre>`, `- <p>foo</p>` now project as block-level `Div [Plain [body]]` / `RawBlock` etc. inside the list item instead of `Plain[RawInline <tag>, body, RawInline </tag>]`. Implemented as an emit-time structural lift in `ListItemBuffer::emit_as_block`: when the buffered text begins with a recognized HTML block opener under Pandoc dialect (`try_parse_html_block_start`), the buffer text is reparsed via `parse_with_refdefs` and the resulting single top-level HTML_BLOCK / HTML_BLOCK_DIV is grafted as a direct LIST_ITEM child — bypassing the default PLAIN/PARAGRAPH wrap. Gate is strict: exactly one top-level child, must be HTML_BLOCK or HTML_BLOCK_DIV, must consume every byte of the buffer text, and HTML_BLOCK_DIV must have >=2 HTML_BLOCK_TAG children (matched open+close). The strict gate rejects multi-line shapes where the close tag lives in a sibling HTML_BLOCK because the dispatcher recognizes Pandoc strict-block close forms as block starts and breaks the buffer — that sub-target stays open. Reparse threads the outer config's `refdef_labels` so reference links inside the lifted block resolve against the document-level refdef set.)
361
362
363
364

# html-block / html-inline (multi-line HTML block as list-item content for non-div strict-block tags, inline-block matched-pair tags, and inline spans — `- <section>\n  hello\n  </section>`, `- <article>...`, `- <video src="x">...`, `- <iframe>...`, `- <span id="x">body</span>` now project structurally instead of as `Plain[RawInline open, body, RawInline close]`. Implementation: a new field `BlockContext::list_item_unclosed_html_block_tag` carries the tag name of an unclosed Pandoc matched-pair open in the enclosing `Container::ListItem`'s buffer. `HtmlBlockParser::detect_prepared` returns `None` for a close-form whose tag matches that field, so `</section>` / `</video>` / etc. fall through to buffer continuation instead of dispatching as a separate block. The buffer at LIST_ITEM close-time then contains the full matched-pair text; `ListItemBuffer::try_emit_html_block_lift` reparses and grafts the lifted HTML_BLOCK / HTML_BLOCK_DIV as a direct LIST_ITEM child. `count_tag_balance` and `is_pandoc_lift_eligible_block_tag` promoted to `pub(crate)`, with new helper `is_pandoc_matched_pair_tag` covering strict-block, inline-block, and verbatim tags (excluding void). Gates only fire under Pandoc dialect. Known divergences kept opaque: multi-line `<div>` body produces `Div [Plain [body]]` instead of pandoc's `Div [Para [body]]` (indent-normalization gap — pandoc strips list-item content_col leading from continuation lines before reparse), and `<pre>` content includes the list-item leading indent (`<pre>\n  foo\n  </pre>` vs pandoc's `<pre>\nfoo\n</pre>`). Both are tracked in the recap.)
365
366
367
368
369

# html-block (list-item indent normalization for the HTML-block first-line structural lift — multi-line `<div>` in a list now projects as `Div [Para [body]]` matching pandoc-native instead of `Div [Plain [body]]`, and verbatim tags (`<pre>`, `<style>`, `<script>`, `<textarea>`) inside a list emit `RawBlock` text without the list-item leading indent (e.g. `<pre>\nbody\n</pre>` instead of `<pre>\n  body\n  </pre>`). `ListItemBuffer::emit_as_block` now threads the enclosing `Container::ListItem::content_col` through to `try_emit_html_block_lift`, which calls `strip_list_item_indent` to strip up to `content_col` leading-space bytes from each continuation line of the buffer text before the inner reparse. The stripped bytes are re-injected as `WHITESPACE` tokens at line starts during graft via a new `LinePrefixState` mirroring the existing `BqPrefixState` pattern, so the CST stays byte-equal to source. The projector's `walk_skip_bq_markers` (used by `collect_html_block_text_skip_bq_markers` for opaque HTML_BLOCK / verbatim-tag projection) skips a leading `WHITESPACE` token at the start of each source line when not preceded by a `BLOCK_QUOTE_MARKER` — unambiguous because the parser never emits a leading line-start `WHITESPACE` inside `HTML_BLOCK_CONTENT` outside this lift path. Inline coalescence's trim-edges rule strips the leading injected `Space` from PARAGRAPH/PLAIN content automatically, and structural `HTML_BLOCK_TAG`'s `open_tag_raw_block_text` already strips leading WS. Tested for div, pre, style, script, textarea.)
370
371
372
373
374

# html-inline (corpus pins for `<span>` shapes not covered by 0203-0210: emphasis-wrapping-span `*foo <span>bar</span> baz*` locks the `pandoc-ir-migrate` IR-trap — emphasis must not pair into span content; `<span>` inside link `[<span id="x">linked</span>](url)` confirms balanced span detection inside link text; nested `<span>` inside `<span>` confirms recursive inline-HTML scanning. All three already match pandoc-native; pure corpus expansion, no parser changes.)
375
376
377

# html-block (corpus pins for list-item HTML-block shapes building on 0370-0374: multi-line `<div>` open `- <div\n  id="x">\n  body\n  </div>` projects structurally as `Div [Para [body]]` — the existing list-item close-form dispatcher gate + content_col indent normalization handle multi-line opens transparently because `try_parse_html_block_start` recognizes the opener on the first line and `find_multiline_open_end` walks subsequent lines for the closing `>`; tab-indented list-item shapes `-\t<div>\n\t...\n\t</div>` and `-\t<pre>\n\t...\n\t</pre>` confirm `strip_list_item_indent` correctly handles tab characters in the leading indent — tabs advance content_col by 4 and the prefix-strip leaves the tab in place when removing it would overshoot content_col.)
378
379
380

# html-block (3-line `<div>` open inside a list item — `- <div\n    id="x"\n    class="y">\n  body\n  </div>` extends 0378's 2-line variant to verify the multi-line open recognition transparently handles ≥3 attribute lines through `find_multiline_open_end`'s quote-aware scan; the list-item content_col indent normalization re-injects the same per-line WHITESPACE prefix regardless of open-tag span. No parser changes; pure corpus pin.)
381

# html-inline (corpus pins for `<span>` inside non-paragraph inline contexts: span in footnote definition `[^foo]: <span class="x">note</span>` lifts to `Note [Para [Span ...]]`; span in table cell `| <span id="x">b</span> |` lifts to `Cell [Plain [Span ...]]` (and the salsa anchor index sees the `id="x"`); span text inside an inline code span `` `<span class="x">code</span>` `` is preserved verbatim as `Code` content (regression — Code parsing must not trigger span lift); span text inside inline math `$<span class="x">x</span>$` is preserved verbatim as `Math` content (regression — math parsing must not trigger span lift). All four already match pandoc-native; pure corpus expansion, no parser changes.)
382
383
384
385

# html-block (Comment / PI trailing-text split: `<!-- hi --> trailing` and `<?php foo ?> trailing` now project as `RawBlock + Para [trailing]` instead of single oversized RawBlock. Parser-side fix in `parse_html_block_with_wrapper`: when the close marker (`-->` / `?>`) is followed by non-whitespace bytes on the close line under Pandoc dialect + bq_depth==0, splits the close `HTML_BLOCK_TAG` at the marker, finishes HTML_BLOCK, and grafts trailing + subsequent lines as sibling blocks via recursive `parse_with_refdefs`. Multi-line `<!--\n…\n--> trailing` and trailing-then-softbreak `<!-- hi --> trailing\nmore` shapes share the same path. Whitespace-only trailing keeps the legacy opaque HTML_BLOCK shape — projector-side `trim_end` to drop trailing whitespace from RawBlock text is a separate concern.)
386
387
388
389

# html-block (Comment / PI trailing-text split inside a blockquote: `> <!-- hi --> trailing`, `> <!-- multi\n> line --> trailing`, `> <?php ... ?> trailing`, and the nested `> > <!-- hi --> trailing` shape project as `BlockQuote [RawBlock, Para [trailing]]` (matching pandoc-native). Extends `try_parse_comment_pi_with_trailing_split`: removes the `bq_depth == 0` gate, finds the close marker in the bq-stripped inner content, emits the close `HTML_BLOCK_TAG` with bq-prefix tokens for lines past line 0 via `emit_bq_prefix_tokens`, then grafts the trailing recursive parse inside the surrounding `BLOCK_QUOTE` node (the dispatcher's wrapper).)
391
392
393
394

# html-block (Trailing-whitespace trim for RawBlock projection: `<!-- hi -->   \n` and `<pre>foo</pre>   \n` project as `RawBlock "<!-- hi -->"` and `RawBlock "<pre>foo</pre>"` (matching pandoc-native). Projector-side `emit_html_block` now trims trailing ASCII whitespace (newlines + spaces + tabs) from the collected text instead of only newlines. Interior whitespace is preserved.)
395
396

# html-inline (corpus pins for `<span>` inside remaining inline contexts: span in autolink (`<<span>https://…</span>>` projects `Str "<", Span, Str ">"`); span in image alt (`![<span>alt</span>](url)` lifts into `Figure/Image` content); span in ATX heading (`# Title with <span>inline</span>` → `Header [Str…, Span]`); span in link text (`[<span>link</span>](url)` → `Link [Span]`); span around emphasis (`<span>*emph*</span>` → `Span [Emph]` with emphasis NOT pairing into span); span in setext heading. All six already match pandoc-native; pure corpus expansion, no parser changes.)
397
398
399
400
401
402

# html-block (Leading 1-3 space indent strip from RawBlock first line: `  <pre>foo</pre>\n` (top level), `- text\n\n  <!-- hi -->` (list-item content_col), `- text\n\n  <pre>foo</pre>` (list-item), and multi-line `  <pre>\n  foo\n  </pre>` (top level — first line indent stripped, subsequent lines keep indent) now project as `RawBlock "<pre>foo</pre>"`/`RawBlock "<!-- hi -->"`/`RawBlock "<pre>\n  foo\n  </pre>"`. Projector-side `emit_html_block` strips the leading 1-3 spaces of the first line before emitting `RawBlock` text. Pandoc only recognizes HTML blocks at indent 0-3, so the strip is bounded. Top-level indented comments (`  <!-- hi -->\n` → pandoc emits `Para [RawInline]`, panache still emits `RawBlock`) remain a separate shape divergence — content matches now but the wrapper is wrong; not in corpus.)
403
404
405
406

# html-block (List-item indented Comment/PI with trailing text: `- text\n\n  <!-- hi --> trailing` and `- text\n\n  <?php foo ?> trailing` project as `BulletList [[Para, RawBlock, Para [trailing]]]`. Combined effect of the bq-extended `try_parse_comment_pi_with_trailing_split` (handles trailing-text split structurally) and the new leading-indent strip in `emit_html_block` (strips the 2-space list-item indent from RawBlock first line). Pure corpus pin — no new code beyond what already landed.)
407
408

# html-block (Indented `isInlineTag` constructs demote to `Para [RawInline]` under Pandoc: top-level `  <!-- hi -->`, `  <?php foo ?>`, `  <style>foo</style>`, `  <foo>` (Type7), `  </script>`, `  <video>x</video>` (inline-block matched-pair); list-item with EXTRA indent beyond content_col `- text\n\n   <!-- hi -->`; blockquote with extra indent `>   <!-- hi -->`. Parser-side gate in `HtmlBlockParser::detect_prepared`: when `is_pandoc && cannot_interrupt` and the line's leading-space count exceeds `list_indent_info.content_col`, returns `None` so paragraph parsing picks up the line and the inline parser handles the tag as `RawInline`. CommonMark keeps the RawBlock shape (block-level recognition). `<pre>`, `<script>` (regular), `<textarea>` are NOT in the demote set — they keep RawBlock with stripped indent via the existing leading-indent trim.)
409
410
411
412
413
414
415
416

# html-block (List-item Comment/PI with trailing text on the same item content: `- <!-- hi --> trailing`, `- <?php foo ?> trailing`, `- <!-- hi -->\n  trailing` project as `BulletList [[RawBlock, Plain [trailing]]]`; loose item `- <!-- hi --> trailing\n\n  more` projects as `BulletList [[RawBlock, Para [trailing], Para [more]]]`. `ListItemBuffer::emit_as_block`'s `try_emit_html_block_lift` accepts the 2-child reparse shape (HTML_BLOCK + PARAGRAPH from the existing Comment/PI trailing-split helper) and retags the trailing PARAGRAPH to PLAIN for tight items based on `use_paragraph`. Multi-line trailing past the same item (N>2 children) still falls through to the inline path — that's the same 0390 SoftBreak fusion gap.)
417
418
419
420

# html-block (Same-line `<div>foo</div>bar` and `<form>foo</form>bar` project as `Div + Para [bar]` and `RawBlock + Plain + RawBlock + Para [bar]` respectively (also bq, list-item, with-id variants). Parser same-line-lift gate now accepts trailing text after `</tag>`: `probe_same_line_lift` checks the close-marker is *contained* in the post-`>` slice (was: ends_with) while still requiring `count_tag_balance == (0, 1)`. The lift body splits the `try_split_close_line` close_part into close-marker bytes (`</tag>` only) and trailing bytes, emits the close `HTML_BLOCK_TAG` with just the marker, finishes the wrapper, then recursively parses the trailing+newline and grafts as siblings via `graft_document_children` (mirrors the existing Comment/PI trailing-split). The list-item buffer's `try_emit_html_block_lift` 2-child branch was widened to also accept `HTML_BLOCK_DIV + PARAGRAPH`. Negative-space pins for `>   <!-- hi --> trailing`, `> > <!-- hi -->`, `>>   <!-- hi -->` (already passing — pin against regression).)
421
422
423
424
425
426
427
428
429

# html-block (Depth-aware same-line and multi-line close-line lift. `probe_same_line_lift` and the same-line lift body now walk the post-`>` bytes depth-aware via `matched_close_offset`, accepting nested same-tag opens and unmatched trailing closes alongside the existing single-close shape. `<div>foo</div></div>` lifts to `Div [Plain[foo]] + RawBlock`; `<div><div>x</div></div>bar` lifts to `Div [Div [Plain[x]]] + Para[bar]` (nested div is recursively parsed). The multi-line close-line path (`<div>\nfoo</div></div>`, `<div>\nfoo\n</div>bar`) was widened to use the same depth-aware split + `split_close_marker_end` + trailing graft pattern that the same-line path already had. Unclosed `<div>` (`<div>`, `<div>\nfoo`, `<div>foo<bar>baz`) now projects as `Div [...]` per pandoc-native (warning: implicit close at EOF) — `div_has_structural_inner` accepts a missing close tag when the open tag is clean and the body has structural children with no `HTML_BLOCK_CONTENT`. All seven shapes previously panicked at the projector's `debug_assert!("HTML_BLOCK_DIV without structural inner shape")`.)
430
431
432
433
434
435
436
437

# html-block (Multi-line open with matched close on the open's last line — `<div\n  id="x">foo</div>`, `<div\n  id="x">foo</div></div>` (matched + unmatched trailing close), `<div\n  id="x"><div>x</div></div>` (nested), `<div\n  id="x">foo</div>bar` (trailing text), `<form\n  id="x">foo</form>` (strict-block variant). The depth-aware open-line balance (`depth <= 0` after summing `count_tag_balance` across all open-tag lines) now triggers the structural lift on `pre_content` (the bytes after the last open `>`, captured via `emit_multiline_open_tag_with_attrs(lift_trailing=true)`). The lift body mirrors the single-line `same_line_closed` path: `try_split_close_line_depth_aware` splits `pre_content` into leading body + close part, `split_close_marker_end` peels off the close marker bytes, the body is recursively parsed via `parse_with_refdefs`, and any same-line trailing text grafts as siblings of the wrapper via `graft_document_children`. Non-bq only — bq + multi-line + same-line close requires the bq-prefix re-injection machinery and is tracked as a follow-up.)
438
439
440
441
442

# html-block (Bq variant of the multi-line open + matched close-on-open's-last-line lift — `> <div\n>   id="x">foo</div>`, `> <div\n>   id="x">foo</div></div>` (trailing close → sibling RawBlock), `> <div\n>   id="x"><div>x</div></div>` (nested), `> <div\n>   id="x">foo</div>bar` (trailing text → sibling Para), `> <form\n>   id="x">foo</form>` (strict-block). New gate `bq_multiline_close_lift_tag` fires when `bq_depth > 0 && multiline_open_end.is_some() && depth <= 0` and joins `lift_mode`, threading `lift_trailing=true` through `emit_multiline_open_tag_with_attrs` (via the `bq_strict_attr_emit_tag_name` and HTML_BLOCK_DIV branches, both of which already accept `lift_mode` directly). The close-line lift gate widens from `bq_depth == 0` to `bq_depth == 0 || bq_multiline_close_lift_tag.is_some()`. Body and close inherit the bq prefix from the open's last line — already re-emitted by `emit_multiline_open_tag_with_attrs` for lines past `start_pos` — so no new BqPrefixState plumbing is needed; `emit_html_block_body_lifted` (with `bq: &mut None`) suffices. All five shapes previously panicked at `debug_assert!("HTML_BLOCK_DIV without structural inner shape")`.)
443
444
445
446
447

# html-block (Inline-block matched-pair multi-line-open + same-line close — `<video\n  src="x">body</video>`, `<iframe\n  src="x">body</iframe>` and bq variants `> <video\n>   src="x">body</video>` / `> <iframe\n>   src="x">body</iframe>`. Negative-space pin: these shapes already project correctly via the existing parser-side structural lift path (open `HTML_BLOCK_TAG` + PLAIN body + close `HTML_BLOCK_TAG`) without HTML_BLOCK_DIV retag — inline-block matched-pair stays as opaque `HTML_BLOCK` and the projector emits `RawBlock + Plain + RawBlock`. Pins parity with the existing `<div>` / `<form>` multi-line + same-line close coverage at 0438, 0442, 0443, 0447.)
448
449
450
451

# html-block (bq-in-listitem first-line HTML: `- > <div>foo</div>` (single-line) and `- > <div>\n  > hello\n  > </div>` (multi-line) now project as `BulletList [ BlockQuote [ Div [...] ] ]`, matching pandoc-native. Previously the HTML-block dispatcher's `pandoc_html_open_tag_closes` walked raw `lines[line_pos]` and only knew how to strip `bq_depth` blockquote markers via `strip_n_blockquote_markers`; the leading list-marker bytes (`- `) on line 0 weren't part of its stripping vocabulary, so the open-tag scan saw `-` instead of `<` and the gate rejected the line, falling back to paragraph. The fix introduces a new `ContainerPrefix { list_content_col, bq_depth, list_marker_consumed_on_line_0 }` struct in `parser/blocks/container_prefix.rs` that bundles the dispatcher's per-line strip vocabulary; helpers `pandoc_html_open_tag_closes`, `find_multiline_open_end`, and `parse_html_block_with_wrapper` now take a `ContainerPrefix` argument instead of bare `bq_depth: usize`. `HtmlBlockParser::detect_prepared` and `parse_prepared` build `ContainerPrefix::from_ctx(ctx)` to thread the LIST_ITEM's `content_col`. The new `list_marker_consumed_on_line_0` flag distinguishes marker-line dispatch (where LIST_MARKER + WHITESPACE are upstream-emitted by `add_list_item`, as used by `Parser::dispatch_bq_after_list_item`'s `parse_inner_content` path) from continuation-line dispatch (where the leading indent is inner content, not upstream prefix); only the former triggers list-marker stripping in `parse_html_block_with_wrapper`'s `strip_line_0_for_emission`. Lookahead helpers strip `list_content_col` unconditionally, since they are pure byte scans and emit nothing.)
452
453
