Metadata-Version: 2.4
Name: matlab-publish-parser
Version: 2026.0.0
Summary: The package parses the XML output of publish in MATLAB.
Keywords: matlab,xml,parser
Author: Gustaf Hendeby
Author-email: Gustaf Hendeby <gustaf.hendeby@liu.se>
License-Expression: GPL-3.0-or-later
License-File: LICENSE.md
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Requires-Dist: defusedxml>=0.7.1
Requires-Dist: pip>=25.1
Requires-Python: >=3.13
Description-Content-Type: text/markdown

This package parses the XML output of MATLAB publish, which makes it
easier to derive more advanced documentation that contains MATLAB code
with matching output and figures.

# Usage

## MATLAB processing

Write the code that you want to create output from as a MATLAB publish
document.  The package parses the published result cell by cell and
separates documentation, executed code, console output, and figures.

In MATLAB, process the created file (for example ``cubic_plot_publish.m``) using publish:
```matlab
publish('cubic_plot_publish.m',...
        'format', 'xml',...
        'imageFormat', 'epsc',...
        'outputDir', 'gen/',...
        'catchError', false);  % Fail on thrown exception
```
This will produce the file ``gen/cubic_plot_publish.xml``.

## Python processing

Parse the XML file written by publish and inspect the result:

```python
from pathlib import Path

import matlab_publish_parser as mpp

# Point at the XML file that publish wrote to its output directory
mf = mpp.parse(Path('gen/cubic_plot_publish.xml'))

print(mf.filename)  # The name of the processed file
print(mf.output_dir)  # Directory for generated files
for c in mf.cells:
    print(c.title)  # Title of cell
    print(c.code)  # Executed code in cell
    print(c.output)  # Console output from cell
    ...
```

Explore the object to figure out what has been extracted.

Current fixture examples in `tests/data/` are:
`cubic_plot_publish.m`, `formatted_text_publish.m`, `text_only_publish.m`,
`single_figure_publish.m`, and `multi_figure_publish.m`.

## HTML → LaTeX conversion

`Cell.text_as_latex()` converts the small HTML fragment that MATLAB
`publish` emits inside each `<text>` element into LaTeX.  The conversion is
performed by a tiny in-tree module,
`matlab_publish_parser.html_to_latex`, rather than a third-party
library.

**Rationale.**  Earlier versions depended on the unmaintained
`html2latex` package, installed straight from a personal GitHub fork.
The package is not on PyPI, has not tracked changes in its own
dependencies (notably `justhtml`), and pulls in a heavy HTML-parsing
stack to handle constructs MATLAB publish never produces.  General
HTML-to-LaTeX libraries (e.g. routing through Pandoc via `pypandoc`)
were considered, but they add either a non-Python runtime dependency
or substantially more surface area than this project needs.

The MATLAB-publish HTML subset is narrow and stable — paragraphs, basic
inline emphasis (`<b>`, `<i>`, `<tt>`), simple lists, links, line
breaks, preformatted/verbatim blocks, inline and display LaTeX
equations, external images, raw LaTeX blocks, and (dropped) raw HTML
blocks — so a small purpose-built converter is both easier to test and
easier to keep working.  The set of tags handled was derived directly
from real MATLAB `publish('...','format','xml')` output (see
`tests/data/all_markup_publish.m`).

**MATLAB-specific notes.**

* `<equation>` is emitted by `publish` with the original LaTeX source
  preserved on either the `<equation text="$$...$$">` attribute (display
  equations) or on the inner `<img alt="$...$">` (inline equations).
  The converter prefers those over re-deriving math from rendered
  pixels.
* `<latex>` blocks contain raw LaTeX (e.g. `\begin{tabular}...`) and are
  emitted **verbatim**, not wrapped in math delimiters.
* `<html>` blocks have no faithful LaTeX equivalent and are dropped
  with a `UserWarning`.
* `<mcode-xmlized>` (sample code with a three-space comment indent) is
  rendered as a `\begin{verbatim}...\end{verbatim}` block, stripping
  the namespaced `mwsh:*` syntax-highlight tags.
* MATLAB emits the trademark symbols `®` / `™` as literal Unicode, which
  passes through the converter unchanged.
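
The `<equation>` note can be illustrated with a small standalone sketch. It is not the package's implementation: the helper name `equation_source` and the two XML fragments are made up for illustration, and the element layout is assumed only from the description above. The package itself depends on `defusedxml`; the standard library parser is used here solely because the input is a trusted literal.

```python
import xml.etree.ElementTree as ET


def equation_source(equation: ET.Element) -> str:
    """Return the LaTeX source stored on an <equation> element.

    Assumed layout (from the notes above, not from the package code):
    display equations carry the source in a ``text`` attribute, inline
    equations carry it in the ``alt`` attribute of a nested <img>.
    """
    text = equation.get("text")
    if text:
        return text
    img = equation.find(".//img")
    if img is not None and img.get("alt"):
        return img.get("alt")
    raise ValueError("no LaTeX source found on <equation>")


# Hand-written fragments; real publish output carries more attributes.
display = ET.fromstring(
    '<equation text="$$x^2 + y^2 = 1$$"><img src="eq1.png"/></equation>'
)
inline = ET.fromstring('<equation><img src="eq2.png" alt="$a + b$"/></equation>')

print(equation_source(display))
print(equation_source(inline))
```

Reading the stored source keeps the original delimiters, so display and inline equations stay distinguishable without inspecting the rendered image.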

**Extending the converter.**  The converter dispatches each HTML tag
through a registry of handlers.  To support a new tag without forking
the module, register a handler on the default renderer:

```python
from matlab_publish_parser.html_to_latex import HtmlToLatex, register_tag

def _underline(element, renderer: HtmlToLatex) -> str:
    return f"\\underline{{{renderer.render_element(element)}}}"

register_tag("u", _underline)
```

A handler receives the `xml.etree.ElementTree.Element` it is rendering
and the active renderer (so it can recurse via
`renderer.render_element` / `renderer.render_children`).  For
private rendering pipelines that should not mutate the module-level
default, instantiate `HtmlToLatex()` directly and call its
`register` / `convert` methods.

Unknown tags do **not** raise: they render their children transparently
and emit a `UserWarning`.  This keeps existing pipelines working when
MATLAB introduces new constructs, while still surfacing them so a
handler can be added.
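
The fallback behaviour described above can be sketched with a toy dispatcher. `MiniRenderer` and its method names are hypothetical and exist only in this example; they are not the package's `HtmlToLatex` class and may differ from it in detail. The sketch only shows the general pattern of a handler registry whose miss case warns and renders the children.

```python
import warnings
import xml.etree.ElementTree as ET
from typing import Callable, Dict

Handler = Callable[[ET.Element, "MiniRenderer"], str]


class MiniRenderer:
    """Toy tag-dispatching renderer (illustration only)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, tag: str, handler: Handler) -> None:
        self._handlers[tag] = handler

    def render_children(self, element: ET.Element) -> str:
        # Text before the first child, then each child followed by its tail.
        parts = [element.text or ""]
        for child in element:
            parts.append(self.render(child))
            parts.append(child.tail or "")
        return "".join(parts)

    def render(self, element: ET.Element) -> str:
        handler = self._handlers.get(element.tag)
        if handler is None:
            # Unknown tag: surface it, but keep its content.
            warnings.warn(f"unhandled tag <{element.tag}>", UserWarning)
            return self.render_children(element)
        return handler(element, self)


renderer = MiniRenderer()
renderer.register("p", lambda el, r: r.render_children(el))
renderer.register("b", lambda el, r: f"\\textbf{{{r.render_children(el)}}}")

fragment = ET.fromstring("<p>A <b>bold</b> and <blink>odd</blink> word.</p>")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    latex = renderer.render(fragment)

print(latex)
print([str(w.message) for w in caught])
```

In this sketch the unknown `<blink>` tag contributes its text to the output and produces exactly one warning, which is the trade-off the paragraph above describes: nothing is lost silently, and nothing fails.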

## Testing

Run the test suite with:

```bash
pytest
```

Optionally run tests with coverage reporting:

```bash
pytest --cov=matlab_publish_parser --cov-branch --cov-report=term-missing
```

When you change a MATLAB source fixture in `tests/data/*.m`, regenerate the
matching published XML fixtures in `tests/data/gen/*.xml` by running
`tests/data/runme.m` in MATLAB before running the tests again. The suite checks
both that every source fixture has a matching XML file and that the XML
`originalCode` still matches the current MATLAB source.
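
The first of those checks can be approximated outside the test suite with a few lines of standard-library Python. This is a rough sketch and not the project's actual test: the function name `missing_xml_fixtures` is hypothetical, and it assumes that `runme.m` is a driver script rather than a fixture.

```python
import tempfile
from pathlib import Path


def missing_xml_fixtures(data_dir: Path) -> list[str]:
    """List .m fixtures in data_dir that have no matching XML in data_dir/gen."""
    gen_dir = data_dir / "gen"
    return sorted(
        source.name
        for source in data_dir.glob("*.m")
        # Assumption: runme.m only drives publish and is not a fixture itself.
        if source.name != "runme.m"
        and not (gen_dir / f"{source.stem}.xml").is_file()
    )


# Demonstrate on a throwaway directory rather than the real tests/data/.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "gen").mkdir()
    (data_dir / "runme.m").write_text("% regenerates fixtures\n")
    (data_dir / "cubic_plot_publish.m").write_text("%% Cubic\n")
    (data_dir / "text_only_publish.m").write_text("%% Text\n")
    (data_dir / "gen" / "cubic_plot_publish.xml").write_text("placeholder")
    missing = missing_xml_fixtures(data_dir)

print(missing)
```

Calling `missing_xml_fixtures(Path("tests/data"))` from the repository root would list any source fixture that still needs to be published.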

Output rendering expectations are stored as snapshots in `tests/snapshots/`.
This includes per-cell rendering snapshots and full parsed-file snapshots for
all published examples. When output formatting intentionally changes, refresh
snapshots with:

```bash
pytest --update-output-snapshots
```

When adding a new example fixture:

1. Add the `.m` file to `tests/data/`.
2. Add it to `tests/data/runme.m`.
3. Generate the matching XML in `tests/data/gen/`.
4. Run `pytest --update-output-snapshots` if the parsed output changed intentionally.
