Metadata-Version: 2.4
Name: latexport
Version: 2026.3.0
Summary: Convert LaTeX documents to accessible HTML with PDF output
Author-email: Kana Nadarajan <clean.wood0319@fastmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/kalv25/texport
Project-URL: Repository, https://github.com/kalv25/texport
Project-URL: Issues, https://github.com/kalv25/texport/issues
Project-URL: Changelog, https://github.com/kalv25/texport/blob/main/CHANGELOG.md
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: beautifulsoup4>=4.14.3
Dynamic: license-file

# texport

A workflow for converting LaTeX documents to accessible HTML with PDF output.

## Overview

This project converts `.tex` files into web-ready HTML pages using [LaTeXML](https://dlmf.nist.gov/LaTeXML/), while also generating PDF versions via `pdflatex`. The HTML output is customised with additional CSS, JavaScript, and accessibility enhancements.

## Project Structure

```
texport/
├── static/            # Shared CSS/JS assets (source of truth)
│   ├── css/
│   │   └── custom.css
│   └── js/
│       ├── custom.js
│       └── mathjax-config.js
├── latexml/           # Custom LaTeXML bindings (.ltxml); all loaded automatically
│   ├── amsmath-compat.ltxml
│   └── emph-in-math.ltxml
├── output/            # Generated output (seeded from static/ on each run)
│   ├── css/           # Copied from static/css/
│   ├── js/            # Copied from static/js/
│   ├── index.html     # Generated by texport-index
│   └── {document}/    # Per-document output
│       ├── index.html
│       └── {document}.pdf
├── templates/         # HTML templates
├── main.py            # Main processing script
├── create_main_index.py  # Index page generator
├── embed_assets.py    # Self-contained HTML bundler
└── config.py          # Configuration settings
```

## Prerequisites

### Python 3.12+ and uv

Install [uv](https://docs.astral.sh/uv/), which manages the Python version and dependencies:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### LaTeXML

LaTeXML converts `.tex` files to HTML5.

**macOS:**
```bash
brew install latexml
```

**Ubuntu / Debian:**
```bash
sudo apt install latexml
```

**Other:** see the [LaTeXML installation docs](https://dlmf.nist.gov/LaTeXML/get.html).

### TeX distribution (pdflatex)

A TeX distribution provides `pdflatex`, used to produce PDF output.

**macOS:**
```bash
brew install --cask mactex-no-gui
```

**Ubuntu / Debian:**
```bash
sudo apt install texlive-latex-base
```

**Already have TeX Live?** Install only `pdflatex` via tlmgr:
```bash
tlmgr install pdftex
```

`bibtex` is included with most TeX distributions. For `biber` (used with biblatex):
```bash
tlmgr install biber
```
Both are optional — texport auto-detects whether they are needed based on the source file.

## Installation

```bash
# Clone the repository
git clone <repository-url>
cd texport

# Install Python dependencies and register CLI commands
uv sync
uv pip install -e .
```

## Usage

### 1. Process LaTeX Files

Convert one or more `.tex` files to HTML and PDF:

```bash
# Process a single file
uv run texport tex_src/example.tex

# Process multiple files
uv run texport tex_src/file1.tex tex_src/file2.tex

# Write output to a custom directory instead of output/
uv run texport -o ./public tex_src/example.tex

# Override the output subdirectory name (single file only)
uv run texport --name lecture-notes tex_src/example.tex
# → output goes to output/lecture-notes/ instead of output/example/

# Dry run (preview without changes)
uv run texport -n tex_src/example.tex
```

This will:
- Seed the output directory with shared assets from `static/`
- Auto-detect whether bibliography processing (`bibtex`/`biber`) is needed
- If `\cite` commands are present: run bibtex/biber before LaTeXML so citations resolve in HTML
- Generate HTML at `{output}/{stem}/index.html` (via LaTeXML, with all `latexml/*.ltxml` bindings)
- Generate PDF at `{output}/{stem}/{stem}.pdf` (via pdflatex, with bibtex/biber if needed)
- Clean up auxiliary files (`.aux`, `.log`, `.out`, `.bbl`, `.blg`, `.bcf`, `.run.xml`)
- Remove empty subdirectories left by pdflatex's `\include` handling
- Inject custom CSS and JavaScript references
- Replace QED symbols with accessible HTML
- Consolidate local CSS files to the shared `css/` folder

### 2. Generate Main Index Page

Create an index page listing all documents:

```bash
# Use the default output directory (from config.py)
uv run texport-index

# Use a custom output directory
uv run texport-index -o examples/output
```

This scans the output directory for `index.html` files and generates a main index with links to each document (and PDF if available).

### 3. Clean Up Log Files

Remove `latexml.log` files left behind by LaTeXML:

```bash
uv run texport-clean
```

This removes `latexml.log` from the current directory and recursively from the output directory. During a normal `texport` run these are cleaned up automatically; `texport-clean` handles any leftovers from previous runs.

### 4. Bundle a Self-Contained HTML File

Inline all CSS and JS into a single portable file:

```bash
# Bundle with all assets inlined (CSS + JS) — default behaviour
uv run embed_assets.py output/example/index.html

# Bundle but skip remote assets (they remain as external references)
uv run embed_assets.py --skip-remote output/example/index.html

# Bundle CSS only — leave <script src> tags untouched
uv run embed_assets.py --skip-js output/example/index.html

# Write the bundled file to a custom path
uv run embed_assets.py output/example/index.html dist/standalone.html
```
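
To show the idea behind inlining, here is a simplified sketch that replaces local stylesheet links with `<style>` blocks and leaves remote ones alone, similar in spirit to `--skip-remote`. It is not the code in `embed_assets.py`: the function name is hypothetical, and it uses regular expressions rather than a real HTML parser, so it only copes with straightforward `<link>` tags.

```python
import re
from pathlib import Path

_LINK = re.compile(r'<link\b[^>]*\brel=["\']stylesheet["\'][^>]*>', re.IGNORECASE)
_HREF = re.compile(r'\bhref=["\']([^"\']+)["\']', re.IGNORECASE)
_REMOTE = re.compile(r"(?:[a-z][a-z0-9+.-]*:)?//", re.IGNORECASE)


def inline_local_css(html: str, base_dir: Path) -> str:
    """Replace <link rel="stylesheet"> tags that point at local files."""

    def replace(match: re.Match) -> str:
        tag = match.group(0)
        href = _HREF.search(tag)
        if href is None or _REMOTE.match(href.group(1)):
            return tag  # no href, or a remote URL: keep the external reference
        css_path = base_dir / href.group(1)
        if not css_path.is_file():
            return tag  # missing file: leave the tag untouched
        css = css_path.read_text(encoding="utf-8")
        return f"<style>\n{css}\n</style>"

    return _LINK.sub(replace, html)
```

`base_dir` would be the folder containing the HTML file, so that relative `href` values resolve the same way a browser would resolve them.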

## Configuration

Edit `config.py` to customise paths and settings:

```python
OUTPUT_DIR = Path("./output")    # Root directory for generated output
STATIC_DIR = Path("./static")    # Shared CSS/JS source; copied into output on each run
LATEXML_DIR = Path(__file__).parent / "latexml"  # LaTeXML binding files (absolute path)
SRC_QED_SYMBOL = "∎"             # QED symbol to replace in HTML
ENCODING = "utf-8"               # File encoding

# Index generator settings
ROOT_DIR = OUTPUT_DIR
PATTERN = "index.html"
TEMPLATE_PATH = Path("templates/main_index_template.html")
```
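
As an illustration of how a setting such as `SRC_QED_SYMBOL` might be consumed, the sketch below wraps each QED symbol in labelled markup. The replacement markup, the `QED_HTML` constant, and the `replace_qed` function are all assumptions; the markup texport actually emits may differ.

```python
SRC_QED_SYMBOL = "∎"  # mirrors the value in config.py

# Assumed replacement markup. The numeric entity &#8718; renders as the same
# symbol, so running the replacement twice does not nest the wrapper.
QED_HTML = '<span class="qed" role="img" aria-label="End of proof">&#8718;</span>'


def replace_qed(html: str) -> str:
    """Wrap every literal QED symbol in a labelled span."""
    return html.replace(SRC_QED_SYMBOL, QED_HTML)
```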

## Examples

### Single standalone file — `testmath.tex`

Source: [`latex3/latex2e`](https://github.com/latex3/latex2e) — © American Mathematical Society / LaTeX Project, [LPPL 1.3c](https://www.latex-project.org/lppl/lppl-1-3c).

A self-contained file with no `\include` dependencies. `--name` overrides the
output subdirectory, so the folder gets a descriptive name rather than the
generic `testmath`; the PDF keeps the source file's stem.

```bash
uv run texport \
  -o examples/output \
  --name latex2e-testmath \
  examples/tex_src/testmath.tex
```

Output:
```
examples/output/latex2e-testmath/index.html
examples/output/latex2e-testmath/testmath.pdf
```

---

### Multi-part document — `hermish-proofs-notes/main.tex`

Source: [`hermish/proofs-notes`](https://github.com/hermish/proofs-notes) — CS70 lecture notes by Hermish Mehta.

A document split across multiple files via `\include`. texport creates the
required subdirectories for pdflatex, then removes them once they are empty
after aux file cleanup.

```bash
uv run texport \
  -o examples/output \
  --name hermish-proofs-notes \
  examples/tex_src/hermish-proofs-notes/main.tex
```

Output:
```
examples/output/hermish-proofs-notes/index.html
examples/output/hermish-proofs-notes/main.pdf
```
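
The removal of leftover empty subdirectories can be sketched as follows. This is an illustrative helper under an assumed name (`remove_empty_subdirs`), not the code in `main.py`.

```python
from pathlib import Path


def remove_empty_subdirs(root: Path) -> list[Path]:
    """Remove empty directories below root, deepest first; root itself is kept."""
    subdirs = sorted(
        (p for p in root.rglob("*") if p.is_dir() and not p.is_symlink()),
        key=lambda p: len(p.parts),
        reverse=True,  # children before parents
    )
    removed = []
    for folder in subdirs:
        if not any(folder.iterdir()):
            folder.rmdir()
            removed.append(folder)
    return removed
```

Visiting the deepest directories first means a parent that only contained empty children is itself empty by the time it is checked.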

---

### Generate the main index

After converting one or more documents, build the navigable index page:

```bash
uv run texport-index -o examples/output
```

This scans `examples/output/` and writes `examples/output/index.html` with
links to each document (and its PDF where available).

---

## Typical Workflow

1. **Write LaTeX** — Create/edit `.tex` files in `tex_src/`
2. **Convert to HTML/PDF** — Run `uv run texport tex_src/yourfile.tex`
3. **Regenerate index** — Run `uv run texport-index`
4. **Deploy** — Upload `output/` to your web server

## Customisation

### Custom CSS

Edit `static/css/custom.css`. This file is automatically copied into the output directory and injected into every processed HTML file.

### Custom JavaScript

Edit files in `static/js/`. The following are automatically injected:
- `custom.js` — Page-width slider, MathJax toggle, go-to-top button
- `mathjax-config.js` — MathJax configuration

#### Localising the toolbar (`custom.js`)

All user-visible strings in the toolbar are read from `window.texportI18n`.
To override them for another language, add a `<script>` block **before** `custom.js` loads:

```html
<script>
  window.texportI18n = {
      widthLabel:     'Breite',
      widthAriaLabel: 'Seitenbreite in ch-Einheiten',
      mathOn:         'Formel ✓',
      mathOff:        'Formel ✗',
      mathAriaOn:     'MathJax-Darstellung ein',
      mathAriaOff:    'MathJax-Darstellung aus',
      goToTopAria:    'Zum Seitenanfang',
  };
</script>
```

Only the keys you want to change need to be provided; omitted keys fall back to the English defaults.

### LaTeXML Bindings

Custom LaTeXML behaviour is defined in `.ltxml` files inside `latexml/`. These are Perl modules loaded via `--preload` on every `latexmlc` invocation. **All `.ltxml` files in `latexml/` are loaded automatically** (alphabetical order) — no changes to `main.py` needed when adding new ones.

Currently included:
- `amsmath-compat.ltxml` — no-op stubs for amsmath internal commands (e.g. `\ctagsplit@true`) that would otherwise cause "undefined macro" errors.
- `emph-in-math.ltxml` — redefines `\emph{…}` as `\mathit{…}` inside math environments, `\textit{…}` elsewhere.

To add a new binding, simply create a `.ltxml` file in `latexml/`.
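
A minimal sketch of how the automatic loading could work: collect every `.ltxml` file, sort the paths, and emit one `--preload` argument per file. The helper name `preload_args` is hypothetical, and the rest of the `latexmlc` command line is omitted.

```python
from pathlib import Path


def preload_args(latexml_dir: Path) -> list[str]:
    """Build one --preload argument per .ltxml file, in sorted order."""
    return [f"--preload={binding}" for binding in sorted(latexml_dir.glob("*.ltxml"))]
```

Because the list is rebuilt from the directory contents on each run, dropping a new file into `latexml/` is enough for it to be picked up.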

### Index Template

Edit `templates/main_index_template.html`. The template uses Python `str.format`-style placeholders:

| Placeholder | Default | Description |
|---|---|---|
| `{lang}` | `en` | `<html lang>` attribute |
| `{title}` | `Documents` | `<title>` text |
| `{description}` | `Document index` | Meta description content |
| `{heading}` | `Documents` | `<h1>` text |
| `{contents_label}` | `Contents` | `<h3>` section label |
| `{links}` | *(generated)* | Rendered `<li>` elements — filled automatically |

To generate the index in another language, pass keyword arguments to `create_main_index_page`:

```python
create_main_index_page(
    root_dir=Path("output"),
    lang="de",
    title="Dokumente",
    description="Dokumentenindex",
    heading="Dokumente",
    contents_label="Inhalt",
)
```
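
The placeholder substitution itself can be sketched with plain `str.format`. The template string below is a stand-in for `templates/main_index_template.html`, and `render_index` is a hypothetical helper rather than the real `create_main_index_page`.

```python
DEFAULTS = {
    "lang": "en",
    "title": "Documents",
    "description": "Document index",
    "heading": "Documents",
    "contents_label": "Contents",
}

# Stand-in template. Literal braces (for example in inline CSS) must be
# doubled, otherwise str.format treats them as placeholders.
TEMPLATE = """<!DOCTYPE html>
<html lang="{lang}">
<head>
  <title>{title}</title>
  <meta name="description" content="{description}">
  <style>body {{ margin: 2rem; }}</style>
</head>
<body>
  <h1>{heading}</h1>
  <h3>{contents_label}</h3>
  <ul>
{links}
  </ul>
</body>
</html>"""


def render_index(links: list[str], **overrides: str) -> str:
    """Fill the template, letting keyword arguments override the defaults."""
    values = {**DEFAULTS, **overrides, "links": "\n".join(links)}
    return TEMPLATE.format(**values)
```

When editing the real template, the same rule applies: any `{` or `}` that is not a placeholder needs to be written as `{{` or `}}`.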

## Caveats

### SVG Dark Mode

SVG images (e.g., diagrams generated by TikZ) use a simple CSS filter to invert colours in dark mode. This works well for simple black-and-white diagrams but may produce unexpected results when multiple colours are used. Always test your documents in dark mode to verify SVG rendering.

### LaTeXML Conversion Limitations

LaTeXML does not support all LaTeX packages and document structures. Known cases where HTML conversion fails or produces degraded output:

**Multi-part documents** — Projects where the root `.tex` file relies on a custom build system, non-standard `\include` chaining, or shared preamble files split across multiple directories may not convert correctly. LaTeXML resolves includes relative to `--sourcedirectory`; files outside that tree are not found.

**`memoir` class** — Documents using the `memoir` document class are not reliably converted. LaTeXML has limited support for `memoir`'s extended sectioning, captioning, and page-layout commands. For example, the [UiO Introduction to LaTeX](https://github.com/uio-latex/Introduction-to-LaTeX) repository uses `memoir` and fails to produce usable HTML output.

In these cases pdflatex still produces a correct PDF; only the HTML output is affected. Consider restructuring such documents to use a standard class (`article`, `report`, `book`) for full LaTeXML compatibility.

## Contributing

Contributions are welcome — see [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions, code style, and how to submit a pull request.

## License

MIT — see [LICENSE](LICENSE).
