Metadata-Version: 2.4
Name: unxml-rs
Version: 1.4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Rust
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Utilities
Summary: Pretty-print XML and HTML files in a light, YAML-like, readable format
Keywords: xml,html,yaml,pretty-print,cli,parser
Author-email: Ville Vainio <vivainio@gmail.com>
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/vivainio/unxml-rs
Project-URL: Repository, https://github.com/vivainio/unxml-rs

# Unxml

Simplify and "flatten" XML files into a YAML-like readable format.

This is a Rust clone of the original [unxml](https://github.com/vivainio/unxml) F# tool.

**[See it in action →](https://vivainio.github.io/unxml-demos/)** — a gallery of
real-world XML documents, schemas, stylesheets, and Schematron rules rendered
with `unxml`, with original-vs-rendered size comparisons.

## Installation

### Using uv (Easiest)

Install the published wheel from PyPI as a standalone tool:

```bash
uv tool install unxml-rs
```

This puts the `unxml` command on your PATH. To try it without installing anything:

```bash
uvx --from unxml-rs unxml <xml_file>
```

### Pre-built Binaries (Recommended)

Download the latest release for your platform from the [GitHub Releases](https://github.com/yourusername/unxml-rs/releases) page:

- **Linux (x86_64)**: `unxml-linux-x86_64.tar.gz`
- **Windows (x86_64)**: `unxml-windows-x86_64.zip`
- **macOS (Intel)**: `unxml-macos-x86_64.tar.gz`
- **macOS (Apple Silicon)**: `unxml-macos-arm64.tar.gz`

Extract the archive and place the `unxml` binary in your PATH.

### From Source

```bash
git clone https://github.com/yourusername/unxml-rs
cd unxml-rs
cargo install --path .
```

### Using Cargo

```bash
cargo install unxml
```

## Usage

```bash
unxml <xml_file>
```

By default files render as plain XML. Pass `--auto` to pick the processing mode
from each file's extension:

| Extension      | Mode applied   |
| -------------- | -------------- |
| `.xsl` `.xslt` | `--xslt`       |
| `.sch`         | `--schematron` |
| `.xsd`         | `--xsd`        |

An explicit mode flag (`--xslt`, `--schematron`, `--xsd`, `--special`) always
overrides autodetection.

Each mode rewrites its vocabulary into a terser pseudocode. The full set of
transformations, with side-by-side samples, is documented per format:

- [XSLT transformations](docs/xslt.md) — `xsl:*` stylesheets
- [XSD transformations](docs/xsd.md) — `xs:*` / `xsd:*` schemas
- [Schematron transformations](docs/schematron.md) — `.sch` rule schemas

### Syntax-highlighted output (`--bat`)

```bash
unxml --bat some.xsd      # implies --auto (detects --xsd), pipes through `bat -l unxml`
```

`--bat` renders the output through [`bat`](https://github.com/sharkdp/bat) using
the bundled `unxml` grammar (see `editor/`) for paged, colourised display. If
`bat` is not installed it falls back to plain stdout.

### Hiding noisy namespace prefixes (`--hide-ns`)

Vocabularies like UBL bury the signal under repeated prefixes (`cbc:`, `cac:`).
`--hide-ns` drops the named prefixes from element names — and their `xmlns:`
declarations — so the output reads as bare local names:

```bash
unxml --hide-ns cbc,cac invoice.xml   # repeatable and comma-separated
```

Signal-carrying prefixes you don't list (e.g. `ext:`, `bim:`) are kept, so an
extension subtree still stands out.

Under `--auto`/`--bat`, unxml also **sniffs** the document type and hides a
sensible set automatically. Currently it recognises UBL *instance* documents
(an unprefixed root such as `<Invoice>` in a UBL namespace) and hides whichever
prefixes are bound to the Common Basic/Aggregate Components namespaces. A
stylesheet or schema that merely *references* UBL (e.g. an `xsl:stylesheet`
translating to UBL) is left untouched, since there the prefixes are real syntax.

## Introduction

This command line application was developed for comparing XML files (e.g. database/application state dumps). It takes an XML file and converts it to a YAML-like syntax that is easier to read and compare.

### Example

Given this XML input:

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.1</TargetFramework>
    <PackAsTool>true</PackAsTool>
    <Description>Unxml 'pretty-prints' xml files in light, yamly, readable format</Description>
    <PackageVersion>1.0.0</PackageVersion>
  </PropertyGroup>
  <ItemGroup>
    <Compile Include="FileSystemHelper.fs"/>
    <Compile Include="MutableCol.fs"/>
    <Compile Include="Program.fs" />
  </ItemGroup>
</Project>
```

The output would be:

```
Project
  [Sdk]: Microsoft.NET.Sdk
  PropertyGroup
    OutputType = Exe
    TargetFramework = netcoreapp2.1
    PackAsTool = true
    Description = Unxml 'pretty-prints' xml files in light, yamly, readable format
    PackageVersion = 1.0.0
  ItemGroup
    Compile
      [Include]: FileSystemHelper.fs
    Compile
      [Include]: MutableCol.fs
    Compile
      [Include]: Program.fs
```

### Key Features

- **Attributes in Square Brackets**: Element attributes are displayed as `[attribute]: value`
- **Text Content with Equals**: Element text content is shown as `ElementName = text content`
- **Hierarchical Indentation**: Nested elements are properly indented
- **Clean Format**: Easy to read and compare, great for diffing
- **Inline mixed content**: Prose interleaved with short inline elements stays on one readable line

### Mixed content (prose with inline spans)

Document-style XML interleaves text with small inline elements — a paragraph
containing a `<command>` or a `<link>`. Flattening every run onto its own line
makes such prose hard to read, so `unxml` keeps it inline as one line of
verbatim XML:

```xml
<para>The <command>widget</command> daemon keeps its
  <link href="recovery.html">recoverable</link> state in one database.</para>
```

renders as:

```
para = The <command>widget</command> daemon keeps its <link href="recovery.html">recoverable</link> state in one database.
```

An element flows inline when its whole subtree is *inline-safe* — text
interleaved with elements that are themselves inline-safe. A leaf with
significant (multi-line) text, such as `<programlisting>` or `<screen>`, is not
inline-safe, so its parent stays in the flattened block form and the listing
keeps its line breaks. Nested inline markup (e.g. `<emphasis>` wrapping a
`<command>`) collapses all the way up. This applies to the generic XML render;
the `--xslt`/`--xsd`/`--wsdl`/`--schematron` modes use their own formatting.

## Technical Details

- Built with Rust for performance and safety
- Uses `quick-xml` for fast XML parsing
- Uses `clap` for command-line argument parsing
- Proper error handling with `anyhow`

## License

MIT License - see LICENSE file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

### Creating Releases

The version lives in the **git tag**, not in `Cargo.toml` (which stays at the
`0.0.0-dev` placeholder; the release workflow injects the real version with
`cargo set-version`). Do **not** bump `Cargo.toml` or create tags by hand.

To cut a release, let `gh` create the tag:

```bash
gh release create vX.Y.Z --title "Release vX.Y.Z" --notes "…"
```

The pushed tag triggers the GitHub Actions workflow, which builds binaries and
the PyPI wheel for all platforms and attaches them to the release.

The CI workflow runs on every push to ensure code quality with formatting checks, linting, and tests. 
