Metadata-Version: 2.3
Name: re-norm
Version: 0.1.0
Summary: Regex-based data extraction with typed normalization
Keywords: renorm,normalization,typed-regex
Author: Ricardo M.P. da Silva
Author-email: Ricardo M.P. da Silva <rmpdasilva@gmail.com>
License: Apache-2.0
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Filters
Classifier: Topic :: Utilities
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 4 - Beta
Requires-Python: >=3.13
Project-URL: Homepage, https://shrubus.github.io/re-norm/
Project-URL: Repository, https://github.com/shrubus/re-norm
Project-URL: Issues, https://github.com/shrubus/re-norm/issues
Project-URL: License, https://github.com/shrubus/re-norm/blob/main/LICENSE
Description-Content-Type: text/markdown

# renorm

[![PyPI version](https://img.shields.io/pypi/v/re-norm.svg)](https://pypi.org/project/re-norm/)
[![Python versions](https://img.shields.io/pypi/pyversions/re-norm.svg)](https://pypi.org/project/re-norm/)
[![License](https://img.shields.io/pypi/l/re-norm.svg)](https://pypi.org/project/re-norm/)
![CI](https://github.com/shrubus/re-norm/actions/workflows/ci.yml/badge.svg)


<!-- docs:description:start -->

**Regex-based data extraction with typed normalization.**

`renorm` lets you embed small, reusable “spec” objects inside regular expressions.
Each spec defines both a regex fragment and a normalization function,
so matched values are returned as real Python types with zero post-processing.

Normalization depends on the exact structure of the matched text,
so extraction and normalization naturally belong together.
`renorm` follows this principle by treating each spec as a single unit of
pattern + normalization logic, making typed extraction composable and predictable.

<!-- docs:description:end -->
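
As a small illustration of why the two belong together, the same matched text can normalize to different values depending on which number format the pattern was written for. This sketch uses plain Python only and makes no assumption about how `renorm` works internally:

```python
raw = "1.234"

# Read as a US-style literal: "." is the decimal separator.
as_us = float(raw)

# Read as a European-style literal: "." groups thousands, so it is dropped.
as_eu = float(raw.replace(".", ""))

print(as_us, as_eu)  # 1.234 1234.0
```

The normalizer cannot be chosen correctly without knowing which pattern produced the match.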

## Highlights

- Drop-in wrapper around Python’s `re` — mirrors the API and adds no dependencies.
- Built-in spec objects for common data types.
- Define your own specs with custom patterns and normalization logic.


## Basic Example

<!-- docs:basic_example:start -->

An instance of `renorm.Num` declares the thousands and decimal separators of the
plain number literals it captures, which in turn determines how they are normalized.

```python
import renorm as rn

eu = rn.Num(dec=",", ths=" ")
us = rn.Num(dec=".", ths="'")

pat = rn.compile(
    r"price=({@eu}); qty=({@us}); total=({@eu})",
    eu=eu,
    us=us,
)

m = pat.search("price=1 234,50; qty=2'000.0; total=2 469,00")
print(m.groups())  # (1234.5, 2000.0, 2469.0)
```

<!-- docs:basic_example:end -->
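
To make the role of `dec` and `ths` concrete, here is a minimal sketch of a separator-aware number spec written with plain `re`. It is not `renorm`'s implementation: `make_number_spec` and the regex fragment it builds are hypothetical and only illustrate the idea.

```python
import re


def make_number_spec(dec: str, ths: str):
    """Hypothetical helper: return (regex fragment, normalizer) for one format."""
    d, t = re.escape(dec), re.escape(ths)
    # Leading digits, optional groups of three digits, optional fractional part.
    fragment = rf"\d+(?:{t}\d{{3}})*(?:{d}\d+)?"

    def normalize(text: str) -> float:
        # Drop thousands separators first, then switch to "." as decimal point.
        return float(text.replace(ths, "").replace(dec, "."))

    return fragment, normalize


eu_frag, eu_norm = make_number_spec(dec=",", ths=" ")
us_frag, us_norm = make_number_spec(dec=".", ths="'")

text = "price=1 234,50; qty=2'000.0"
price = eu_norm(re.search(rf"price=({eu_frag})", text).group(1))
qty = us_norm(re.search(rf"qty=({us_frag})", text).group(1))
print(price, qty)  # 1234.5 2000.0
```

Because the fragment and the normalizer are built from the same two separators, they cannot drift apart.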

## Installation

<!-- docs:installation:start -->

```bash
pip install re-norm
```

<!-- docs:installation:end -->

## Features

- Extract **numeric literals** with built-in specs
- Normalize **custom data types** via user-defined specs
- Compose patterns using simple placeholders (`{@name}`)
- Zero dependencies: drop-in wrapper around Python’s `re`
- Minimal developer overhead: mirrors Python’s `re` API and semantics

## Using Specs in Regular Expressions

<!-- docs:using_specs:start -->

Specs are embedded in patterns using placeholders:

- `{@name}` for keyword specs
- `{@0}`, `{@1}`, … for positional specs

Each placeholder is replaced by the regex fragment defined by the spec.
If the placeholder appears inside a capturing group, the matched text is
passed through the spec’s normalization function.

This is the same placeholder syntax used in the basic example above,
where the specs are passed as keyword arguments.

<!-- docs:using_specs:end -->
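
The substitution step can be pictured with a short plain-`re` sketch. This is an assumption about the general technique, not `renorm`'s actual code; `expand` and the `normalizers` list are hypothetical names used only here.

```python
import re


def expand(template: str, *args: str, **kwargs: str) -> str:
    """Hypothetical helper: replace {@0} / {@name} with regex fragments."""
    def repl(m: re.Match) -> str:
        key = m.group(1)
        # Numeric keys index positional fragments; others look up keywords.
        return args[int(key)] if key.isdigit() else kwargs[key]

    return re.sub(r"\{@(\w+)\}", repl, template)


hex_frag = r"[0-9A-Fa-f]+"
int_frag = r"\d+"

regex = expand(r"addr=0x({@0}), size=({@size})", hex_frag, size=int_frag)
print(regex)  # addr=0x([0-9A-Fa-f]+), size=(\d+)

# One normalizer per capturing group, applied to the raw matched strings.
normalizers = [lambda s: int(s, 16), int]
m = re.search(regex, "addr=0x1A2B, size=64")
values = tuple(f(g) for f, g in zip(normalizers, m.groups()))
print(values)  # (6699, 64)
```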

## Custom Specs

<!-- docs:custom_specs:start -->

You can define your own normalization rules by subclassing `rn.NormSpec`. A custom spec must implement a `pattern` property (regex fragment) and a `normalize` method. This allows `renorm` to support arbitrary data types and formats.

```python
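import renorm as rn


class Hex(rn.NormSpec):
    @property
    def pattern(self) -> str:
        # Regex fragment substituted for the placeholder.
        return r"[0-9A-Fa-f]+"

    def normalize(self, value: str) -> int:
        # Convert the matched hex digits to an integer.
        return int(value, 16)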
```

```python
addr = Hex()
size = rn.Num()

pat = rn.compile(
    r"addr=0x({@addr}), size=({@size}) bytes",
    addr=addr,
    size=size,
)

m = pat.search("addr=0x1A2B, size=64 bytes")
print(m.groups())   # (6699, 64.0)
```

<!-- docs:custom_specs:end -->
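
Since a spec is just a pattern plus a normalizer, both halves can be checked on their own with the standard library before the spec is used in a `renorm` pattern. The sketch below uses a hypothetical `IsoDate` class that deliberately does not import `renorm`, so it runs standalone; in real use it would subclass `rn.NormSpec` as shown above.

```python
import re
from datetime import date


class IsoDate:  # stand-in; in real use this would subclass rn.NormSpec
    @property
    def pattern(self) -> str:
        # Matches the shape YYYY-MM-DD; validity is checked by normalize().
        return r"\d{4}-\d{2}-\d{2}"

    def normalize(self, value: str) -> date:
        return date.fromisoformat(value)


spec = IsoDate()
m = re.search(spec.pattern, "released on 2024-03-15")
released = spec.normalize(m.group(0))
print(released)  # 2024-03-15
```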

## API Overview

<!-- docs:api_overview:start -->

Mirrors the familiar Python `re` API.

**``renorm.compile(pattern, *specs, flags, **named_specs) -> Pattern``**


**``renorm.Pattern``**

&emsp;&emsp;``.search(text) -> Match | None``

&emsp;&emsp;``.match(text) -> Match | None``

&emsp;&emsp;``.fullmatch(text) -> Match | None``

&emsp;&emsp;``.pattern -> str`` &nbsp; (the compiled regex string)


**``renorm.Match``**

&emsp;&emsp;``.group(0) -> str`` &nbsp; (full match)

&emsp;&emsp;``.group(i) -> T`` &nbsp; (normalized value)

&emsp;&emsp;``.group(i, j) -> tuple[T | str | None, ...]`` &nbsp; (normalized values)

&emsp;&emsp;``.groups() -> tuple[T | str | None, ...]`` &nbsp; (all normalized values)

&emsp;&emsp;``.groupdict()``  &nbsp; (disabled: reserved for internal use)

**``renorm.NormSpec``**

**``renorm.Num(dec, ths)``**

<!-- docs:api_overview:end -->

### Note

Some methods from the `re` API are not implemented yet. If you need one, just open an issue; happy to add it.
