Metadata-Version: 2.3
Name: html-table-parse
Version: 0.2.1
Summary: Parse HTML table as Python list or dict
Author: 5j9
Author-email: 5j9 <5j9@users.noreply.github.com>
License: GPL-3.0
Requires-Dist: lxml>=6.1.1
Requires-Python: >=3.14
Project-URL: Homepage, https://github.com/5j9/html-table-parse
Description-Content-Type: text/markdown

# HTML Table Parse

A lightweight HTML table parser that converts tables into plain Python lists and dicts, with no dependency on pandas.

## Installation

```bash
pip install html-table-parse
```

## Usage

```python
from html_table_parse import to_list, to_dict, to_dicts

html = """
<table>
    <tr><th>Name</th><th>Age</th><th>City</th></tr>
    <tr><td>Alice</td><td>30</td><td>NYC</td></tr>
    <tr><td>Bob</td><td>25</td><td>LA</td></tr>
</table>
"""

# List of lists
to_list(html)
# [['Name', 'Age', 'City'], ['Alice', '30', 'NYC'], ['Bob', '25', 'LA']]

# Dictionary of columns
to_dict(html)
# {'Name': ['Alice', 'Bob'], 'Age': ['30', '25'], 'City': ['NYC', 'LA']}

# List of dictionaries
to_dicts(html)
# [{'Name': 'Alice', 'Age': '30', 'City': 'NYC'},
#  {'Name': 'Bob', 'Age': '25', 'City': 'LA'}]
```

## Features

- No pandas required: a lightweight alternative to `pandas.read_html()`
- Supports `colspan` and `rowspan` attributes
- Handles duplicate headers by auto-numbering them
- Multiple output formats: list of rows, dict of columns, or list of dicts
- Automatic whitespace normalization
- Fast parsing with `lxml`
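Span support and whitespace normalization can be pictured as filling a rectangular grid: each cell's text is copied into every row/column position its `colspan` and `rowspan` cover. The sketch below illustrates that general technique in plain Python; it is not the package's own code. The input shape (rows of hypothetical `(text, colspan, rowspan)` tuples, spans assumed to be at least 1) and the helper names `expand_spans` and `normalize_ws` are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical cell representation for this sketch: (text, colspan, rowspan).
# html-table-parse's internal representation may differ.
Cell = tuple[str, int, int]


def normalize_ws(text: str) -> str:
    """Collapse runs of whitespace into single spaces and strip both ends."""
    return " ".join(text.split())


def expand_spans(rows: list[list[Cell]]) -> list[list[str]]:
    """Expand colspan/rowspan cells into a rectangular grid of strings."""
    grid: list[list[Optional[str]]] = []
    for r, row in enumerate(rows):
        # A rowspan from an earlier row may already have created this row.
        while len(grid) <= r:
            grid.append([])
        c = 0
        for text, colspan, rowspan in row:
            # Skip positions already occupied by a rowspan from above.
            while c < len(grid[r]) and grid[r][c] is not None:
                c += 1
            value = normalize_ws(text)
            # Clip rowspans that would run past the last row of the table.
            for dr in range(min(rowspan, len(rows) - r)):
                while len(grid) <= r + dr:
                    grid.append([])
                target = grid[r + dr]
                for dc in range(colspan):
                    while len(target) <= c + dc:
                        target.append(None)
                    target[c + dc] = value
            c += colspan
    # Pad ragged rows and unfilled gaps with empty strings.
    width = max((len(row) for row in grid), default=0)
    return [
        [v if v is not None else "" for v in row] + [""] * (width - len(row))
        for row in grid
    ]
```

For example, a `Name` header with `rowspan="2"` next to a `Contact` header with `colspan="2"` becomes `['Name', 'Contact', 'Contact']` over `['Name', 'Email', 'Phone']`, so every row has the same length. Repeating the spanned text (rather than leaving blanks) is one common convention, also used by `pandas.read_html()`.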

## API

### `to_list(html: str, index: int = 0) -> list[list]`

Parse the table as a list of rows. The header row, if present, is returned as the first row.
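To show roughly what `to_list` has to do, here is a minimal sketch built on the standard library's `html.parser` instead of `lxml`, so it is a swapped-in parser rather than the package's implementation. It ignores `colspan`/`rowspan` and nested tables. The names `_TableRows` and `sketch_to_list` are hypothetical, and reading `index` as the 0-based position of the `<table>` in the document is an assumption, not something this README states.

```python
from html.parser import HTMLParser
from typing import Optional


class _TableRows(HTMLParser):
    """Collect cell text row by row for every <table> (hypothetical helper).

    Sketch only: no colspan/rowspan handling, no nested-table support.
    """

    def __init__(self) -> None:
        super().__init__()
        self.tables: list[list[list[str]]] = []
        self._cell: Optional[list[str]] = None  # text chunks of the open cell

    def _flush(self) -> None:
        # Close the open cell (also covers HTML's optional </td>/</th>).
        if self._cell is not None:
            text = " ".join("".join(self._cell).split())  # normalize whitespace
            self.tables[-1][-1].append(text)
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag in ("table", "tr", "td", "th"):
            self._flush()
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("table", "tr", "td", "th"):
            self._flush()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def sketch_to_list(html: str, index: int = 0) -> list[list[str]]:
    """Return the rows of the index-th <table> (0-based, by assumption)."""
    parser = _TableRows()
    parser.feed(html)
    parser.close()
    return parser.tables[index]
```

Run on the `html` string from the Usage section, `sketch_to_list(html)` produces the same list of lists shown there for `to_list(html)`.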

### `to_dict(html: str, index: int = 0) -> dict[str, list]`

Parse the table as a dictionary of columns: each header from the first row maps to the list of values in that column.
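The dict-of-columns shape follows directly from the list-of-rows shape: take the first row as keys and collect the i-th cell of every remaining row. The helper below is a hypothetical illustration of that relationship, not the package's code. It assumes rectangular rows and headers that are already unique; with duplicates, later columns would overwrite earlier ones, which is why duplicate headers need to be auto-numbered.

```python
def rows_to_dict(rows: list[list[str]]) -> dict[str, list[str]]:
    """Convert rows (header row first) into a dict of columns.

    Hypothetical helper: assumes rectangular rows and unique headers.
    """
    if not rows:
        return {}
    header, *body = rows
    # Column i collects the i-th cell of every data row.
    return {name: [row[i] for row in body] for i, name in enumerate(header)}
```

Applied to the `to_list` output from the Usage section, this yields the same dictionary shown there for `to_dict(html)`.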

### `to_dicts(html: str, index: int = 0) -> list[dict]`

Parse the table as a list of dictionaries, one per data row, keyed by the headers from the first row.
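The list-of-dicts shape holds the same data as the dict of columns, transposed to one record per row. A hypothetical sketch of the conversion from rows (not the package's code) pairs the header with each data row via `zip`:

```python
def rows_to_dicts(rows: list[list[str]]) -> list[dict[str, str]]:
    """Convert rows (header row first) into one dict per data row.

    Hypothetical helper: zip() silently drops extra cells in rows that are
    longer than the header, so rectangular rows are assumed.
    """
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]
```

This format is convenient for record-oriented consumers such as `json.dumps` or `csv.DictWriter`, while the dict of columns suits column-wise operations.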
