Metadata-Version: 2.4
Name: htmltree-view
Version: 0.2.1
Summary: Visualize HTML DOM structure as a depth-limited, colorized ASCII tree
License: MIT
Project-URL: Homepage, https://github.com/cumulus13/htmltree
Project-URL: Repository, https://github.com/cumulus13/htmltree
Project-URL: Issues, https://github.com/cumulus13/htmltree/issues
Keywords: html,dom,tree,visualizer,beautifulsoup,cli,debug,structure,ascii
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: beautifulsoup4>=4.12
Provides-Extra: lxml
Requires-Dist: lxml; extra == "lxml"
Provides-Extra: html5lib
Requires-Dist: html5lib; extra == "html5lib"
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: lxml; extra == "dev"
Requires-Dist: html5lib; extra == "dev"
Dynamic: license-file

# htmltree-view

> Visualize HTML DOM structure as a **depth-limited, colorized ASCII tree** — like the `tree` command, but for HTML files.

```
<html> lang="en"  [ L0 2ch ]
├── <head>  [ L1 4ch ]
│   ├── <meta> charset="utf-8"  [ L2 empty ]
│   ├── <meta> name="viewport" content="width=device-width"  [ L2 empty ]
│   ├── <title>  [ L2 empty ]
│   │   └── "My Page"
│   └── <link> rel="stylesheet" href="style.css"  [ L2 empty ]
└── <body>  [ L1 3ch ]
    ├── <header>  [ L2 2ch ]
    │   └── … (2 children hidden)
    ├── <main> id="main-content"  [ L2 2ch ]
    │   └── … (2 children hidden)
    └── <footer>  [ L2 2ch ]
        └── … (2 children hidden)

────────────────────────────────────────────────────
  Tags: 8  Text nodes: 1  Max depth: 2 (capped at 2)
  Top tags: meta×2, html×1, head×1, title×1, link×1
```

## Features

- **Depth limiting** — `-d N` stops at level N; truncated sub-trees show a `… (X children hidden)` hint
- **CSS selector zoom** — `-s "#app"` or `-s "body > main"` focuses any sub-tree
- **Semantic tag colors** — headings in amber, structural in blue, forms in pink, links in cyan, etc.
- **Depth-cycling pipe colors** — guide lines change shade per nesting level
- **`[L3 5ch]` badges** — depth level + direct child-tag count on every node
- **Text nodes** — quoted inline, with `--text-limit` truncation and whitespace collapsing
- **Attribute filtering** — `--attrs id class href` shows only what you care about; `--attrs` hides all
- **Attribute value truncation** — `--attr-limit 80` prevents base64/data-URI blowout
- **HTML comments** — hidden by default, shown with `--show-comments`
- **URL fetching** — `htmltree https://example.com -d 3`
- **stdin pipe** — `curl ... | htmltree -` or `echo '<div/>' | htmltree -`
- **Output to file** — `-o tree.txt` (auto-disables color)
- **Auto color detection** — ANSI disabled when stdout is not a TTY; respects `NO_COLOR` / `FORCE_COLOR` env vars
- **Streaming output** — `iter_lines()` yields one line at a time; never builds the full string unless you ask
- **No recursion** — iterative DFS walk; handles arbitrarily deep HTML without `RecursionError`
- **Stats summary** — total tags, text nodes, comments, max depth seen, top-5 tag frequencies
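
The streaming and no-recursion points can be illustrated with a small standalone sketch. It is not the package's code: `Node` and `iter_tree_lines` are hypothetical names, and the tree is a plain dataclass rather than a parsed document. It only shows how an explicit stack can yield one line at a time and stop at a depth limit.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parsed HTML element."""
    name: str
    children: List["Node"] = field(default_factory=list)


def iter_tree_lines(root: Node, max_depth: Optional[int] = None) -> Iterator[str]:
    """Yield tree lines lazily, walking depth-first with an explicit stack."""
    # Stack entries: (node, depth, prefix of the line, connector before the tag)
    stack = [(root, 0, "", "")]
    while stack:
        node, depth, prefix, connector = stack.pop()
        yield f"{prefix}{connector}<{node.name}>"
        # Children of the root get no indent; deeper levels extend the guide line.
        if depth == 0:
            child_prefix = ""
        else:
            child_prefix = prefix + ("    " if connector == "└── " else "│   ")
        if not node.children:
            continue
        if max_depth is not None and depth >= max_depth:
            # Depth limit reached: report the hidden children instead of descending.
            yield f"{child_prefix}└── … ({len(node.children)} children hidden)"
            continue
        last = len(node.children) - 1
        # Push in reverse so the first child is popped (and printed) first.
        for i in range(last, -1, -1):
            conn = "└── " if i == last else "├── "
            stack.append((node.children[i], depth + 1, child_prefix, conn))


tree = Node("html", [
    Node("head", [Node("title")]),
    Node("body", [Node("main", [Node("p")])]),
])
for line in iter_tree_lines(tree, max_depth=1):
    print(line)
```

Because the walk keeps its own stack, nesting depth is bounded by available memory rather than by Python's recursion limit.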

## Install

```bash
pip install htmltree-view

# With faster lxml parser:
pip install "htmltree-view[lxml]"

# With html5lib (most spec-accurate):
pip install "htmltree-view[html5lib]"
```

## CLI

```bash
# Full tree
htmltree index.html

# Limit depth to 3 levels
htmltree index.html -d 3

# Focus on a CSS-selected sub-tree
htmltree index.html -s "body > main"
htmltree index.html -s "#app"
htmltree index.html -s ".container"

# Fetch from URL
htmltree https://example.com -d 4

# Read from stdin
curl https://example.com | htmltree -
echo '<div><p>hi</p></div>' | htmltree -

# Show only id and class attributes
htmltree index.html --attrs id class

# Hide all attributes
htmltree index.html --attrs

# Hide text nodes (structure only)
htmltree index.html --no-text

# Show HTML comments
htmltree index.html --show-comments

# Truncate text/attr at 40 chars
htmltree index.html --text-limit 40 --attr-limit 40

# Save to file (color auto-disabled)
htmltree index.html -o structure.txt

# Pipe to less with color preserved
htmltree index.html --force-color | less -R

# Use lxml backend (faster)
htmltree index.html --parser lxml

# Plain output (no ANSI)
htmltree index.html --no-color
```

## Python API

```python
from htmltree import HtmlTree

# Read the file as UTF-8 and close it when done
with open("index.html", encoding="utf-8") as f:
    html = f.read()

# Basic usage
tree = HtmlTree(html)
tree.print()

# Limit depth, filter attributes
tree = HtmlTree(html, max_depth=3, show_attrs=["id", "class"])
tree.print()

# Zoom into a sub-tree
tree = HtmlTree(html, max_depth=5, show_text=False)
tree.print(root_selector="body > main")

# Render to string
tree = HtmlTree(html, max_depth=2, force_color=False)
output = tree.render(root_selector="body")
print(output)

# Stream line by line (memory-efficient for large pages)
tree = HtmlTree(html, max_depth=4)
for line in tree.iter_lines(root_selector="#content"):
    print(line)

# Access stats after render
tree.render()
print(tree.stats.total_tags)
print(tree.stats.tag_counts)      # dict: tag name → count
print(tree.stats.max_depth_seen)
print(tree.stats.total_text_nodes)
print(tree.stats.total_comments)
```
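
Since `tag_counts` is documented above as a plain dict of tag name to count, a footer-style "top tags" string can be rebuilt from it with the standard library. The sketch below uses a hypothetical sample dict in place of a real `tree.stats.tag_counts`, and `top_tags` is an illustrative helper, not part of the package.

```python
from collections import Counter
from typing import Mapping

# Hypothetical sample shaped like `tree.stats.tag_counts` (tag name -> count).
tag_counts = {"html": 1, "head": 1, "meta": 2, "title": 1, "link": 1, "body": 1}


def top_tags(counts: Mapping[str, int], n: int = 5) -> str:
    """Format the n most frequent tags as 'name×count' pairs."""
    ranked = Counter(counts).most_common(n)
    return ", ".join(f"{name}×{count}" for name, count in ranked)


print(top_tags(tag_counts))
```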

## CLI reference

| Flag | Default | Description |
|------|---------|-------------|
| `SOURCE` | — | HTML file path, http/https URL, or `-` for stdin |
| `-d N` / `--depth N` | unlimited | Max depth; negatives clamped to 0 |
| `-s CSS` / `--selector CSS` | `<html>` | CSS selector for tree root |
| `--attrs [NAME …]` | all | Attributes to show; no names = hide all |
| `--no-text` | off | Hide text nodes |
| `--show-comments` | off | Show HTML comment nodes |
| `--text-limit N` | 60 | Max chars per text node |
| `--attr-limit N` | 80 | Max chars per attribute value |
| `--no-color` | off | Disable ANSI colors |
| `--force-color` | off | Force colors even when piped |
| `--no-summary` | off | Suppress stats footer |
| `-o FILE` / `--output FILE` | stdout | Write to file |
| `--parser BACKEND` | `html.parser` | `html.parser`, `lxml`, `html5lib` |
| `--version` | — | Print version and exit |
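
The three states of `--attrs` (omitted, given without names, given with names) combined with `--attr-limit` can be modeled roughly as follows. This is a sketch under assumptions: `format_attrs` and its parameters are hypothetical, attribute values are treated as plain strings, and the exact truncation marker the tool uses is not confirmed by this README.

```python
from typing import Dict, Optional, Sequence


def format_attrs(
    attrs: Dict[str, str],
    show: Optional[Sequence[str]] = None,
    limit: int = 80,
) -> str:
    """Render attributes as name="value" pairs.

    show=None keeps every attribute, an empty sequence hides all,
    and any other sequence keeps only the named attributes.
    """
    if show is None:
        items = list(attrs.items())
    else:
        wanted = set(show)
        items = [(name, value) for name, value in attrs.items() if name in wanted]
    parts = []
    for name, value in items:
        if len(value) > limit:
            # Assumption: the ellipsis counts toward the limit.
            value = value[: max(limit - 1, 0)] + "…"
        parts.append(f'{name}="{value}"')
    return " ".join(parts)


sample = {"id": "app", "class": "box", "href": "data:" + "A" * 200}
print(format_attrs(sample, show=["id", "class"]))
print(format_attrs(sample, limit=10))
```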

## Tree legend

| Symbol | Meaning |
|--------|---------|
| `[L3]` | Node is at depth 3 |
| `[5ch]` | 5 direct tag children |
| `[empty]` | No children |
| `"text"` | Text node content (may be truncated) |
| `<!-- … -->` | HTML comment (with `--show-comments`) |
| `… (N children hidden)` | Sub-tree cut at depth limit |
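
For the "may be truncated" note on text nodes, whitespace collapsing followed by a length cap could look like the sketch below. `clip_text` is a hypothetical helper; the default of 60 mirrors the documented `--text-limit` default, and whether the real tool counts the ellipsis toward the limit is an assumption.

```python
def clip_text(text: str, limit: int = 60) -> str:
    """Collapse whitespace runs, then cut the result to at most `limit` characters."""
    # str.split() with no arguments splits on any whitespace run and drops empties.
    collapsed = " ".join(text.split())
    if len(collapsed) <= limit:
        return collapsed
    if limit <= 0:
        return ""
    # Assumption: the ellipsis counts toward the limit.
    return collapsed[: limit - 1] + "…"


print(clip_text("  Hello \n\t world  "))
print(clip_text("abcdefghij", limit=5))
```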

## Environment variables

| Variable | Effect |
|----------|--------|
| `NO_COLOR` | Any non-empty value disables ANSI colors (https://no-color.org/) |
| `FORCE_COLOR` | Any non-empty value forces ANSI colors even when piped |
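
A color decision that honors both variables and falls back to TTY detection might be sketched like this. `should_use_color` is a hypothetical function, and the precedence shown (`NO_COLOR` beats `FORCE_COLOR` when both are set) is an assumption, since this README does not state which one wins.

```python
import io
import os
import sys
from typing import Mapping, Optional, TextIO


def should_use_color(
    stream: Optional[TextIO] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Decide whether to emit ANSI colors for the given stream."""
    stream = sys.stdout if stream is None else stream
    env = os.environ if env is None else env
    # Assumption: NO_COLOR wins over FORCE_COLOR when both are set.
    if env.get("NO_COLOR"):
        return False
    if env.get("FORCE_COLOR"):
        return True
    # Otherwise color only when writing to an interactive terminal.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


piped = io.StringIO()  # stands in for a non-TTY target such as a pipe or file
print(should_use_color(piped, {}))
print(should_use_color(piped, {"FORCE_COLOR": "1"}))
```

Empty values are ignored here because `env.get(...)` returns a falsy empty string, which matches the "any non-empty value" wording in the table.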

## Requirements

- Python ≥ 3.8
- `beautifulsoup4 ≥ 4.12`
- Optional: `lxml`, `html5lib`

## License

[MIT](LICENSE)

## Author

[Hadi Cahyadi](mailto:cumulus13@gmail.com)

[![Buy Me a Coffee](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://www.buymeacoffee.com/cumulus13)

[![Donate via Ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/cumulus13)
 
[Support me on Patreon](https://www.patreon.com/cumulus13)
