Webdown
A Python CLI tool for converting web pages to clean, readable Markdown format. Webdown makes it easy to extract content from websites for documentation, notes, content migration, or offline reading.
I made this tool specifically so I could download documentation, convert it to Markdown and feed it into an LLM coding tool.
Why Webdown?
- Clean Conversion: Produces readable Markdown without formatting artifacts
- Selective Extraction: Target specific page sections with CSS selectors
- Customization Options: Control links, images, text wrapping, and more
- Progress Tracking: Visual download progress for large pages with the -p flag
- Python Integration: Use as a CLI tool or integrate into your Python projects
Use Cases
Documentation for AI Coding Assistants
Webdown is particularly useful for preparing documentation to use with AI-assisted coding tools like Claude Code, GitHub Copilot, or ChatGPT:
- Convert technical documentation into clean Markdown for AI context
- Extract only the relevant parts of large documentation pages using CSS selectors
- Strip out images and formatting that might consume token context
- Generate well-structured tables of contents for better navigation
- Batch process API documentation for library-specific assistance
# Example: Convert API docs and store for AI coding context
webdown https://api.example.com/docs -s "main" -I -c -w 80 -o api_context.md
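To batch-process several documentation pages for an AI assistant, one option is a small Python script that calls the webdown CLI once per page. This is a minimal sketch, not part of webdown itself: the PAGES mapping, the output directory, and the helpers build_command and convert_all are hypothetical. The script uses only the flags documented under Options.

```python
import subprocess
from pathlib import Path

# Hypothetical pages to convert; replace these with the docs you need.
PAGES = {
    "overview": "https://api.example.com/docs",
    "auth": "https://api.example.com/docs/auth",
}


def build_command(url: str, output: Path, selector: str = "main", width: int = 80) -> list[str]:
    """Build one webdown invocation from the flags documented in this README."""
    return [
        "webdown", url,
        "-s", selector,    # extract only the element matching the CSS selector
        "-I",              # drop images to save context tokens
        "-c",              # remove excessive blank lines
        "-w", str(width),  # wrap text at `width` characters
        "-o", str(output),
    ]


def convert_all(pages: dict[str, str], out_dir: Path) -> list[Path]:
    """Run webdown once per page and return the paths of the written files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, url in pages.items():
        target = out_dir / f"{name}.md"
        # check=True raises CalledProcessError if webdown exits with a non-zero status.
        subprocess.run(build_command(url, target), check=True)
        written.append(target)
    return written
```

With webdown installed, calling convert_all(PAGES, Path("ai_context")) writes one Markdown file per page. Because check=True is set, a failing conversion stops the batch instead of being skipped silently.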
Installation
From PyPI
pip install webdown
Install from Source
# Clone the repository
git clone https://github.com/kelp/webdown.git
cd webdown
# Install with pip
pip install .
# Or install with Poetry
poetry install
Usage
Basic usage:
webdown https://example.com/page.html -o output.md
Output to stdout:
webdown https://example.com/page.html
Options
- -o, --output: Output file (default: stdout)
- -t, --toc: Generate table of contents
- -L, --no-links: Strip hyperlinks
- -I, --no-images: Exclude images
- -s, --css SELECTOR: CSS selector to extract specific content
- -c, --compact: Remove excessive blank lines from the output
- -w, --width N: Set the line width for wrapped text (0 for no wrapping)
- -p, --progress: Show download progress bar
Advanced Options:
- --single-line-break: Use single line breaks instead of two line breaks
- --unicode: Use Unicode characters instead of ASCII equivalents
- --tables-as-html: Keep tables as HTML instead of converting to Markdown
- --emphasis-mark CHAR: Character(s) to use for emphasis (default: '_')
- --strong-mark CHARS: Character(s) to use for strong emphasis (default: '**')
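The -c option removes excessive blank lines from the output. The sketch below illustrates that kind of cleanup as a standalone post-processing step. It is an assumption about the general idea, not webdown's actual implementation, and compact_blank_lines is a hypothetical helper. It is also naive: it collapses blank lines inside fenced code blocks too.

```python
import re


def compact_blank_lines(markdown: str) -> str:
    """Collapse runs of blank lines so blocks are separated by exactly one blank line."""
    # Empty out lines that contain only spaces or tabs. Trailing spaces on text
    # lines are left alone, because two trailing spaces are a Markdown hard break.
    text = re.sub(r"^[ \t]+$", "", markdown, flags=re.MULTILINE)
    # Three or more consecutive newlines (two or more blank lines) become one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Drop leading and trailing blank lines and end with a single newline.
    return text.strip("\n") + "\n"
```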
Examples
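Generate markdown with a table of contents:
webdown https://example.com/page.html -t -o output.md
Extract only main content:
webdown https://example.com/page.html -s "main" -o output.md
Strip links and images:
webdown https://example.com/page.html -L -I -o output.md
Compact output with progress bar and line wrapping:
webdown https://example.com/page.html -c -p -w 80 -o output.md
For complete documentation, use the --help flag:
webdown --help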
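The -t option generates a table of contents. As an illustration of what such a TOC looks like, the sketch below builds a nested, linked list from Markdown headings. This is illustrative only: build_toc and slugify are hypothetical helpers, the anchors approximate GitHub-style heading IDs, only backtick fences are recognized, and webdown's own TOC output may differ.

```python
import re

FENCE = "`" * 3  # a backtick code fence; headings inside fences are ignored

# ATX heading: 1-6 '#' characters, a space, the title, and an optional closing '#' run.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)(?:\s+#+)?\s*$")


def slugify(title: str) -> str:
    """Approximate GitHub-style anchors: lowercase, drop punctuation, each space -> hyphen."""
    slug = re.sub(r"[^\w\s-]", "", title.lower())
    return slug.strip().replace(" ", "-")


def build_toc(markdown: str) -> str:
    """Return a nested bullet list linking to every heading outside fenced code blocks."""
    entries = []
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence  # toggle when a fence opens or closes
            continue
        match = None if in_fence else HEADING.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2)
            # Indent two spaces per heading level below the top level.
            entries.append(f"{'  ' * (level - 1)}- [{title}](#{slugify(title)})")
    return "\n".join(entries)
```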
Documentation
API documentation is available online at tcole.net/webdown.
You can also generate the documentation locally with:
make docs # Generate HTML docs in the docs/ directory
make docs-serve # Start a local documentation server at http://localhost:8080
Development
Prerequisites
- Python 3.10+ (3.13 recommended)
- Poetry for dependency management
Setup
# Clone the repository
git clone https://github.com/kelp/webdown.git
cd webdown
# Install dependencies with Poetry
poetry install
poetry run pre-commit install
# Optional: Start a Poetry shell for interactive development (Poetry 2.x requires the poetry-plugin-shell plugin for this command)
poetry shell
Development Commands
We use a Makefile to streamline development tasks:
# Install dependencies
make install
# Run tests
make test
# Run tests with coverage
make test-coverage
# Run integration tests
make integration-test
# Run linting
make lint
# Run type checking
make type-check
# Format code
make format
# Run all pre-commit hooks
make pre-commit
# Run all checks (lint, type-check, test)
make all-checks
# Build package
make build
# Start interactive Poetry shell
make shell
# Generate documentation
make docs
# Start documentation server
make docs-serve
# Publishing to PyPI (maintainers only)
# See CONTRIBUTING.md for details on the release process
make build # Build package
make publish-test # Publish to TestPyPI (for testing)
# Show all available commands
make help
Poetry Commands
You can also use Poetry directly:
# Start an interactive shell in the Poetry environment
poetry shell
# Run a command in the Poetry environment
poetry run pytest
# Add a new dependency
poetry add requests
# Add a development dependency
poetry add --group dev black
# Update dependencies
poetry update
# Build package
poetry build
Python API Usage
Webdown can also be used as a Python library in your own projects:
from webdown.converter import convert_url_to_markdown, WebdownConfig
# Basic conversion with default options
markdown = convert_url_to_markdown("https://example.com")
# Method 1: Pass all options as keyword arguments (original style)
markdown = convert_url_to_markdown(
    "https://example.com",
    include_links=True,
    include_images=True,
    include_toc=True,
    css_selector="main",  # Only extract main content
    compact_output=True,  # Remove excessive blank lines
    body_width=80,  # Wrap text at 80 characters
    show_progress=True,  # Show download progress bar
)
# Method 2: Using the Config object (new in 0.3.1)
config = WebdownConfig(
    # Basic options
    url="https://example.com",
    include_toc=True,
    css_selector="main",
    compact_output=True,
    body_width=80,
    show_progress=True,
    # Advanced options (all optional)
    single_line_break=False,
    unicode_snob=True,  # Use Unicode characters
    tables_as_html=False,
    emphasis_mark="_",
    strong_mark="**",
)
markdown = convert_url_to_markdown(config)
# Save to file (UTF-8, since the output may contain Unicode characters)
with open("output.md", "w", encoding="utf-8") as f:
    f.write(markdown)
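To convert many pages from Python, one approach is to call convert_url_to_markdown in a loop that writes one file per URL. The sketch below takes the converter as a parameter so it can be exercised without network access. save_pages, slug_for, and the file-naming scheme are hypothetical helpers, not part of webdown.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse


def slug_for(url: str) -> str:
    """Derive a file stem from a URL, e.g. example.com/docs/intro -> example_com-docs-intro."""
    parsed = urlparse(url)
    # Host plus non-empty path segments; query strings and fragments are ignored.
    parts = [parsed.netloc] + [p for p in parsed.path.split("/") if p]
    stem = "-".join(p for p in parts if p).replace(".", "_")
    return stem or "page"


def save_pages(urls: list[str], out_dir: Path, convert: Callable[[str], str]) -> list[Path]:
    """Convert each URL with `convert` and write the Markdown to out_dir as UTF-8."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for url in urls:
        target = out_dir / f"{slug_for(url)}.md"
        target.write_text(convert(url), encoding="utf-8")
        written.append(target)
    return written
```

In practice you could pass convert_url_to_markdown directly, or functools.partial(convert_url_to_markdown, css_selector="main", compact_output=True) to reuse the keyword arguments shown in Method 1. Because the naming scheme ignores query strings, two URLs that differ only by query would overwrite the same file.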
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Run tests to make sure everything works (make test)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Please make sure your code passes all tests, type checks, and follows our coding style (enforced by pre-commit hooks). We aim to maintain high code coverage (currently at 93%). When adding features, please include tests.
For more details, see CONTRIBUTING.md.
Support
If you encounter any problems or have feature requests, please open an issue on GitHub.
License
MIT License - see the LICENSE file for details.
Documentation Links
For full documentation, check out these additional resources:
- API Reference: Detailed documentation for the Python API
- Changelog: Version history and changes
- Contributing Guide: How to contribute to the project
- License: MIT License details