Metadata-Version: 2.4
Name: mdfetch
Version: 0.5.0
Summary: Extract article content from web platforms and return it as clean Markdown.
Project-URL: Homepage, https://github.com/stn1slv/md-fetch
Project-URL: Source, https://github.com/stn1slv/md-fetch
Project-URL: Issues, https://github.com/stn1slv/md-fetch/issues
Author-email: Stanislav Deviatov <devyatov@gmail.com>
License: MIT
License-File: LICENSE
Keywords: article,dev.to,extraction,markdown,medium,scraping,substack
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Text Processing :: Markup :: Markdown
Requires-Python: >=3.12
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: click>=8.1
Requires-Dist: httpx>=0.27
Requires-Dist: lxml>=5.0
Requires-Dist: markdownify>=0.13
Provides-Extra: dev
Requires-Dist: mypy>=1.9; extra == 'dev'
Requires-Dist: pytest-mock>=3.14.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# mdfetch

A Python library and command-line tool that extracts article content from supported publishing platforms (Medium, dev.to, Substack, The New Stack, DZone) and returns it as clean Markdown.

## Install

```bash
pip install mdfetch
```

## CLI usage

Use the built-in `md-fetch` command from a terminal:

```bash
# Fetch and print Markdown to standard output
md-fetch https://medium.com/example/article

# Fetch and save Markdown to a file
md-fetch https://dev.to/example/article --output article.md
```

## Python usage

```python
from mdfetch import extract

# Works with any supported platform: just pass the URL
markdown = extract("https://medium.com/some-publication/article-slug-abc123")
markdown = extract("https://dev.to/username/article-slug")
markdown = extract("https://example.substack.com/p/article-slug")
markdown = extract("https://thenewstack.io/article-slug")
markdown = extract("https://dzone.com/articles/article-slug")
print(markdown)
```
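
A common next step is saving the result to disk. The sketch below is not part of mdfetch: `filename_for` and `save_article` are hypothetical helpers, and the extractor is passed in as a parameter so the snippet can be run without network access. It assumes, as the example above suggests, that `extract` returns the Markdown as a string.

```python
from collections.abc import Callable
from pathlib import Path
from urllib.parse import urlsplit


def filename_for(url: str) -> str:
    """Derive a file name from the last path segment of the URL (hypothetical helper)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    slug = segments[-1] if segments else "article"
    return f"{slug}.md"


def save_article(url: str, directory: Path, extract_fn: Callable[[str], str]) -> Path:
    """Fetch `url` with `extract_fn` and write the Markdown into `directory`."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename_for(url)
    # Assumption: extract_fn returns the article as a str of Markdown.
    target.write_text(extract_fn(url), encoding="utf-8")
    return target
```

With the library installed, this could be called as `save_article(url, Path("articles"), extract)` after `from mdfetch import extract`.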

## Error handling

```python
from mdfetch import (
    extract,
    InvalidURLError,
    UnsupportedPlatformError,
    UnsupportedContentTypeError,
    FetchError,
    HTTPStatusError,
    EmptyContentError,
)

url = "https://medium.com/some-publication/article-slug-abc123"

try:
    markdown = extract(url)
except InvalidURLError as e:
    print(f"Bad URL: {e.message}")
except UnsupportedPlatformError as e:
    print(f"Platform not supported: {e.message}")
except UnsupportedContentTypeError as e:
    print(f"Not an article page: {e.message}")
except HTTPStatusError as e:
    print(f"HTTP {e.status_code}: {e.message}")
except FetchError as e:
    print(f"Network error: {e.message}")
except EmptyContentError as e:
    print(f"No content: {e.message}")
```
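
When processing several URLs, one failure need not abort the whole run. The following is a minimal sketch, not part of mdfetch: `extract_many` is a hypothetical helper that takes the extractor as a parameter. Because this README does not say whether the exception classes share a common base class, the sketch catches `Exception` and records the class name; in real code, catching the specific exception types imported above is the safer choice.

```python
from collections.abc import Callable, Iterable


def extract_many(
    urls: Iterable[str],
    extract_fn: Callable[[str], str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Return (Markdown by URL, error description by URL)."""
    results: dict[str, str] = {}
    failures: dict[str, str] = {}
    for url in urls:
        try:
            results[url] = extract_fn(url)
        except Exception as exc:  # assumption: narrow to mdfetch's exceptions in real code
            failures[url] = f"{type(exc).__name__}: {exc}"
    return results, failures
```

With the library installed, `extract_many(urls, extract)` would return the successfully converted articles alongside a per-URL record of what went wrong.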

## Supported platforms

| Platform | Domains |
|----------|---------|
| Medium   | `medium.com`, `*.medium.com` |
| dev.to   | `dev.to` |
| Substack | `substack.com`, `*.substack.com` |
| The New Stack | `thenewstack.io` |
| DZone | `dzone.com` |
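
To filter URLs before calling `extract`, the table can be mirrored in a small pre-check. This is a sketch based only on the table above, not the library's actual matching logic: `looks_supported` and both constants are hypothetical names, and how mdfetch treats variants such as a `www.` prefix is not covered here.

```python
from urllib.parse import urlsplit

# Domains copied from the table above.
EXACT_DOMAINS = ("medium.com", "dev.to", "substack.com", "thenewstack.io", "dzone.com")
# Domains listed with a `*.` wildcard, so their subdomains also match.
WILDCARD_DOMAINS = ("medium.com", "substack.com")


def looks_supported(url: str) -> bool:
    """Return True if the URL's host matches a domain in the table (hypothetical helper)."""
    host = (urlsplit(url).hostname or "").lower()
    if host in EXACT_DOMAINS:
        return True
    return any(host.endswith("." + domain) for domain in WILDCARD_DOMAINS)
```

A URL that passes this check can still fail in `extract`, for example when the page is not an article, so the error handling shown earlier remains necessary.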

## Development

Requires [uv](https://docs.astral.sh/uv/).

```bash
make setup        # install dependencies
make test         # run unit tests
make integration  # run integration tests (requires network access)
make lint         # ruff check
make format       # ruff format
make build        # build wheel + sdist
make upgrade-deps # upgrade all dependencies
make clean        # remove build artifacts
```

## Requirements

- Python 3.12+
