Metadata-Version: 2.4
Name: nitrowebfetch-cli
Version: 0.1.0
Summary: The developer‑friendly web content extractor with CSS selectors.
Home-page: https://github.com/Frodigo/garage/tree/main/Projects/Nitrowebfetch
Author: Marcin Kwiatkowski
Author-email: marcin@frodigo.com
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: playwright>=1.55.0
Requires-Dist: html2text>=2025.4.15

# NitroWebfetch

Extract web content, cleanly.

**NitroWebfetch – the developer‑friendly web content extractor with CSS selectors.**

This project is currently in alpha.

## Features

- Extracts content from web pages using CSS selectors
- Converts HTML to clean Markdown format
- Fallback selectors for maximum compatibility
- Command-line interface with various options
- Built on Playwright for reliable web scraping
- Completely free (open source, MIT license)

## Ideas for next steps

- Add support for multiple output formats (JSON, plain text)
- Batch processing for multiple URLs
- Custom user-agent and headers configuration
- Integration with NitroDigest for web page summarization
- Support for authentication and cookies
- Content filtering and cleaning options

---

## Usage

### Prerequisites

To run this tool, you need [Python](https://www.python.org/downloads/) 3.8 or later installed on your local machine.

### Installation

Install NitroWebfetch with pip, then download the Firefox build that Playwright uses:

```bash
pip install nitrowebfetch-cli
playwright install firefox
```

For a development installation, run the following from the root of the repository:

```bash
cd Projects/Nitrowebfetch
pip install -e .
playwright install firefox
```

### Basic Usage

Run NitroWebfetch with a URL and redirect its output to a file:

```bash
nitrowebfetch <url> > <output_file>
```

#### Examples

Extract article content from a webpage and save it to a file:

```bash
nitrowebfetch https://example.com/article > article.md
```

Extract content using a custom CSS selector:

```bash
nitrowebfetch https://example.com --selector ".main-content" > content.md
```

Get HTML output instead of Markdown:

```bash
nitrowebfetch https://example.com --format html > content.html
```

### Command Line Arguments

You can customize the extraction process using command line arguments:

```bash
nitrowebfetch \
    --selector ".article-body" \
    --format md \
    https://example.com
```

Available arguments:

- `url`: URL to fetch content from (required)
- `--selector`: CSS selector to use for content extraction (default: `article`)
- `--format`: Output format, either `md` for Markdown or `html` for raw HTML (default: `md`)
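
If you want to call the tool from a Python script, the arguments above can be assembled programmatically. The following is a minimal sketch, not part of NitroWebfetch itself: `build_command` and `fetch` are hypothetical helper names, and the code assumes the `nitrowebfetch` executable is on your `PATH` and prints the extracted content to standard output, as the redirect examples suggest.

```python
import subprocess
from typing import List, Optional


def build_command(url: str, selector: Optional[str] = None, fmt: str = "md") -> List[str]:
    """Build the argument list for one nitrowebfetch call (hypothetical helper)."""
    if fmt not in ("md", "html"):
        raise ValueError("fmt must be 'md' or 'html'")
    command = ["nitrowebfetch"]
    if selector is not None:
        # Omitting --selector leaves the CLI on its documented default, `article`.
        command += ["--selector", selector]
    command += ["--format", fmt, url]
    return command


def fetch(url: str, selector: Optional[str] = None, fmt: str = "md") -> str:
    """Run nitrowebfetch and return whatever it prints to standard output.

    Assumption: the CLI writes the extracted content to stdout.
    """
    result = subprocess.run(
        build_command(url, selector, fmt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

With the tool installed, `fetch("https://example.com", selector=".article-body")` would return the Markdown as a string instead of writing it to a file.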

### Fallback Selectors

If the primary selector doesn't match any elements, NitroWebfetch automatically tries these alternatives:

- `article`
- `main`
- `.article`
- `.content`
- `#content`
- `.post`
- `.entry-content`
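
The fallback behavior can be pictured as a simple "first selector that matches wins" loop. The sketch below illustrates that idea only and is not NitroWebfetch's actual implementation: `first_match` and `lookup` are hypothetical names, and trying the fallbacks in the order listed above is an assumption. A plain dictionary stands in for the page so the example runs without a browser.

```python
from typing import Callable, Optional, Tuple

# Fallback selectors, copied from the list above.
FALLBACK_SELECTORS: Tuple[str, ...] = (
    "article", "main", ".article", ".content", "#content", ".post", ".entry-content",
)


def first_match(
    lookup: Callable[[str], Optional[str]],
    primary: str = "article",
    fallbacks: Tuple[str, ...] = FALLBACK_SELECTORS,
) -> Optional[Tuple[str, str]]:
    """Return (selector, html) for the first selector that matches, else None.

    `lookup` is a hypothetical callable that returns the matched HTML for a
    selector, or None when nothing matches.
    """
    tried = set()
    # Assumption: the primary selector goes first, then fallbacks in listed order.
    for selector in (primary, *fallbacks):
        if selector in tried:
            continue  # skip a fallback that repeats the primary selector
        tried.add(selector)
        html = lookup(selector)
        if html is not None:
            return selector, html
    return None


# Stand-in for a parsed page: maps a selector to the HTML it would match.
page = {"main": "<main>Hello</main>", ".post": "<div>Hi</div>"}
print(first_match(page.get, primary=".article-body"))
# prints ('main', '<main>Hello</main>')
```

Here `.article-body` and `article` match nothing, so `main` is the first selector to return content.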

---

## Contributing

Want to contribute to this tool? See the contributing guide:

[Getting started](../../Contributing.md)

## Report an issue

Found an issue? You can easily report it here:

[https://github.com/Frodigo/garage/issues/new](https://github.com/Frodigo/garage/issues/new)

## License

This project is licensed under the MIT License - see the LICENSE file for details.
