Metadata-Version: 2.4
Name: page-classifier
Version: 0.1.0
Summary: Async webpage type classifier using URL, meta, structural, and content signals
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: httpx>=0.27
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=1.0; extra == "dev"

# Page Classifier

An asynchronous webpage type classifier that combines URL, metadata, structural, and content signals to determine the type of a given web page, such as a product page, blog post, or contact page.

## Installation

You can install the package directly from PyPI (once published):

```bash
pip install page-classifier
```

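For development, clone the repository and install it with the `Makefile` targets shown here, or directly with `pip` (for example, `pip install -e ".[dev]"` to include the `dev` extra):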

```bash
git clone <your-repository-url>
cd page-classifier
make install
# or, to also install the development dependencies:
make dev
```

## Usage

### Python API

Use the classifier programmatically from asynchronous Python code:

```python
import asyncio
from page_classifier import PageClassifier, ClassifierConfig

async def main():
    # Create a classifier with the default configuration
    classifier = PageClassifier(config=ClassifierConfig())

    # Classify a URL (classify_url is a coroutine, so it must be awaited)
    result = await classifier.classify_url(
        url="https://www.ganpatihandicrafts.com/printed-kurti.html"
    )

    # Print the result as a dictionary
    print(result.to_dict())

if __name__ == "__main__":
    asyncio.run(main())
```
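
Because `classify_url` is a coroutine, several URLs can be classified concurrently. The following is a minimal sketch, not part of the package: `classify_many` and `max_concurrency` are hypothetical names, and it assumes that a single classifier instance can safely serve overlapping `classify_url(url=...)` calls, which this README does not confirm.

```python
import asyncio


async def classify_many(classifier, urls, max_concurrency=5):
    """Classify several URLs concurrently (hypothetical helper).

    `classifier` is any object with an async `classify_url(url=...)` method,
    such as the PageClassifier shown above. A failed URL yields its exception
    object instead of aborting the whole batch.
    """
    # Limit how many fetches run at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def classify_one(url):
        async with semaphore:
            return await classifier.classify_url(url=url)

    outcomes = await asyncio.gather(
        *(classify_one(url) for url in urls),
        return_exceptions=True,
    )
    # gather() preserves input order, so each URL pairs with its own outcome.
    return dict(zip(urls, outcomes))
```

Returning exceptions rather than raising keeps one unreachable site from discarding the results of the others; callers can check each value with `isinstance(value, Exception)`.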

### Command-Line Interface (CLI)

The package also provides a CLI for classifying pages from your terminal:

```bash
python -m page_classifier "https://www.ganpatihandicrafts.com/printed-kurti.html"
```

#### CLI Options

- `url`: The URL to fetch and classify.
- `--platform NAME`: Force a specific platform instead of auto-detecting. Only that platform's routing rules will fire.
- `--timeout SECONDS`: HTTP fetch timeout (default: 15.0).
- `--json`: Print the full result as JSON instead of a summary.
- `--list-platforms`: List all supported platform names and exit.
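
To make the option types and defaults concrete, the options listed above could be declared with `argparse` roughly as follows. This is an illustrative reconstruction from the list, not the package's actual parser: `build_parser` is a hypothetical name, and making `url` optional (so that `--list-platforms` can run without it) is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented CLI options."""
    parser = argparse.ArgumentParser(prog="page_classifier")
    # Assumption: url is optional so --list-platforms can be used alone.
    parser.add_argument("url", nargs="?", help="The URL to fetch and classify.")
    parser.add_argument(
        "--platform",
        metavar="NAME",
        default=None,  # None stands for auto-detection in this sketch
        help="Force a specific platform instead of auto-detecting.",
    )
    parser.add_argument(
        "--timeout",
        metavar="SECONDS",
        type=float,
        default=15.0,
        help="HTTP fetch timeout in seconds.",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        help="Print the full result as JSON instead of a summary.",
    )
    parser.add_argument(
        "--list-platforms",
        action="store_true",
        help="List all supported platform names and exit.",
    )
    return parser
```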

**Example with JSON output and timeout:**
```bash
python -m page_classifier "https://example.com/product/123" --timeout 10 --json
```
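
The `--json` flag also makes the CLI usable from other scripts. The sketch below shells out to the documented command and parses the result; `build_command` and `classify_via_cli` are hypothetical helpers. It assumes the CLI writes only the JSON document to stdout and exits with a non-zero status on failure, neither of which is stated here, and it makes no assumption about which fields the JSON contains.

```python
import json
import subprocess
import sys


def build_command(url, timeout=15.0):
    """Build the argv list for a --json run, using the documented flags."""
    return [
        sys.executable, "-m", "page_classifier", url,
        "--timeout", str(timeout), "--json",
    ]


def classify_via_cli(url, timeout=15.0):
    """Run the CLI in a subprocess and return its parsed JSON output.

    Assumption: stdout carries only the JSON document.
    """
    completed = subprocess.run(
        build_command(url, timeout),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout)
```

Inside a Python application, the Python API shown earlier avoids the subprocess overhead; this approach is mainly useful for glue scripts that already call command-line tools.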

## Development

A `Makefile` is included to streamline development tasks:
- `make install`: Install the project.
- `make dev`: Install the project with development dependencies.
- `make test`: Run the pytest suite.
- `make build`: Build the distribution packages (`sdist` and `wheel`).
- `make publish`: Build and publish the package to PyPI using twine.
- `make clean`: Clean up build artifacts and cache directories.
