Metadata-Version: 2.4
Name: logo-hunter
Version: 0.1.0
Summary: A package to fetch and process logos from customer websites.
Author-email: Your Name <your.email@example.com>
License: MIT
Project-URL: Homepage, https://example.com/logo-hunter
Project-URL: Repository, https://github.com/yourusername/logo-hunter
Keywords: logo,web scraping,image processing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.24.1
Requires-Dist: selectolax>=0.3.29
Requires-Dist: Pillow>=9.5.0
Dynamic: license-file

# LogoHunter 🎯

A modern, async Python library for fetching and processing customer logos from websites.

## Features

- **Async-first design** using `httpx` for high-performance HTTP requests
- **Fast HTML parsing** with `selectolax` (5-25x faster than BeautifulSoup)
- **Concurrent logo fetching** for improved performance
- **Multiple logo sources**: favicons, Open Graph images, Apple touch icons, etc.
- **Smart logo selection** based on image size and quality
- **Image processing** with resizing and format conversion
- **Backward compatibility** via a synchronous wrapper (`SyncLogoHunter`)

## Installation

```bash
pip install logo-hunter
```

Or install from source:

```bash
git clone <repository-url>
cd logo-hunter
pip install -e .
```

## Quick Start

### Async Usage (Recommended)

```python
import asyncio
from logohunter.hunter import LogoHunter

async def main():
    # Get logo as PNG bytes
    logo_bytes = await LogoHunter.get_customer_logo(
        "github.com",
        output_format="PNG",
        resize_to=(128, 128)
    )
    
    if logo_bytes:
        with open("github_logo.png", "wb") as f:
            f.write(logo_bytes)
        print("Logo saved!")

asyncio.run(main())
```

### Synchronous Usage

```python
from logohunter.hunter import SyncLogoHunter

# Get logo as JPEG bytes  
logo_bytes = SyncLogoHunter.get_customer_logo(
    "stackoverflow.com",
    output_format="JPEG",
    resize_to=(64, 64)
)

if logo_bytes:
    with open("stackoverflow_logo.jpg", "wb") as f:
        f.write(logo_bytes)
    print("Logo saved!")
```

## Advanced Usage

### Step-by-Step Processing

```python
import asyncio
from logohunter.hunter import LogoHunter

async def detailed_example():
    domain = "python.org"
    
    # 1. Find all logo URLs
    logo_urls = await LogoHunter.find_logo_urls(domain)
    print(f"Found {len(logo_urls)} logo URLs")
    
    # 2. Fetch and select the best logo
    best_logo = await LogoHunter.fetch_best_logo(logo_urls)
    
    if best_logo:
        print(f"Best logo size: {best_logo.size}")
        
        # 3. Process the image
        logo_bytes = LogoHunter.process_image(
            best_logo,
            output_format="PNG",
            resize_to=(256, 256)
        )
        
        with open(f"{domain}_logo.png", "wb") as f:
            f.write(logo_bytes)

asyncio.run(detailed_example())
```

### Batch Processing

```python
import asyncio
from logohunter.hunter import LogoHunter

async def batch_example():
    domains = ["microsoft.com", "google.com", "apple.com"]
    
    # Process all domains concurrently
    tasks = [
        LogoHunter.get_customer_logo(domain, resize_to=(100, 100))
        for domain in domains
    ]
    
    results = await asyncio.gather(*tasks, return_exceptions=True)
    
    for domain, result in zip(domains, results):
        if isinstance(result, Exception):
            print(f"Error processing {domain}: {result}")
        elif result:
            with open(f"{domain}_logo.png", "wb") as f:
                f.write(result)
            print(f"Saved logo for {domain}")

asyncio.run(batch_example())
```

## API Reference

### LogoHunter (Async)

#### `await LogoHunter.get_customer_logo(domain, output_format="PNG", resize_to=None)`

Main method to get a processed logo.

- **domain**: Domain name (e.g., "github.com")
- **output_format**: "PNG", "JPEG", "WEBP", etc.
- **resize_to**: Tuple (width, height) for resizing
- **Returns**: Logo bytes or None

#### `await LogoHunter.find_logo_urls(domain)`

Find all available logo URLs for a domain.

- **Returns**: List of logo URLs

#### `await LogoHunter.fetch_best_logo(logo_urls)`

Fetch and select the highest quality logo.

- **logo_urls**: List of URLs to fetch
- **Returns**: PIL Image object or None
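
As an illustration of size-based selection, the sketch below picks the candidate with the largest pixel area from a list of `(width, height)` tuples, the same shape as a PIL image's `.size`. The helper name `pick_largest` is hypothetical, and the library's actual selection criteria may weigh other factors.

```python
from typing import List, Optional, Tuple

Size = Tuple[int, int]  # (width, height), like PIL's Image.size


def pick_largest(sizes: List[Size]) -> Optional[Size]:
    """Return the size with the largest pixel area, or None for an empty list."""
    return max(sizes, key=lambda size: size[0] * size[1], default=None)
```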

#### `LogoHunter.process_image(image, output_format="PNG", resize_to=None)`

Process a PIL Image (resize, convert format).

- **image**: PIL Image object
- **output_format**: Target format
- **resize_to**: Tuple (width, height) for resizing
- **Returns**: Processed image bytes

### SyncLogoHunter (Backward Compatibility)

Same methods as `LogoHunter` but synchronous (no `await` needed).
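
One common way to put a blocking facade over an async API is to drive each coroutine with `asyncio.run`. The sketch below shows that pattern with hypothetical stand-in classes (`AsyncFetcher`, `SyncFetcher`); it is not taken from LogoHunter's source, and `SyncLogoHunter` may be implemented differently.

```python
import asyncio
from typing import Optional


class AsyncFetcher:
    """Hypothetical async class standing in for an async-first API."""

    @staticmethod
    async def get(domain: str) -> Optional[bytes]:
        await asyncio.sleep(0)  # placeholder for real network I/O
        return domain.encode() if domain else None


class SyncFetcher:
    """Hypothetical blocking facade: runs each coroutine to completion."""

    @staticmethod
    def get(domain: str) -> Optional[bytes]:
        # asyncio.run creates a fresh event loop, runs the coroutine, and
        # closes the loop. It raises RuntimeError when called from code that
        # is already running inside an event loop.
        return asyncio.run(AsyncFetcher.get(domain))
```

A wrapper built this way cannot be called from inside a running event loop (for example, within an async web handler); in that situation the async API is the one to call.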

## Logo Sources

The library searches for logos in the following order:

1. **Default favicon** (`/favicon.ico`)
2. **HTML meta tags**:
   - `<meta property="og:image">`
   - `<meta name="og:image">`
   - `<meta name="msapplication-TileImage">`
3. **HTML link tags**:
   - `<link rel="icon">`
   - `<link rel="apple-touch-icon">`
   - `<link rel="apple-touch-icon-precomposed">`
   - `<link rel="shortcut icon">`
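
The search order above can be sketched as follows. To keep the example free of third-party dependencies it uses the standard library's `html.parser` rather than `selectolax`, and it assumes the site is served over `https`. The names `LogoLinkParser` and `candidate_logo_urls` are hypothetical and not part of the library's API.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

# Tag keys taken from the list above, lowercased for comparison.
META_KEYS = {"og:image", "msapplication-tileimage"}
LINK_RELS = {"icon", "apple-touch-icon", "apple-touch-icon-precomposed", "shortcut icon"}


class LogoLinkParser(HTMLParser):
    """Collects candidate logo URLs from <meta> and <link> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.meta_urls: List[str] = []
        self.link_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = {key: (value or "") for key, value in attrs}
        if tag == "meta":
            key = (attr.get("property") or attr.get("name") or "").lower()
            if key in META_KEYS and attr.get("content"):
                self.meta_urls.append(urljoin(self.base_url, attr["content"]))
        elif tag == "link":
            rel = " ".join(attr.get("rel", "").lower().split())
            if rel in LINK_RELS and attr.get("href"):
                self.link_urls.append(urljoin(self.base_url, attr["href"]))


def candidate_logo_urls(domain: str, html: str) -> List[str]:
    """Return candidate URLs: default favicon, then meta tags, then link tags."""
    base_url = f"https://{domain}/"  # assumption: the site is reachable over https
    parser = LogoLinkParser(base_url)
    parser.feed(html)
    ordered = [urljoin(base_url, "/favicon.ico"), *parser.meta_urls, *parser.link_urls]
    return list(dict.fromkeys(ordered))  # drop duplicates, keep first-seen order
```

Relative `href` and `content` values are resolved against the site root with `urljoin`, so `/static/icon.png` and a fully qualified CDN URL are both handled.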

## Performance Features

- **Concurrent processing** with configurable limits
- **Connection pooling** via httpx
- **Fast HTML parsing** with selectolax (5-25x faster than BeautifulSoup)
- **Smart caching** and deduplication
- **Timeout handling** for reliable operation
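
When calling the library for many domains at once, a caller can also cap concurrency on its own side. The sketch below does this with `asyncio.Semaphore`; `fetch_all` and `max_concurrency` are hypothetical names for illustration and say nothing about how the library's internal limits are configured.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def fetch_all(
    domains: List[str],
    fetch: Callable[[str], Awaitable[Optional[bytes]]],
    max_concurrency: int = 5,
) -> List[Optional[bytes]]:
    """Run fetch(domain) for every domain, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(domain: str) -> Optional[bytes]:
        async with semaphore:  # wait here while the limit is reached
            try:
                return await fetch(domain)
            except Exception:
                return None  # treat a failed lookup as "no logo"

    # gather preserves input order, so results line up with domains.
    return await asyncio.gather(*(bounded(domain) for domain in domains))
```

With the API documented above, a call might look like `asyncio.run(fetch_all(domains, LogoHunter.get_customer_logo))`.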

## Error Handling

The library is designed to be resilient:

- Network errors are logged but don't crash the application
- Invalid images are skipped automatically
- Missing logos return `None` rather than raising exceptions
- Malformed HTML is handled gracefully
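
The "skip invalid input, log it, return `None`" behavior described above follows a general pattern that can be sketched with the standard library alone. The helper `first_decodable` and the `decode` callable are hypothetical; in a real pipeline `decode` might wrap an image decoder such as Pillow's `Image.open`.

```python
import logging
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("logo_sketch")


def first_decodable(blobs: Iterable[bytes], decode: Callable[[bytes], T]) -> Optional[T]:
    """Return the first blob that decodes cleanly, or None if none do."""
    for index, blob in enumerate(blobs):
        try:
            return decode(blob)
        except Exception as exc:  # broad on purpose: any bad input is skipped
            logger.warning("Skipping candidate %d: %s", index, exc)
    return None
```

Because failures surface as `None` rather than exceptions, callers should check the return value before using it, as the Quick Start examples do with `if logo_bytes:`.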

## Migration from v1.x

If you're upgrading from the old version:

### Old Code (v1.x)
```python
import requests
from bs4 import BeautifulSoup
import favicon

logos = favicon.get('https://example.com')
response = requests.get(logos[0].url)
```

### New Code (v2.x)
```python
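import asyncio
from logohunter.hunter import LogoHunter, SyncLogoHunter

# Async (recommended); inside an existing coroutine, use `await` instead
logo_bytes = asyncio.run(LogoHunter.get_customer_logo("example.com"))

# Or sync (for backward compatibility)
logo_bytes = SyncLogoHunter.get_customer_logo("example.com")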
```

## Requirements

- Python 3.8+
- httpx >= 0.24.1
- selectolax >= 0.3.29
- Pillow >= 9.5.0

## License

MIT License - see LICENSE file for details.

## Contributing

Contributions welcome! Please read CONTRIBUTING.md for guidelines.

## Changelog

### v2.0.0
- **Breaking**: Async-first API design
- **New**: httpx replaces requests for better async support
- **New**: selectolax replaces BeautifulSoup for 5-25x faster parsing
- **New**: Concurrent logo fetching
- **New**: Better error handling and logging
- **New**: Type hints throughout
- **Added**: Backward compatibility via SyncLogoHunter
