Metadata-Version: 2.4
Name: ezwebb
Version: 0.1.0
Summary: A beginner-friendly Python library for simple, chainable web scraping.
Author-email: cibenk <benkkporever@gmail.com>
License: MIT License
        
        Copyright (c) 2026 ezweb contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/yourusername/ezweb
Project-URL: Repository, https://github.com/yourusername/ezweb
Project-URL: Issues, https://github.com/yourusername/ezweb/issues
Keywords: web scraping,scraper,html,requests,beautifulsoup,automation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup :: HTML
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28
Requires-Dist: beautifulsoup4>=4.11
Requires-Dist: lxml>=4.9
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10; extra == "dev"
Dynamic: license-file

# ezweb

**ezweb** is a Python library that makes *web scraping* very simple. You don't need to understand `requests`, `BeautifulSoup`, or HTML parsing; just call the method you need.

Well suited for:

- 🐣 Python beginners
- 🤖 Automation programmers
- 💬 Bot developers
- 📊 Data scrapers

```python
from ezweb import Website

web = Website("https://example.com")

print(web.title())
print(web.text())
print(web.links())
print(web.images())
```

## Installation

```bash
pip install ezwebb
```

Requires Python 3.8+. The dependencies (`requests`, `beautifulsoup4`, `lxml`) are installed automatically.

## Basic Usage

```python
from ezweb import Website

web = Website("https://example.com")

web.title()      # "Example Domain"
web.text()       # clean text from the page
web.html()       # raw HTML, or the cleaned HTML after clean()
web.links()      # all links, as absolute URLs
web.images()     # all image URLs, as absolute URLs
web.headings()   # {'h1': [...], 'h2': [...], ...}
web.metadata()   # description, og:*, canonical, language, etc.
```
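
`links()` and `images()` are described as returning absolute URLs. As a rough illustration of what that means, the sketch below resolves relative `href` values against a page URL with the standard library and drops duplicates. The helper name `absolutize` and the sample inputs are hypothetical; this is not ezweb's actual implementation.

```python
from urllib.parse import urljoin


def absolutize(base_url, hrefs):
    """Resolve each href against base_url and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# Hypothetical hrefs as they might appear in a page served from /docs/.
links = absolutize(
    "https://example.com/docs/",
    ["/about", "guide.html", "https://www.iana.org/domains/example", "/about"],
)
print(links)
```

Relative paths are resolved against the page URL, already-absolute URLs pass through unchanged, and the repeated `/about` entry appears only once.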

## Method Chaining

Use `Web` (an alias of `Website`) for a more concise style:

```python
from ezweb import Web

data = (
    Web("https://example.com")
        .clean()
        .articles()
        .export("hasil.json")
)
```

`clean()` and `articles()` can be chained because each returns the object itself (`self`). `export()` returns the path of the saved file, so `data` in the example above is that path string.
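
The chaining pattern itself is small enough to show in isolation. The sketch below uses a hypothetical `Page` class as a stand-in for `Website`; it only records which steps ran and does not fetch, clean, or write anything.

```python
class Page:
    """Hypothetical stand-in for Website, used only to illustrate chaining."""

    def __init__(self, html):
        self.html = html
        self.steps = []

    def clean(self):
        self.steps.append("clean")
        return self  # returning self is what allows .clean().articles()

    def articles(self):
        self.steps.append("articles")
        return self

    def export(self, path):
        self.steps.append("export")
        return path  # ends the chain: the result is a str, not a Page


page = Page("<p>hi</p>")
result = page.clean().articles().export("out.json")
print(result)
print(page.steps)
```

Because `export()` returns a string, it has to come last in the chain; calling another `Page` method on its result would fail.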

## All Methods

| Method | Description | Returns |
|---|---|---|
| `Website(url)` | Creates the object and fetches the page immediately | `Website` |
| `.title()` | Page title (`<title>`, falling back to `<h1>`) | `str` |
| `.text()` | Clean text without HTML tags | `str` |
| `.html()` | Current HTML (reflects `clean()`) | `str` |
| `.links()` | All hyperlinks (absolute, unique) | `List[str]` |
| `.images()` | All image URLs (absolute, unique) | `List[str]` |
| `.headings()` | Headings `h1`–`h6`, grouped by level | `Dict[str, List[str]]` |
| `.metadata()` | Meta description, Open Graph, canonical, etc. | `Dict[str, str]` |
| `.clean()` | Removes script/style/nav/footer/ads/popups/cookie banners | `Website` (chainable) |
| `.download_images(folder)` | Downloads all images to a local folder | `List[str]` (local paths) |
| `.export(path)` | Saves the result as `.json` / `.txt` / `.html` | `str` (path) |
| `.articles()` *(basic)* | Extracts article blocks (`<article>` or groups of `<p>`) | `Website` (chainable) |
| `.tables()` *(basic)* | Extracts HTML tables | `List[List[List[str]]]` |
| `.forms()` *(basic)* | Extracts form structure | `List[Dict]` |
| `.videos()` *(basic)* | Extracts video/embed URLs | `List[str]` |
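
The `List[List[List[str]]]` return type of `tables()` reads as tables, then rows, then cells. Assuming that shape, one table can be turned into CSV text with the standard library alone, as sketched below. The sample data and the helper name `table_to_csv` are made up for illustration.

```python
import csv
import io

# Hypothetical sample in the documented shape: tables -> rows -> cells.
tables = [
    [["Name", "Price"], ["Tea", "3.50"], ["Coffee", "4.00"]],
]


def table_to_csv(table):
    """Serialize one table (a list of rows of strings) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(table)
    return buffer.getvalue()


csv_text = table_to_csv(tables[0])
print(csv_text)
```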

### `clean()`

Removes elements that usually get in the way:

- `script`, `style`, `iframe`, `nav`, `footer`, `header`, `aside`, `form`
- Common popups and cookie banners
- Elements whose class/id looks like `ad`, `ads`, `advert`, `advertisement`, `banner`, `sponsor`
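
How ezweb matches those class/id keywords is not specified here. One plausible approach, sketched below as an assumption, is to compare whole tokens rather than substrings, so that `sidebar-ad` matches while `download` or `header` (which merely contain the letters "ad") do not. The names `AD_TOKENS` and `looks_like_ad` are hypothetical.

```python
import re

# Keywords taken from the list above.
AD_TOKENS = {"ad", "ads", "advert", "advertisement", "banner", "sponsor"}


def looks_like_ad(attr_value):
    """Return True if any token of a class/id value is an ad-like keyword."""
    # Split on whitespace, hyphens, and underscores so "top_banner" yields "banner".
    tokens = re.split(r"[\s_-]+", attr_value.lower())
    return any(token in AD_TOKENS for token in tokens)


for value in ["sidebar-ad", "top_banner main", "header", "download-link"]:
    print(value, looks_like_ad(value))
```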

### `export()`

The format is chosen automatically from the file extension:

- `.json` → all data (title, text, html, links, images, metadata, headings, articles)
- `.txt` → clean text only
- `.html` → HTML only
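
Choosing the format from the extension can be sketched as a simple dispatch on the path suffix. The function below is a standalone illustration that takes a plain dict; the real `export()` is a method on `Website`, and its behavior for unknown extensions is not documented here, so the `ValueError` is an assumption.

```python
import json
import tempfile
from pathlib import Path


def export(data, path):
    """Write data (a dict with 'text' and 'html' keys) in the format implied by the suffix."""
    target = Path(path)
    suffix = target.suffix.lower()
    if suffix == ".json":
        content = json.dumps(data, ensure_ascii=False, indent=2)
    elif suffix == ".txt":
        content = data["text"]
    elif suffix == ".html":
        content = data["html"]
    else:
        # Assumed behavior: reject extensions outside the three documented ones.
        raise ValueError(f"Unsupported export format: {suffix!r}")
    target.write_text(content, encoding="utf-8")
    return str(target)


# Hypothetical sample data, written to a temporary folder.
sample = {"title": "Example Domain", "text": "hello", "html": "<p>hello</p>"}
with tempfile.TemporaryDirectory() as folder:
    saved = export(sample, Path(folder) / "page.txt")
    saved_text = Path(saved).read_text(encoding="utf-8")
print(saved_text)
```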

## Example Output

```python
>>> web = Website("https://example.com")
>>> web.title()
'Example Domain'

>>> web.links()
['https://example.com/about', 'https://www.iana.org/domains/example']

>>> web.metadata()
{'description': 'Example Domain page', 'language': 'en'}
```

## Error Handling

ezweb defines its own exceptions in `ezweb.exceptions`; the example below imports them from the top-level package:

```python
from ezweb import Website, InvalidURLException, RequestFailedException, ParseException

try:
    web = Website("not-a-valid-url")
except InvalidURLException as e:
    print("Invalid URL:", e)
```

- `InvalidURLException`: the URL is invalid or malformed
- `RequestFailedException`: the request failed (timeout, DNS error, 4xx/5xx status)
- `ParseException`: the HTML could not be parsed
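
To make the first case concrete, the sketch below shows one way a URL could be checked before any request is sent. It uses a hypothetical `InvalidURLError` class as a stand-in for `InvalidURLException`, and the rule it applies (an http or https scheme plus a host) is an assumption, not ezweb's documented check.

```python
from urllib.parse import urlparse


class InvalidURLError(ValueError):
    """Hypothetical stand-in for ezweb's InvalidURLException."""


def validate_url(url):
    """Return url unchanged if it has an http(s) scheme and a host, else raise."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise InvalidURLError(f"Not a valid http(s) URL: {url!r}")
    return url


try:
    validate_url("not-a-valid-url")
except InvalidURLError as exc:
    message = str(exc)
    print("Invalid URL:", message)
```

Failing early like this keeps malformed input from reaching the network layer, where it would otherwise surface as a less specific request error.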

## Running the Tests

```bash
pip install -e ".[dev]"
pytest
```

## Roadmap

**v0.1.0** *(current)*
- `title()`, `text()`, `html()`, `links()`, `images()`, `metadata()`, `headings()`
- `clean()`, `download_images()`, `export()`
- Basic support for `articles()`, `tables()`, `forms()`, `videos()`

**v0.2.0** *(planned)*
- Smarter `articles()` (readability-style detection of the main content)
- `tables()` → direct export to CSV/DataFrame
- `forms()` with field validation
- `videos()` for more embed platforms
- Generic `download()` (not just images)
- `api()`: a scraping mode for JSON endpoints
- `cache()`: cache fetch results to avoid repeated requests
- `session()`, `cookies()`, `headers()`: advanced request control

## Contributing

Pull requests are very welcome! The project is modular (`parser.py`, `cleaner.py`, `extract.py`, `downloader.py`, `exporter.py`), so new features can be added without changing the public `Website`/`Web` API.

## License

MIT License. See the [LICENSE](LICENSE) file.
