Metadata-Version: 2.4
Name: webc
Version: 0.1.2
Summary: Treat websites as programmable objects (Wikipedia-Locked Beta)
Author: Ashwin Prasanth
License: MIT License
        
        Copyright (c) 2026 WebC
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/ashtwin2win-Z/WebC
Project-URL: Bug Tracker, https://github.com/ashtwin2win-Z/WebC/issues
Keywords: web,scraper,automation,resource
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28.0
Requires-Dist: beautifulsoup4>=4.11.0
Dynamic: license-file

<h1 align="center"> WebC – Treat Websites as Python Objects</h1>

<p align="center">
<img src="https://github.com/ashtwin2win-Z/WebC/raw/main/assets/webc.png" alt="WebC Logo" width="280">
</p>

**Version:** 0.1.2

**Author:** Ashwin Prasanth

---

## Overview

`webc` is a Python library that allows you to treat websites as programmable Python objects.

Instead of manually handling HTTP requests, parsing HTML, and writing repetitive scraping logic, WebC provides a structured, object-oriented interface to access semantic content, query elements, and perform intent-driven tasks.

The goal is simple:

* Make web data feel native to Python
* Provide meaningful abstractions over raw HTML
* Encourage ethical and secure usage by default

---

## ⚠️ Developer Preview / Secure Beta

**WebC v0.1.2** is a developer preview release intended for testing and feedback.

This version prioritizes security, architectural stability, and controlled usage.

APIs may change during the beta phase.

---

## Installation

Install via pip:

```bash
pip install webc
```

### Dependencies

* requests
* beautifulsoup4

---

## Core Architecture

WebC is organized into four conceptual layers.

---

### 1. Resource Layer

Access a webpage as a `Resource` object:

```python
from webc import web

site = web["https://en.wikipedia.org/wiki/Python_(programming_language)"]
```

* Represents a single webpage
* Uses lazy loading (fetches HTML only when needed)
* Caches parsed content internally
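To make the lazy-loading and caching behaviour concrete, here is a minimal standard-library sketch. `LazyPage` and its injected `fetch` callable are hypothetical names, not part of WebC's API. The fetcher is a stub so the example runs offline; a real implementation would perform an HTTPS request instead. `functools.cached_property` requires Python 3.8+.

```python
from functools import cached_property
from typing import Callable


class LazyPage:
    """Hypothetical sketch of a lazily fetched, cached page (not WebC's implementation)."""

    def __init__(self, url: str, fetch: Callable[[str], str]) -> None:
        self.url = url
        self._fetch = fetch   # injected fetcher, so the sketch runs without network access
        self.fetch_count = 0  # counts how many times the fetcher was actually called

    @cached_property
    def html(self) -> str:
        # Runs only on first access; later accesses reuse the stored value.
        self.fetch_count += 1
        return self._fetch(self.url)


page = LazyPage(
    "https://en.wikipedia.org/wiki/Python_(programming_language)",
    fetch=lambda url: "<html><head><title>Python</title></head></html>",  # stub fetcher
)
print(page.fetch_count)  # 0: nothing is fetched when the object is created
page.html
page.html
print(page.fetch_count)  # 1: the second access is served from the cache
```

Creating the object is cheap; the cost of fetching is paid once, on first use.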

---

### 2. Structure Layer

Provides semantic, high-level content extracted from the page:

```python
site.structure.title
site.structure.links
site.structure.images
site.structure.tables
```

#### Image Handling

* Extracts from `src`, `srcset`, `data-src`, and `<noscript>`
* Filters out UI icons and SVG assets
* Resolves relative URLs automatically

Download images:

```python
site.structure.save_images(folder="python_images")
```
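The image extraction rules above can be approximated with the standard library's `html.parser` and `urllib.parse.urljoin`. This is a hypothetical sketch, not WebC's code: `ImageCollector` is an illustrative name, it treats any URL whose path ends in `.svg` as an asset to skip, it does not apply icon heuristics or handle `<noscript>` explicitly, and it splits `srcset` on commas, which is a simplification.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Hypothetical sketch: collect image URLs from src, data-src and srcset."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        values = dict(attrs)
        candidates: List[Optional[str]] = [values.get("src"), values.get("data-src")]
        # srcset looks like "a.png 1x, b.png 2x"; keep the URL part of each entry.
        for entry in (values.get("srcset") or "").split(","):
            parts = entry.split()
            if parts:
                candidates.append(parts[0])
        for raw in candidates:
            if not raw:
                continue
            url = urljoin(self.base_url, raw)  # resolves relative and //host/... URLs
            if urlparse(url).path.lower().endswith(".svg"):
                continue  # skip SVG assets
            if url not in self.urls:
                self.urls.append(url)


html = """
<img src="/static/images/icon.svg">
<img src="//upload.wikimedia.org/a.png" srcset="//upload.wikimedia.org/a_2x.png 2x">
<img data-src="images/b.jpg">
"""
collector = ImageCollector("https://en.wikipedia.org/wiki/Python")
collector.feed(html)
print(collector.urls)
```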

#### Table Extraction

* Detects Wikipedia `wikitable` tables
* Handles rowspan and colspan alignment
* Removes citation brackets (e.g., `[1]`)

Save tables as CSV:

```python
site.structure.save_tables(folder="wiki_data")
```
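Aligning `rowspan`/`colspan` cells amounts to placing each cell on a grid and skipping slots already claimed by a cell from an earlier row. The sketch below assumes the cells have already been parsed into hypothetical `(text, rowspan, colspan)` tuples. It copies a spanned value into every slot it covers (one common convention, not necessarily WebC's), strips numeric citation markers with a regular expression, and writes the result as CSV.

```python
import csv
import io
import re
from typing import Dict, List, Tuple

CITATION = re.compile(r"\[\d+\]")  # numeric citation markers such as [1] or [23]
Cell = Tuple[str, int, int]        # hypothetical parsed cell: (text, rowspan, colspan)


def expand_table(rows: List[List[Cell]]) -> List[List[str]]:
    """Place cells on a rectangular grid, honouring rowspan and colspan."""
    grid: Dict[Tuple[int, int], str] = {}
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # slot already taken by a rowspan from above
                c += 1
            clean = CITATION.sub("", text).strip()
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = clean
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


rows = [
    [("Language", 1, 1), ("Paradigm", 1, 2)],                                  # colspan=2
    [("Python", 2, 1), ("Object-oriented[1]", 1, 1), ("Functional", 1, 1)],  # rowspan=2
    [("Imperative", 1, 1), ("Reflective[2]", 1, 1)],
]
table = expand_table(rows)

buffer = io.StringIO()
csv.writer(buffer).writerows(table)
print(buffer.getvalue())
```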

---

### 3. Query Layer

Provides direct DOM access via CSS selectors:

```python
headings = site.query["h1, h2"]

for h in headings:
    print(h.get_text(strip=True))
```

* Returns BeautifulSoup elements
* Useful for custom extraction logic
* Intended for advanced cases that the Structure layer does not cover

---

### 4. Task Layer

Provides intent-driven actions:

```python
summary = site.task.summarize(max_chars=500)
print(summary)
```

Currently supported:

* `summarize(max_chars=500)`

More tasks will be introduced in future releases.
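The README does not describe how `summarize` selects its text. One simple approach consistent with the `max_chars` parameter is to keep whole sentences from the start of the page until the character budget runs out. The function below is a hypothetical stand-in in plain Python, not WebC's implementation, and it splits sentences with a naive regular expression.

```python
import re


def summarize(text: str, max_chars: int = 500) -> str:
    """Hypothetical extractive summary: leading whole sentences within max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    summary = ""
    for sentence in sentences:
        candidate = f"{summary} {sentence}".strip()
        if len(candidate) > max_chars:
            break  # adding this sentence would exceed the budget
        summary = candidate
    # If even the first sentence is too long, fall back to a hard cut.
    return summary or text.strip()[:max_chars]


text = (
    "Python is a high-level language. It emphasizes readability. "
    "It supports multiple paradigms."
)
print(summarize(text, max_chars=60))
```

Stopping at sentence boundaries keeps the output readable, at the cost of sometimes returning noticeably fewer than `max_chars` characters.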

---

## Security & Usage Policy

This secure beta is intentionally restricted.

### Platform Restrictions

* Locked to **Wikipedia.org only**
* Only **HTTPS URLs** are allowed
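A check along these lines could enforce both restrictions before any request is made. This is a sketch under assumptions: `is_allowed` is a hypothetical helper, and it accepts `wikipedia.org` and any of its subdomains (such as `en.wikipedia.org`); WebC's exact rule may differ. Comparing the parsed `hostname` rather than the raw string also rejects look-alike hosts and `user@host` tricks.

```python
from urllib.parse import urlparse

ALLOWED_DOMAIN = "wikipedia.org"  # assumption: the domain itself and its subdomains


def is_allowed(url: str) -> bool:
    """Hypothetical gate: HTTPS only, and the host must be Wikipedia."""
    parts = urlparse(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Exact match or a real subdomain; rejects look-alikes such as "evilwikipedia.org".
    return host == ALLOWED_DOMAIN or host.endswith("." + ALLOWED_DOMAIN)


for candidate in [
    "https://en.wikipedia.org/wiki/Python_(programming_language)",
    "http://en.wikipedia.org/wiki/Python",   # plain HTTP
    "https://evilwikipedia.org/",            # look-alike domain
    "https://wikipedia.org@evil.example/",   # userinfo trick: real host is evil.example
]:
    print(candidate, is_allowed(candidate))
```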

### Built-in Protections

WebC includes safeguards against:

* Server-side request forgery (SSRF)
* Path traversal
* Unsafe file writes
* Excessive downloads

Requests are controlled and content is cached to prevent unnecessary repeated fetching.
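Two of these safeguards can be sketched with the standard library: confining saved files to the target folder (path traversal, unsafe writes) and capping the size of a download. `safe_path`, `read_capped`, and the 5 MiB `MAX_BYTES` limit are hypothetical; WebC's actual checks are not documented here. With `requests`, the chunks could come from `response.iter_content(chunk_size=8192)` on a response fetched with `stream=True`.

```python
import os
import re
from typing import Iterable

MAX_BYTES = 5 * 1024 * 1024  # hypothetical per-file limit (5 MiB)


def safe_path(folder: str, filename: str) -> str:
    """Return a path for `filename` that is guaranteed to stay inside `folder`."""
    name = os.path.basename(filename)              # drop directory components like ../../
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)   # replace unusual characters
    if name in ("", ".", ".."):
        raise ValueError(f"unsafe filename: {filename!r}")
    root = os.path.realpath(folder)
    target = os.path.realpath(os.path.join(root, name))  # also resolves symlinks
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"path escapes {folder!r}: {filename!r}")
    return target


def read_capped(chunks: Iterable[bytes], limit: int = MAX_BYTES) -> bytes:
    """Join downloaded chunks, aborting as soon as the total exceeds `limit`."""
    data = bytearray()
    for chunk in chunks:
        data.extend(chunk)
        if len(data) > limit:
            raise ValueError("download exceeds size limit")
    return bytes(data)


print(safe_path("downloads", "../../etc/passwd"))  # a path inside ./downloads
print(read_capped([b"abc", b"def"], limit=10))     # b'abcdef'
```

Checking the resolved path after joining, rather than only inspecting the raw filename, also catches symlinks inside the folder that point elsewhere.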

---

## Responsible Use

WebC is designed for:

* ✔ Educational purposes
* ✔ Research
* ✔ Personal automation
* ✔ Ethical data access

It must not be used for:

* Mass scraping
* Circumventing website policies
* Service disruption
* Data abuse

Users are responsible for complying with website Terms of Service.

---

## Full Usage Example

```python
from webc import web

url = "https://en.wikipedia.org/wiki/Python_(programming_language)"
site = web[url]

print("=== STRUCTURE ===")
print(f"Title: {site.structure.title}")
print(f"Total Links: {len(site.structure.links)}")
print(f"First 5 links: {site.structure.links[:5]}")

print("\n--- Downloading Resources ---")
site.structure.save_images(folder="python_images")
site.structure.save_tables(folder="python_data")

print("\n=== QUERY ===")
headings = site.query["h1, h2"]
print(f"Found {len(headings)} headings:")

for h in headings[:3]:
    print(f" - {h.get_text(strip=True)}")

print("\n=== TASK ===")
summary = site.task.summarize(max_chars=500)
print(summary)
```

---

## Roadmap

Planned future improvements:

* Multi-domain support
* Advanced rate limiting
* Enhanced security layers
* Plugin-based task extensions
* Dataset export helpers
* Cloud-safe scraping mode

---

## License

This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for the full license text.

© 2026 Ashwin Prasanth
