Metadata-Version: 2.4
Name: et-scraper-safe
Version: 0.1.0
Summary: A polite, RSS-first Economic Times news and sentiment collector for market research.
Author: et_scraper_safe contributors
License: MIT
Project-URL: Homepage, https://github.com/yourusername/et-scraper-safe
Project-URL: Issues, https://github.com/yourusername/et-scraper-safe/issues
Keywords: economic-times,news,sentiment,scraper,rss,finance,swing-trading
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business :: Financial :: Investment
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: pandas>=2.0
Requires-Dist: feedparser>=6.0
Requires-Dist: python-dateutil>=2.8
Requires-Dist: lxml>=4.9
Dynamic: license-file

# et_scraper_safe

A safe, polite Python library for collecting public Economic Times data
(news headlines and summaries) for market and sentiment research.

## Safety rules

- Respects `robots.txt` before fetching any HTML page.
- Does **not** bypass logins, paywalls, captchas, Cloudflare, or rate limits.
- Prefers RSS feeds over HTML scraping.
- Adds a delay between HTML requests.
- Saves both raw and cleaned data as CSV.
- Returns pandas DataFrames so it plugs directly into analysis pipelines.
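
The first and fourth rules can be illustrated with the standard library alone. This is a minimal sketch, not the code in `robots_checker.py`: the names `USER_AGENT`, `REQUEST_DELAY_SECONDS`, `build_robots_parser`, and `allowed_urls` are hypothetical, and the sample rules and URLs are made up for the example.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical values for illustration; the library's real settings
# presumably live in config.py and may differ.
USER_AGENT = "et-scraper-safe-example"
REQUEST_DELAY_SECONDS = 2.0


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, urls, delay=REQUEST_DELAY_SECONDS, sleep=time.sleep):
    """Yield only the URLs robots.txt permits, pausing between them."""
    first = True
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # skip disallowed pages instead of working around the rule
        if not first:
            sleep(delay)  # polite gap before the next permitted page
        first = False
        yield url


# Example rules invented for this sketch (not the site's actual robots.txt).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = build_robots_parser(SAMPLE_ROBOTS)
urls = [
    "https://example.com/markets/stocks",
    "https://example.com/private/account",
]
permitted = list(allowed_urls(parser, urls, delay=0))
```

Passing `sleep` as a parameter keeps the delay testable without actually waiting; a caller would normally leave it at the default.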

## Folder structure

```
et_scraper_safe/
├── requirements.txt
├── main.py
├── README.md
├── et_scraper/
│   ├── __init__.py
│   ├── config.py
│   ├── robots_checker.py
│   ├── rss_collector.py
│   ├── html_collector.py
│   ├── parser.py
│   ├── sentiment.py
│   └── storage.py
└── data/
    ├── raw/
    └── clean/
```

## Install & run

```bash
cd et_scraper_safe
pip install -r requirements.txt
python main.py
```

## Output example

```
source,category,title,summary,link,published,sentiment_score,sentiment_label
economic_times,stocks,Tata Motors shares rally...,...,link,...,2,Bullish
economic_times,economy,Rupee falls against dollar...,...,link,...,-1,Bearish
```

## Use in a swing trading pipeline

```
Economic Times News
        ↓
Headline Sentiment
        ↓
Stock Symbol Mapping
        ↓
Technical Indicators
        ↓
Final Swing Score
```
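
The symbol-mapping and final-score stages are outside this library, so the following is only a rough sketch of how they might look. `SYMBOL_MAP`, `map_symbols`, `swing_score`, and the 0.4 weight are all assumptions made for illustration, and the two tickers are sample entries rather than a maintained list.

```python
# Hypothetical company-name-to-ticker map; a real pipeline would load a
# complete, maintained list instead of two sample entries.
SYMBOL_MAP = {
    "tata motors": "TATAMOTORS",
    "infosys": "INFY",
}


def map_symbols(title: str) -> list:
    """Return tickers whose company name appears in the headline."""
    lowered = title.lower()
    return [symbol for name, symbol in SYMBOL_MAP.items() if name in lowered]


def swing_score(sentiment: float, technical: float, sentiment_weight: float = 0.4) -> float:
    """Blend a headline sentiment score with a technical-indicator score.

    The weighting scheme is an assumption, not something this library defines.
    """
    return sentiment_weight * sentiment + (1 - sentiment_weight) * technical


symbols = map_symbols("Tata Motors shares rally")
score = swing_score(sentiment=2, technical=1, sentiment_weight=0.5)
```

Substring matching is the simplest possible mapping and will miss abbreviations and aliases; it is shown here only to make the pipeline stages concrete.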

## Library usage

```python
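from et_scraper import fetch_all_rss_news, sentiment_score, sentiment_label

# Collect headlines from the RSS feeds into a pandas DataFrame.
df = fetch_all_rss_news()

# Score each headline, then map the numeric score to a label such as "Bullish".
df["sentiment_score"] = df["title"].apply(sentiment_score)
df["sentiment_label"] = df["sentiment_score"].apply(sentiment_label)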
```
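
For readers wondering how a headline can produce an integer score and a label like those in the output example, here is one plausible keyword-counting approach. It is a sketch under stated assumptions, not the contents of `sentiment.py`: the word lists, the function names `keyword_sentiment_score` and `keyword_sentiment_label`, and the `Neutral` label are all hypothetical.

```python
import re

# Hypothetical word lists; the library's real vocabulary may differ.
BULLISH_WORDS = {"rally", "surge", "gain", "gains", "jump", "record", "upgrade"}
BEARISH_WORDS = {"fall", "falls", "slump", "drop", "loss", "downgrade", "crash"}


def keyword_sentiment_score(title: str) -> int:
    """Count bullish words minus bearish words in a headline."""
    words = re.findall(r"[a-z]+", title.lower())
    bullish = sum(word in BULLISH_WORDS for word in words)
    bearish = sum(word in BEARISH_WORDS for word in words)
    return bullish - bearish


def keyword_sentiment_label(score: int) -> str:
    """Map a numeric score to a label ("Neutral" is an assumed third label)."""
    if score > 0:
        return "Bullish"
    if score < 0:
        return "Bearish"
    return "Neutral"


example_score = keyword_sentiment_score("Tata Motors shares rally to record high")
example_label = keyword_sentiment_label(example_score)
```

A word-count score is cheap and transparent, but it ignores negation and context, so it is best treated as a coarse signal.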

## Categories collected

Latest, Markets, Stocks, Economy, Business, IPO, Mutual Funds,
Commodities, Forex.
