Metadata-Version: 2.4
Name: ndtv-profit-scraper-safe
Version: 0.1.0
Summary: Safe, robots.txt-respecting scraper for public NDTV Profit news data — for research and NLP sentiment training.
Author: ndtv-profit-scraper-safe contributors
License: MIT
Project-URL: Homepage, https://pypi.org/project/ndtv-profit-scraper-safe/
Keywords: scraper,news,ndtv,ndtv-profit,indian-markets,sentiment,nlp,finance
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: pandas>=2.0
Requires-Dist: python-dateutil>=2.8
Requires-Dist: lxml>=4.9
Requires-Dist: urllib3>=2.0
Dynamic: license-file

# ndtv_profit_scraper_safe

A safe, robots.txt-respecting scraper library that collects public NDTV Profit and NDTV business-news data for Indian market sentiment analysis, swing-trading research, and NLP training.

## Rules

- Respects `robots.txt`.
- Does NOT bypass logins, paywalls, CAPTCHAs, Cloudflare, anti-bot systems, or rate limits.
- Adds a delay between requests, retries on failure, request timeouts, and logging.
- Saves raw and clean data (see `data/raw/` and `data/clean/`); returns pandas DataFrames.
- Intended for research and NLP sentiment training only.
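
The first three rules can be illustrated with a minimal, standard-library-only sketch. It is not the package's actual `robots_checker.py` or `http_client.py`; the names `USER_AGENT`, `build_robots`, `default_fetch`, and `polite_get`, and the default delay, retry, and timeout values, are all assumptions made for illustration. The package itself depends on `requests`, which would be the more natural choice in real code.

```python
import logging
import time
from urllib import robotparser
from urllib.request import Request, urlopen

logger = logging.getLogger("polite_fetch")

USER_AGENT = "research-bot-example"  # hypothetical user agent string


def build_robots(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def default_fetch(url: str, timeout: float) -> bytes:
    """Plain GET with a timeout; raises an OSError subclass on failure."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


def polite_get(url, robots, fetch=default_fetch, delay=2.0, retries=3, timeout=10.0):
    """Fetch url only if robots.txt allows it, pausing before every attempt."""
    if not robots.can_fetch(USER_AGENT, url):
        logger.info("Skipping disallowed URL: %s", url)
        return None
    for attempt in range(1, retries + 1):
        time.sleep(delay)  # fixed pause so requests are never sent back to back
        try:
            return fetch(url, timeout)
        except OSError as exc:  # covers URLError, HTTPError, and socket timeouts
            logger.warning("Attempt %d/%d failed for %s: %s", attempt, retries, url, exc)
    return None  # every attempt failed
```

The `fetch` parameter is injectable so the retry and robots.txt logic can be exercised without touching the network. A disallowed URL returns `None` without any request being made.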

## Data Categories

Latest news, Markets, Stocks, Business, Economy, Companies, IPO, Personal Finance, Mutual Funds, Commodities, Currency, Video metadata, Market analysis, Expert views.

## Folder Structure

```
ndtv_profit_scraper_safe/
├── requirements.txt
├── main.py
├── README.md
├── ndtv_profit_scraper/
│   ├── __init__.py
│   ├── config.py
│   ├── http_client.py
│   ├── robots_checker.py
│   ├── url_collector.py
│   ├── html_collector.py
│   ├── parser.py
│   ├── sentiment.py
│   └── storage.py
└── data/
    ├── raw/
    └── clean/
```

## Run

```bash
pip install -r requirements.txt
python main.py
```

## Next Improvements

1. Add sitemap collector
2. Add RSS collector if feed endpoints are confirmed
3. Add stock-symbol mapping using the NSE master list
4. Add impact score: Low / Medium / High
5. Add SQLite upsert to avoid duplicates
6. Add FastAPI endpoints
7. Add scheduler
8. Add PostgreSQL storage
9. Combine with Moneycontrol, ET, LiveMint, CNBC-TV18, Business Standard
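
As a rough idea of what item 5 could look like, here is a minimal sketch of a SQLite upsert keyed on the article URL, using only the standard-library `sqlite3` module. The `articles` table, its columns, and the helper names `upsert_articles` and `count_articles` are hypothetical; nothing here reflects an existing schema in this project. The `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical schema: the URL acts as the natural key for an article.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    published_at TEXT
)
"""

# Insert a new row, or refresh the existing one when the URL is already stored.
UPSERT = """
INSERT INTO articles (url, title, published_at)
VALUES (:url, :title, :published_at)
ON CONFLICT(url) DO UPDATE SET
    title = excluded.title,
    published_at = excluded.published_at
"""


def upsert_articles(conn: sqlite3.Connection, rows: list) -> None:
    """Write rows (dicts with url, title, published_at) without creating duplicates."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)


def count_articles(conn: sqlite3.Connection) -> int:
    """Return the number of stored articles."""
    return conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
```

Re-running a scrape that returns the same URL then updates the stored row instead of adding a second one.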
