Metadata-Version: 2.4
Name: nsefast
Version: 0.1.0
Summary: Fast, robots.txt-respecting NSE India market data collector for swing trading, quant research, and backtesting
Author: Nikhil Shinde
License: MIT
Project-URL: Homepage, https://github.com/nikhilshinde/nsefast
Project-URL: Documentation, https://github.com/nikhilshinde/nsefast#readme
Project-URL: Repository, https://github.com/nikhilshinde/nsefast
Project-URL: Issues, https://github.com/nikhilshinde/nsefast/issues
Project-URL: Changelog, https://github.com/nikhilshinde/nsefast/blob/main/CHANGELOG.md
Keywords: nse,nse-india,stock-market,trading,quant,backtesting,bhavcopy,options,polars,duckdb
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business :: Financial :: Investment
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28
Requires-Dist: urllib3>=1.26
Requires-Dist: beautifulsoup4>=4.11
Requires-Dist: lxml>=4.9
Requires-Dist: polars>=0.20
Requires-Dist: pyarrow>=12.0
Requires-Dist: duckdb>=0.9
Requires-Dist: python-dateutil>=2.8
Requires-Dist: typer>=0.9
Requires-Dist: rich>=13.0
Provides-Extra: pandas
Requires-Dist: pandas>=2.0; extra == "pandas"
Provides-Extra: postgres
Requires-Dist: psycopg2-binary>=2.9; extra == "postgres"
Requires-Dist: sqlalchemy>=2.0; extra == "postgres"
Provides-Extra: api
Requires-Dist: fastapi>=0.100; extra == "api"
Requires-Dist: uvicorn>=0.23; extra == "api"
Provides-Extra: dev
Requires-Dist: pytest>=7.4; extra == "dev"
Requires-Dist: pytest-cov>=4.1; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=4.0; extra == "dev"
Dynamic: license-file

# nsefast

Fast NSE India data collector for **swing trading**, **quant research**, **AI training**, **backtesting**, and **market intelligence**.

> ⚠️ **Ethics & Compliance:** `nsefast` only uses publicly downloadable NSE reports
> and pages allowed by NSE's `robots.txt`. It does **not** bypass logins, captchas,
> Cloudflare, anti-bot systems, or rate limits. Add appropriate delays and use
> responsibly. You are responsible for complying with NSE's terms of service.

## Features

- Polite, retrying HTTP client with `robots.txt` checks
- Modular collectors for **equity**, **derivatives**, **corporate**, **deals**,
  **indices**, **surveillance**, **calendar**, and **master** data
- [Polars](https://pola.rs) for fast dataframe processing
- [Parquet](https://parquet.apache.org/) primary storage, partitioned by dataset/date
- [DuckDB](https://duckdb.org) local analytics layer
- Optional PostgreSQL storage
- Optional Rust core (`rust-core/`) for hashing, dedup, and fast parsing of large files
- Typer-based CLI
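
The `robots.txt` check works like the standard library's `urllib.robotparser`. A self-contained sketch of the idea (the rules shown are illustrative, not NSE's actual `robots.txt` — always fetch and honour the live file):

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_allowed(url: str, agent: str = "nsefast") -> bool:
    """Return True if the parsed robots.txt permits fetching this URL."""
    return parser.can_fetch(agent, url)
```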

## Install

```bash
pip install nsefast
```

Optional extras:

```bash
pip install "nsefast[pandas]"      # pandas export helpers
pip install "nsefast[postgres]"    # PostgreSQL sink
pip install "nsefast[api]"         # FastAPI server scaffold
pip install "nsefast[dev]"         # pytest, ruff, build, twine
```

For development:

```bash
git clone https://github.com/nikhilshinde/nsefast
cd nsefast
pip install -e ".[dev]"
pytest -q
```

## Quick start

```bash
# Discover all downloadable report links from NSE public pages
nsefast collect-reports

# Run the full scaffold
nsefast collect-all

# Equity bhavcopy for a date
nsefast collect equity-bhavcopy --date 2026-05-07

# Corporate announcements range
nsefast collect corporate-announcements --start 2026-05-01 --end 2026-05-07

# Build swing-trading features
nsefast features swing --date 2026-05-07

# Export a dataset to Parquet
nsefast export parquet --dataset daily_bhavcopy
```

In Python:

```python
from nsefast.collectors.report_links import collect_report_links
from nsefast.storage.parquet_store import save_parquet

df = collect_report_links()  # polars DataFrame
save_parquet(df, dataset="report_links")
```

## Project layout

```text
nsefast/
├── pyproject.toml
├── requirements.txt
├── main.py
├── README.md
│
├── nsefast/
│   ├── config.py          # URLs, headers, paths
│   ├── http_client.py     # session + retries
│   ├── robots.py          # robots.txt checker
│   ├── collectors/        # one module per data domain
│   ├── processing/        # normalize, features, technicals
│   ├── storage/           # parquet, duckdb, postgres
│   └── cli.py             # Typer CLI
│
└── rust-core/             # optional pyo3 module
    ├── Cargo.toml
    └── src/lib.rs
```

## Storage zones

- `data/raw/`     — raw downloads exactly as fetched
- `data/clean/`   — normalized intermediate files
- `data/parquet/` — partitioned Parquet, the canonical store
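
The dataset/date partitioning amounts to plain path construction. A sketch under assumed conventions (the Hive-style `year=`/`month=` layout is an assumption for illustration, not necessarily the library's exact scheme):

```python
from datetime import date
from pathlib import Path

def partition_path(root: str, dataset: str, d: date) -> Path:
    """Build a Hive-style partitioned path for one day's Parquet file."""
    return (
        Path(root)
        / dataset
        / f"year={d.year}"
        / f"month={d.month:02d}"
        / f"{dataset}_{d.isoformat()}.parquet"
    )
```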

## Rust core (optional)

The `rust-core/` crate exposes a `nsefast_core` Python module via
[PyO3](https://pyo3.rs/) for CPU-bound work (SHA-256 hashing, dedup,
fast CSV normalization). HTTP scraping stays in Python — it's I/O bound.

Build with [maturin](https://www.maturin.rs/):

```bash
cd rust-core
maturin develop --release
```

## Documentation

- [`docs/USAGE.md`](docs/USAGE.md) — full Python + CLI usage, canonical schemas, polite-use rules
- [`docs/PUBLISHING.md`](docs/PUBLISHING.md) — how to release new versions to PyPI
- [`CHANGELOG.md`](CHANGELOG.md) — version history

## Failure semantics

Every public collector returns a Polars DataFrame with its canonical
schema on **any** failure (invalid input, network error, malformed
payload, polars error, robots block). Collectors **never raise** — your
pipelines stay crash-proof.
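
The never-raise contract is essentially a guard that swaps any exception for an empty frame with the canonical columns. A minimal sketch, using a plain dict of column lists as a stand-in for a Polars DataFrame (the schema and names shown are hypothetical):

```python
from functools import wraps

def never_raise(schema: dict):
    """Decorator: on any exception, return an empty frame with schema's columns."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                return {column: [] for column in schema}
        return wrapper
    return decorate

# Hypothetical canonical schema for illustration.
BHAVCOPY_SCHEMA = {"symbol": [], "trade_date": [], "close": []}

@never_raise(BHAVCOPY_SCHEMA)
def collect_bhavcopy(trade_date: str) -> dict:
    raise ConnectionError("network down")  # simulate a failed fetch
```

Downstream code can then rely on the columns always being present, even when a fetch fails.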

## Tests

```bash
pytest -q     # 77 unit tests, no network calls
```

## License

MIT — see [`LICENSE`](LICENSE)
