Metadata-Version: 2.4
Name: nsefast
Version: 0.2.3
Summary: Fast, robots.txt-respecting NSE India market data collector for swing trading, quant research, and backtesting
Author: Nikhil Shinde
License: MIT
Project-URL: Homepage, https://github.com/nikhilshinde/nsefast
Project-URL: Documentation, https://github.com/nikhilshinde/nsefast#readme
Project-URL: Repository, https://github.com/nikhilshinde/nsefast
Project-URL: Issues, https://github.com/nikhilshinde/nsefast/issues
Project-URL: Changelog, https://github.com/nikhilshinde/nsefast/blob/main/CHANGELOG.md
Keywords: nse,nse-india,stock-market,trading,quant,backtesting,bhavcopy,options,polars,duckdb
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business :: Financial :: Investment
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28
Requires-Dist: urllib3>=1.26
Requires-Dist: beautifulsoup4>=4.11
Requires-Dist: lxml>=4.9
Requires-Dist: polars>=0.20
Requires-Dist: pyarrow>=12.0
Requires-Dist: duckdb>=0.9
Requires-Dist: python-dateutil>=2.8
Requires-Dist: typer>=0.9
Requires-Dist: rich>=13.0
Provides-Extra: pandas
Requires-Dist: pandas>=2.0; extra == "pandas"
Provides-Extra: postgres
Requires-Dist: psycopg2-binary>=2.9; extra == "postgres"
Requires-Dist: sqlalchemy>=2.0; extra == "postgres"
Provides-Extra: api
Requires-Dist: fastapi>=0.100; extra == "api"
Requires-Dist: uvicorn>=0.23; extra == "api"
Provides-Extra: dev
Requires-Dist: pytest>=7.4; extra == "dev"
Requires-Dist: pytest-cov>=4.1; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=4.0; extra == "dev"
Dynamic: license-file

# nsefast

Fast NSE India data collector for **swing trading**, **quant research**, **AI training**, **backtesting**, and **market intelligence**.

> ⚠️ **Ethics & Compliance:** `nsefast` only uses publicly downloadable NSE reports
> and pages allowed by NSE's `robots.txt`. It does **not** bypass logins, captchas,
> Cloudflare, anti-bot systems, or rate limits. Add appropriate delays and use
> responsibly. You are responsible for complying with NSE's terms of service.

## Features

- Polite, retrying HTTP client with `robots.txt` checks
- Modular collectors for **equity**, **derivatives**, **corporate**, **deals**,
  **indices**, **surveillance**, **calendar**, and **master** data
- [Polars](https://pola.rs) for fast dataframe processing
- [Parquet](https://parquet.apache.org/) primary storage, partitioned by dataset/date
- [DuckDB](https://duckdb.org) local analytics layer
- Optional PostgreSQL storage
- Optional Rust core (`rust-core/`) for hashing / dedup / large parsing
- Typer-based CLI
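
The `robots.txt` check in the first bullet can be done entirely with the standard library; a minimal sketch of the idea (nsefast's own `robots.py` may differ), using an in-memory rules file:

```python
from urllib.robotparser import RobotFileParser

# Parse an in-memory robots.txt and check whether a URL may be fetched.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])
print(rp.can_fetch("nsefast", "https://example.com/reports/bhav.csv"))   # -> True
print(rp.can_fetch("nsefast", "https://example.com/private/daily.csv"))  # -> False
```

In practice you would point `RobotFileParser` at the live `robots.txt` URL with `set_url()` + `read()` and cache the result between requests.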

## Install

```bash
pip install nsefast
```

Optional extras:

```bash
pip install "nsefast[pandas]"      # pandas export helpers
pip install "nsefast[postgres]"    # PostgreSQL sink
pip install "nsefast[api]"         # FastAPI server scaffold
pip install "nsefast[dev]"         # pytest, ruff, build, twine
```

For development:

```bash
git clone https://github.com/nikhilshinde/nsefast
cd nsefast
pip install -e ".[dev]"
pytest -q
```

## Quick start

```bash
# Discover all downloadable report links from NSE public pages
nsefast collect-reports

# Run the full scaffold
nsefast collect-all

# Equity bhavcopy for a date
nsefast collect equity-bhavcopy --date 2026-05-07

# Corporate announcements range
nsefast collect corporate-announcements --start 2026-05-01 --end 2026-05-07

# Build swing-trading features
nsefast features swing --date 2026-05-07

# Export a dataset to Parquet
nsefast export parquet --dataset daily_bhavcopy
```

In Python:

```python
from nsefast.collectors.report_links import collect_report_links
from nsefast.storage.parquet_store import save_parquet

df = collect_report_links()  # polars DataFrame
save_parquet(df, dataset="report_links")
```

## Project layout

```text
nsefast/
├── pyproject.toml
├── requirements.txt
├── main.py
├── README.md
│
├── nsefast/
│   ├── config.py          # URLs, headers, paths
│   ├── http_client.py     # session + retries
│   ├── robots.py          # robots.txt checker
│   ├── collectors/        # one module per data domain
│   ├── processing/        # normalize, features, technicals
│   ├── storage/           # parquet, duckdb, postgres
│   └── cli.py             # Typer CLI
│
└── rust-core/             # optional pyo3 module
    ├── Cargo.toml
    └── src/lib.rs
```

## Storage zones

- `data/raw/`     — raw downloads exactly as fetched
- `data/clean/`   — normalized intermediate files
- `data/parquet/` — partitioned Parquet, the canonical store
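
The Parquet zone uses Hive-style `key=value` directories. A small sketch of how such a path is assembled (the helper name here is illustrative, not part of nsefast's API):

```python
from pathlib import Path

def partition_path(root: str, dataset: str, year: int, month: int) -> Path:
    """Build a Hive-style partition directory,
    e.g. data/parquet/daily_bhavcopy/year=2026/month=05."""
    return Path(root) / dataset / f"year={year}" / f"month={month:02d}"

print(partition_path("data/parquet", "daily_bhavcopy", 2026, 5).as_posix())
# -> data/parquet/daily_bhavcopy/year=2026/month=05
```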

## Rust core (optional)

The `rust-core/` crate exposes a `nsefast_core` Python module via
[PyO3](https://pyo3.rs/) for CPU-bound work (SHA-256 hashing, dedup,
fast CSV normalization). HTTP scraping stays in Python — it's I/O bound.
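
The hash-based dedup idea looks like this in pure Python (the Rust crate exists to make it faster on large files; the names below are illustrative, not `nsefast_core`'s API):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Content fingerprint used to detect duplicate downloads."""
    return hashlib.sha256(data).hexdigest()

payloads = [b"row1", b"row2", b"row1"]  # third entry is a duplicate
seen, unique = set(), []
for p in payloads:
    h = sha256_hex(p)
    if h not in seen:      # keep only first occurrence of each content hash
        seen.add(h)
        unique.append(p)
print(unique)  # -> [b'row1', b'row2']
```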

Build with [maturin](https://www.maturin.rs/):

```bash
cd rust-core
maturin develop --release
```

## Verify your install

```bash
pip install nsefast
nsefast verify              # offline checks: imports, parquet, duckdb
nsefast verify --network    # also pings NSE warm-up + robots.txt
nsefast version
```

## Cache, logging, partitioning

```bash
# Cache (5-min TTL by default; collectors opt in via cached_get())
nsefast cache stats
nsefast cache clear

# Structured JSON logs (for production / log shippers)
NSEFAST_LOG_FORMAT=json NSEFAST_LOG_LEVEL=INFO nsefast collect bulk-deals --start 2026-04-01 --end 2026-05-07
```
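
The cache behaviour above amounts to a map with per-entry expiry; a sketch of the idea (not `cached_get()`'s actual internals):

```python
import time

class TTLCache:
    """Minimal time-to-live cache, mirroring the 5-minute default above."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # stale: evict and report a miss
            return None
        return value

cache = TTLCache(ttl_seconds=300)
cache.set("https://example.com/report", b"payload")
print(cache.get("https://example.com/report"))  # -> b'payload'
```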

```python
# Hive-partitioned parquet writes
from nsefast.storage.parquet_store import (
    save_parquet_partitioned, read_parquet_partitioned, derive_date_partitions,
)
df = derive_date_partitions(df, "trade_date", parts=("year", "month"))
save_parquet_partitioned(df, dataset="daily_bhavcopy", by=["year", "month"])
# -> data/parquet/daily_bhavcopy/year=2026/month=05/*.parquet

q1 = read_parquet_partitioned("daily_bhavcopy",
                              filters=[("year","==",2026), ("month",">=",4)])

# DuckDB analytics
from nsefast.storage.duckdb_store import (
    connect, register_all, top_gainers, sector_leaderboard,
)
con = connect()
register_all(con)
top_gainers(con, dataset="all_indices", n=10)
sector_leaderboard(con, dataset="sector_strength")
```

## Swing-trading research (`nsefast.swing`)

```python
from nsefast.collectors.equity   import daily_bhavcopy, delivery_data
from nsefast.collectors.indices  import sector_strength
from nsefast.collectors.deals    import bulk_deals
from nsefast.collectors.corporate import corporate_announcements
from nsefast.processing.features import add_volume_breakout
from nsefast.swing import (
    top_upside, top_downside, avoid_list, sector_leaders,
    delivery_breakout, volume_breakout,
    bulk_block_watchlist, corporate_announcement_watchlist, combined_watchlist,
)

bhav = daily_bhavcopy("2026-05-07")
bhav = add_volume_breakout(bhav)            # adds avg_volume_20

# Long candidates (filtered, scored, ranked)
top_upside(bhav, n=20, min_turnover=1e7)

# Weakest names (short candidates)
top_downside(bhav, n=20)

# What to skip (surveillance + extreme-move list)
avoid_list(bhav, max_volatility_pct=15.0)

# Sector rotation
sector_leaders(sector_strength(), n=5)

# Sticky-money & spike scans
delivery_breakout(delivery_data("2026-05-07"), min_delivery_pct=70)
volume_breakout(bhav, min_ratio=2.0)

# Smart-money & event watchlists
bulk_block_watchlist(bulk_deals("2026-04-01", "2026-05-07"), min_qty=10_000)
corporate_announcement_watchlist(corporate_announcements("2026-04-01", "2026-05-07"))
combined_watchlist(deals_df=..., ann_df=...)
```

```python
# Position sizing & ATR stops
from nsefast.swing.risk import (
    position_size, add_atr, add_atr_stop, add_position_size,
)
qty = position_size(capital=500_000, entry=120, stop=115, risk_pct=1.0)
sized = (bhav.pipe(add_atr).pipe(add_atr_stop, mult=2.0)
         .pipe(add_position_size, capital=500_000, risk_pct=1.0))
```
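
The arithmetic behind that `position_size` call is plain fixed-fractional risk: 1% of ₹5,00,000 is a ₹5,000 risk budget, and with ₹5 of risk per share (entry 120, stop 115) that buys 1,000 shares. A worked sketch (the real `nsefast.swing.risk` implementation may round or validate differently):

```python
def position_size(capital: float, entry: float, stop: float, risk_pct: float) -> int:
    """Fixed-fractional sizing: risk at most risk_pct% of capital per trade."""
    risk_per_share = entry - stop             # 120 - 115 = 5
    risk_budget = capital * risk_pct / 100    # 500_000 * 1% = 5_000
    return int(risk_budget // risk_per_share)

print(position_size(capital=500_000, entry=120, stop=115, risk_pct=1.0))  # -> 1000
```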

```python
# Minimal walk-forward backtest (full engine + ML lands in v0.3.0)
from nsefast.swing.backtest import run_backtest, summary_stats
trades = run_backtest(history_df, signal_fn=lambda d: d["close"] > d["close"].shift(20),
                      holding_days=5)
summary_stats(trades)
```
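
Summary metrics of this kind reduce to simple arithmetic over per-trade returns; a self-contained sketch with made-up numbers (not nsefast's `summary_stats` implementation):

```python
# Hypothetical per-trade returns, in percent
trade_returns = [2.5, -1.0, 4.0, -0.5, 1.0]

wins = [r for r in trade_returns if r > 0]
win_rate = len(wins) / len(trade_returns)             # 3 / 5 = 0.6
avg_return = sum(trade_returns) / len(trade_returns)  # 6.0 / 5 = 1.2
print(f"win_rate={win_rate:.0%} avg_return={avg_return:.2f}%")
# -> win_rate=60% avg_return=1.20%
```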

## Documentation

- [`docs/USAGE.md`](docs/USAGE.md) — full Python + CLI usage, canonical schemas, polite-use rules
- [`docs/PUBLISHING.md`](docs/PUBLISHING.md) — how to release new versions to PyPI
- [`CHANGELOG.md`](CHANGELOG.md) — version history

## Failure semantics

Every public collector returns a Polars DataFrame with its canonical
schema on **any** failure (invalid input, network error, malformed
payload, Polars error, robots.txt block). Collectors **never raise** — your
pipelines stay crash-proof.
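
That contract can be pictured as a wrapper that swallows exceptions and emits the canonical (empty) shape instead; a pure-Python sketch with a hypothetical schema (nsefast's real collectors build an empty Polars frame, not a dict):

```python
from functools import wraps

CANONICAL_COLUMNS = ["symbol", "trade_date", "close"]  # hypothetical schema

def never_raise(fallback_factory):
    """On any exception, return fallback_factory() instead of raising."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                return fallback_factory()
        return wrapper
    return deco

@never_raise(lambda: {c: [] for c in CANONICAL_COLUMNS})
def flaky_collector(fail: bool = False):
    if fail:
        raise RuntimeError("network error")
    return {"symbol": ["INFY"], "trade_date": ["2026-05-07"], "close": [1500.0]}

print(flaky_collector(fail=True))  # -> canonical columns, zero rows
```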

## Tests

```bash
pytest -q     # 77 unit tests, no network calls
```

## License

MIT — see [`LICENSE`](LICENSE)
