Metadata-Version: 2.4
Name: nsefast
Version: 0.2.15
Summary: Fast, robots.txt-respecting NSE India market data collector for swing trading, quant research, and backtesting
Author: Nikhil Shinde
License: MIT
Project-URL: Homepage, https://github.com/nikhilshinde/nsefast
Project-URL: Documentation, https://github.com/nikhilshinde/nsefast#readme
Project-URL: Repository, https://github.com/nikhilshinde/nsefast
Project-URL: Issues, https://github.com/nikhilshinde/nsefast/issues
Project-URL: Changelog, https://github.com/nikhilshinde/nsefast/blob/main/CHANGELOG.md
Keywords: nse,nse-india,stock-market,trading,quant,backtesting,bhavcopy,options,polars,duckdb
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business :: Financial :: Investment
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28
Requires-Dist: urllib3>=1.26
Requires-Dist: beautifulsoup4>=4.11
Requires-Dist: lxml>=4.9
Requires-Dist: polars>=0.20
Requires-Dist: numpy>=1.23
Requires-Dist: pyarrow>=12.0
Requires-Dist: duckdb>=0.9
Requires-Dist: python-dateutil>=2.8
Requires-Dist: typer>=0.9
Requires-Dist: rich>=13.0
Provides-Extra: pandas
Requires-Dist: pandas>=2.0; extra == "pandas"
Provides-Extra: postgres
Requires-Dist: psycopg2-binary>=2.9; extra == "postgres"
Requires-Dist: sqlalchemy>=2.0; extra == "postgres"
Provides-Extra: api
Requires-Dist: fastapi>=0.100; extra == "api"
Requires-Dist: uvicorn>=0.23; extra == "api"
Provides-Extra: dev
Requires-Dist: pytest>=7.4; extra == "dev"
Requires-Dist: pytest-cov>=4.1; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=4.0; extra == "dev"
Requires-Dist: hypothesis>=6.80; extra == "dev"
Dynamic: license-file

# nsefast

Fast NSE India data collector for **swing trading**, **quant research**, **AI training**, **backtesting**, and **market intelligence**.

> ⚠️ **Ethics & Compliance:** `nsefast` only uses publicly downloadable NSE reports
> and pages allowed by NSE's `robots.txt`. It does **not** bypass logins, captchas,
> Cloudflare, anti-bot systems, or rate limits. Add appropriate delays and use
> responsibly. You are responsible for complying with NSE's terms of service.
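
As a rough illustration of what a `robots.txt` check plus a polite delay can look like, here is a minimal standard-library sketch. It is not `nsefast`'s own client: `SAMPLE_ROBOTS`, `build_parser`, and `polite_fetch` are hypothetical names, and the rules are made-up sample content so the snippet runs offline.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (NOT NSE's real file), so the sketch runs offline.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a standard-library RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def polite_fetch(url, parser, fetch, delay_s=1.0, user_agent="*"):
    """Call fetch(url) only if robots.txt allows it, sleeping delay_s first."""
    if not parser.can_fetch(user_agent, url):
        return None  # blocked by robots.txt: do not request the page
    time.sleep(delay_s)  # fixed pause before every request
    return fetch(url)


parser = build_parser(SAMPLE_ROBOTS)
# Stand-in fetch functions; a real client would issue an HTTP GET here.
allowed = polite_fetch("https://example.com/reports/a.csv", parser,
                       fetch=lambda u: f"fetched {u}", delay_s=0.0)
blocked = polite_fetch("https://example.com/private/x", parser,
                       fetch=lambda u: "never called", delay_s=0.0)
```

Passing `fetch` as a callable keeps the permission check separate from the transport, so the same guard can wrap any HTTP session.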

## Features

- Polite, retrying HTTP client with `robots.txt` checks
- Modular collectors for **equity**, **derivatives**, **corporate**, **deals**,
  **indices**, **surveillance**, **calendar**, and **master** data
- **Smart-signals layer** (`processing.technicals` + `swing.scoring` /
  `relative_strength` / `breakouts`): full indicator pack
  (SMA / EMA / RSI / MACD / Bollinger / Donchian / SuperTrend / ADX / OBV),
  multi-timeframe Z-scores, sector-relative strength, 52-week and
  Donchian breakouts, VCP-style consolidation
- **Risk engine** (`swing.risk` + `swing.portfolio`): per-trade ATR / chandelier
  / trailing / swing-low stops, vol-targeted sizing, and a
  portfolio-level **water-fill allocator** with single-name, sector,
  and correlation-cluster caps
- [Polars](https://pola.rs) for fast dataframe processing
- [Parquet](https://parquet.apache.org/) primary storage, partitioned by dataset/date
- [DuckDB](https://duckdb.org) local analytics layer
- Optional PostgreSQL storage
- Optional Rust core (`rust-core/`) for hashing / dedup / large parsing
- Typer-based CLI

## Install

```bash
pip install nsefast
```

Optional extras:

```bash
pip install "nsefast[pandas]"      # pandas export helpers
pip install "nsefast[postgres]"    # PostgreSQL sink
pip install "nsefast[api]"         # FastAPI server scaffold
pip install "nsefast[dev]"         # pytest, ruff, build, twine
```

For development:

```bash
git clone https://github.com/nikhilshinde/nsefast
cd nsefast
pip install -e ".[dev]"
pytest -q
```

## Quick start

```bash
# Discover all downloadable report links from NSE public pages
nsefast collect-reports

# Run the full scaffold
nsefast collect-all

# Equity bhavcopy for a date
nsefast collect equity-bhavcopy --date 2026-05-07

# Corporate announcements range
nsefast collect corporate-announcements --start 2026-05-01 --end 2026-05-07

# Build swing-trading features
nsefast features swing --date 2026-05-07

# Export a dataset to Parquet
nsefast export parquet --dataset daily_bhavcopy
```

In Python:

```python
from nsefast.collectors.report_links import collect_report_links
from nsefast.storage.parquet_store import save_parquet

df = collect_report_links()  # polars DataFrame
save_parquet(df, dataset="report_links")
```

## Project layout

```text
nsefast/
├── pyproject.toml
├── requirements.txt
├── main.py
├── README.md
│
├── nsefast/
│   ├── config.py          # URLs, headers, paths
│   ├── http_client.py     # session + retries
│   ├── robots.py          # robots.txt checker
│   ├── collectors/        # one module per data domain
│   ├── processing/        # normalize, features, adjustments, technicals
│   ├── swing/             # filters, scoring, RS, breakouts, risk, portfolio,
│   │                      # scanner, watchlist, backtest
│   ├── master/            # symbol survivorship, index constituents
│   ├── storage/           # parquet, duckdb, postgres
│   └── cli.py             # Typer CLI
│
└── rust-core/             # optional pyo3 module
    ├── Cargo.toml
    └── src/lib.rs
```

## Storage zones

- `data/raw/`     — raw downloads exactly as fetched
- `data/clean/`   — normalized intermediate files
- `data/parquet/` — partitioned Parquet, the canonical store

## Rust core (optional)

The `rust-core/` crate exposes a `nsefast_core` Python module via
[PyO3](https://pyo3.rs/) for CPU-bound work (SHA-256 hashing, dedup,
fast CSV normalization). HTTP scraping stays in Python — it's I/O bound.
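
For readers without the Rust toolchain, the hashing and dedup idea can be sketched in pure Python. This is an illustrative stand-in, not the `nsefast_core` API: `sha256_hex` and `dedup_by_content` are hypothetical helper names.

```python
import hashlib


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of a payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def dedup_by_content(payloads):
    """Keep the first occurrence of each distinct payload, preserving order."""
    seen = set()
    unique = []
    for payload in payloads:
        digest = sha256_hex(payload)
        if digest not in seen:
            seen.add(digest)
            unique.append(payload)
    return unique


# Three downloads, the third is byte-identical to the first.
files = [
    b"SYMBOL,CLOSE\nABC,10\n",
    b"SYMBOL,CLOSE\nXYZ,20\n",
    b"SYMBOL,CLOSE\nABC,10\n",
]
unique = dedup_by_content(files)  # two payloads remain
```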

Build with [maturin](https://www.maturin.rs/):

```bash
cd rust-core
maturin develop --release
```

## Verify your install

```bash
pip install nsefast
nsefast verify              # offline checks: imports, parquet, duckdb
nsefast verify --network    # also pings NSE warm-up + robots.txt
nsefast version
```

## Cache, logging, partitioning

```bash
# Cache (5-min TTL by default; collectors opt in via cached_get())
nsefast cache stats
nsefast cache clear

# Structured JSON logs (for production / log shippers)
NSEFAST_LOG_FORMAT=json NSEFAST_LOG_LEVEL=INFO nsefast collect bulk-deals --start 2026-04-01 --end 2026-05-07
```
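
To make the TTL behaviour concrete, here is a minimal in-memory sketch of a time-to-live cache. It only illustrates the idea and is not how `cached_get()` is implemented: the `TTLCache` class and its `clock` parameter are assumptions introduced for this example.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_s seconds."""

    def __init__(self, ttl_s=300.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]           # still fresh: reuse the cached value
        value = fetch(key)          # missing or expired: fetch again
        self._store[key] = (now, value)
        return value

    def clear(self):
        self._store.clear()


# Demo with a fake clock so no real waiting is needed.
fake_now = [0.0]
calls = []


def fetch(key):
    calls.append(key)
    return f"payload for {key}"


cache = TTLCache(ttl_s=300.0, clock=lambda: fake_now[0])
cache.get_or_fetch("bulk-deals", fetch)   # miss, fetches
cache.get_or_fetch("bulk-deals", fetch)   # hit, no fetch
fake_now[0] = 301.0
cache.get_or_fetch("bulk-deals", fetch)   # expired, fetches again
```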

```python
# Hive-partitioned Parquet writes; `df` is any Polars DataFrame with a `trade_date` column
from nsefast.storage.parquet_store import (
    save_parquet_partitioned, read_parquet_partitioned, derive_date_partitions,
)
df = derive_date_partitions(df, "trade_date", parts=("year", "month"))
save_parquet_partitioned(df, dataset="daily_bhavcopy", by=["year", "month"])
# -> data/parquet/daily_bhavcopy/year=2026/month=05/*.parquet

q1 = read_parquet_partitioned("daily_bhavcopy",
                              filters=[("year","==",2026), ("month",">=",4)])

# DuckDB analytics
from nsefast.storage.duckdb_store import (
    connect, register_all, top_gainers, sector_leaderboard,
)
con = connect()
register_all(con)
top_gainers(con, dataset="all_indices", n=10)
sector_leaderboard(con, dataset="sector_strength")
```
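
The `year=2026/month=05` layout shown above follows the Hive partitioning convention, where each directory level encodes `column=value`. The sketch below shows how such a path can be derived and how filter tuples like `("month", ">=", 4)` can be matched against it. It is a standard-library illustration only; `hive_partition_dir`, `partition_values`, and `matches` are hypothetical helpers, not part of `nsefast`.

```python
import operator
from datetime import date
from pathlib import PurePosixPath

OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt}


def hive_partition_dir(root, dataset, trade_date):
    """Build root/dataset/year=YYYY/month=MM for a given trade date."""
    return (PurePosixPath(root) / dataset
            / f"year={trade_date.year}" / f"month={trade_date.month:02d}")


def partition_values(path):
    """Read integer partition values back out of the key=value segments."""
    pairs = (part.split("=", 1) for part in path.parts if "=" in part)
    return {key: int(value) for key, value in pairs}


def matches(path, filters):
    """True if every (column, op, value) filter holds for this partition."""
    values = partition_values(path)
    return all(OPS[op](values[col], want) for col, op, want in filters)


may = hive_partition_dir("data/parquet", "daily_bhavcopy", date(2026, 5, 7))
march = hive_partition_dir("data/parquet", "daily_bhavcopy", date(2026, 3, 2))
filters = [("year", "==", 2026), ("month", ">=", 4)]

print(may)                       # data/parquet/daily_bhavcopy/year=2026/month=05
print(matches(may, filters))     # True
print(matches(march, filters))   # False
```

Because the filter is answered from directory names alone, non-matching partitions never need to be opened.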

## Swing-trading research (`nsefast.swing`)

```python
from nsefast.collectors.equity   import daily_bhavcopy, delivery_data
from nsefast.collectors.indices  import sector_strength
from nsefast.collectors.deals    import bulk_deals
from nsefast.collectors.corporate import corporate_announcements
from nsefast.processing.features import add_volume_breakout
from nsefast.swing import (
    top_upside, top_downside, avoid_list, sector_leaders,
    delivery_breakout, volume_breakout,
    bulk_block_watchlist, corporate_announcement_watchlist, combined_watchlist,
)

bhav = daily_bhavcopy("2026-05-07")
bhav = add_volume_breakout(bhav)            # adds avg_volume_20

# Long candidates (filtered, scored, ranked)
top_upside(bhav, n=20, min_turnover=1e7)

# Weakest names (short candidates)
top_downside(bhav, n=20)

# What to skip (surveillance + extreme-move list)
avoid_list(bhav, max_volatility_pct=15.0)

# Sector rotation
sector_leaders(sector_strength(), n=5)

# Sticky-money & spike scans
delivery_breakout(delivery_data("2026-05-07"), min_delivery_pct=70)
volume_breakout(bhav, min_ratio=2.0)

# Smart-money & event watchlists
bulk_block_watchlist(bulk_deals("2026-04-01", "2026-05-07"), min_qty=10_000)
corporate_announcement_watchlist(corporate_announcements("2026-04-01", "2026-05-07"))
combined_watchlist(deals_df=..., ann_df=...)
```

### Smart signals (technicals, Z-scores, RS, breakouts)

```python
from nsefast.processing.technicals import (
    add_sma, add_ema, add_rsi, add_macd, add_bollinger,
    add_donchian, add_supertrend, add_adx, add_obv,
    add_all_technicals,
)
from nsefast.swing import (
    add_multi_timeframe_zscores,
    add_relative_strength, add_rs_score,
    near_52w_high, donchian_breakout, consolidation_breakout,
    add_gap_pct, add_range_atr,
)

# Full indicator pack on a multi-symbol panel (per-symbol via .over("symbol"))
panel = add_all_technicals(history_df)

# 5/20/60-day momentum & volume Z-scores
panel = add_multi_timeframe_zscores(panel)        # mom_z_5/20/60, vol_z_5/20/60

# Sector-relative strength vs a benchmark (e.g. NIFTY)
panel = add_relative_strength(panel, benchmark_df, lookback=20)
panel = add_rs_score(panel, rs_col="rs_20")       # cross-sectional 0-100 percentile

# Breakout filters
near_52w_high(panel, within_pct=2.0)              # within 2% of 52w high
donchian_breakout(panel, n=20)                    # close > prior 20-bar high (today excluded)
consolidation_breakout(panel)                     # VCP-style range expansion

panel = add_gap_pct(panel)                        # uses prev_close if present
panel = add_range_atr(panel, period=14)           # > 1 = expansion day
```
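
The "today excluded" rule in the Donchian comment above is easy to get wrong, so here is a plain-Python sketch of it. This is an assumption-laden illustration rather than the library's code: `donchian_breakout_flags` is a hypothetical name, and comparing the close against prior bar *highs* (rather than prior closes) is my assumption.

```python
def donchian_breakout_flags(closes, highs, n=20):
    """Flag bars whose close exceeds the highest high of the previous n bars.

    The window covers bars i-n .. i-1, so the current bar never
    competes against its own high.
    """
    flags = []
    for i, close in enumerate(closes):
        if i < n:
            flags.append(False)          # not enough history yet
            continue
        prior_high = max(highs[i - n:i])
        flags.append(close > prior_high)
    return flags


highs = [10.0, 11.0, 12.0, 11.0, 13.0]
closes = [9.5, 10.5, 11.5, 10.8, 12.5]
flags = donchian_breakout_flags(closes, highs, n=3)
# flags == [False, False, False, False, True]
# Last bar: prior 3 highs are 11, 12, 11 (max 12) and the close is 12.5.
```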

### Per-trade sizing & stops (`swing.risk`)

```python
from nsefast.swing.risk import (
    position_size, add_atr, add_atr_stop, add_position_size,
    add_chandelier_stop, add_trailing_atr_stop, add_swing_low_stop,
    add_vol_target_size,
)

# Classic equal-rupee risk sizing
qty = position_size(capital=500_000, entry=120, stop=115, risk_pct=1.0)

# Stop variants — pick one to fit the setup
sized = (bhav
         .pipe(add_atr)
         .pipe(add_chandelier_stop, period=22, mult=3.0)   # rolling-high anchored
         .pipe(add_trailing_atr_stop, period=14, mult=3.0) # never moves down
         .pipe(add_swing_low_stop,   window=10))           # pure price-action

# Vol-targeted sizing: each position contributes equal daily rupee P&L vol
# (qty = capital * target_daily_vol_pct / 100 / atr)
sized = sized.pipe(add_vol_target_size, capital=500_000,
                   target_daily_vol_pct=0.5)
```
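
The arithmetic behind the two sizing styles can be sketched as follows. The vol-target formula is the one quoted in the comment above; the risk-based formula is the standard fixed-fractional rule. Rounding down to whole shares is my assumption, and `risk_based_qty` and `vol_target_qty` are hypothetical names, not the `swing.risk` functions.

```python
import math


def risk_based_qty(capital, entry, stop, risk_pct):
    """Shares such that hitting the stop loses risk_pct percent of capital."""
    risk_per_share = abs(entry - stop)
    if risk_per_share <= 0:
        return 0
    rupees_at_risk = capital * risk_pct / 100.0
    return math.floor(rupees_at_risk / risk_per_share)


def vol_target_qty(capital, atr, target_daily_vol_pct):
    """Shares such that one ATR move equals target_daily_vol_pct of capital."""
    if atr <= 0:
        return 0
    return math.floor(capital * target_daily_vol_pct / 100.0 / atr)


# 1% of 500,000 is 5,000 rupees at risk; 5 rupees risk per share.
qty_risk = risk_based_qty(capital=500_000, entry=120, stop=115, risk_pct=1.0)
# 0.5% of 500,000 is 2,500 rupees of daily vol; ATR of 4 rupees per share.
qty_vol = vol_target_qty(capital=500_000, atr=4.0, target_daily_vol_pct=0.5)

print(qty_risk)  # 1000
print(qty_vol)   # 625
```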

### Portfolio-level allocation (`swing.portfolio`)

```python
from nsefast.swing import (
    portfolio_size, correlation_clusters,
    add_relative_strength, add_rs_score,
)

# 1. Rank by relative strength and take the top names
ranked = (history_df
          .pipe(add_relative_strength, benchmark_df, lookback=20)
          .pipe(add_rs_score, rs_col="rs_20"))
picks  = (ranked.sort("rs_20_score", descending=True)
                .head(15)
                .join(symbol_meta, on="symbol", how="left"))   # adds 'sector'

# 2. Cluster correlated names so TCS/INFY/WIPRO is one bet, not three
clusters = correlation_clusters(history_df,
                                symbols=picks["symbol"].to_list(),
                                lookback=60, threshold=0.7)
picks    = picks.join(clusters, on="symbol", how="left")

# 3. Water-fill allocation under single / sector / cluster caps.
#    Residual is redistributed to names with headroom — no avoidable cash drag.
book = portfolio_size(
    picks, capital=10_00_000,
    max_positions=10,
    max_single_pct=10.0,        # no name > 10% of capital
    max_sector_pct=30.0,        # no sector > 30%
    max_cluster_pct=20.0,       # no correlation cluster > 20%
)
# Output columns: symbol, weight, allocated_pct, allocated_rs, capped_by
```
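
To show what "water-fill" means here, the sketch below spreads capital equally across names, freezes any name or sector that reaches its cap, and redistributes the residual to names that still have headroom. It is a simplified illustration under my own assumptions (equal shares, no score weighting, no cluster cap), not the `portfolio_size` implementation; `water_fill` and `sector_of` are hypothetical names. A cluster cap would follow the same pattern as the sector cap.

```python
from collections import Counter


def water_fill(names, sector_of, max_single_pct, max_sector_pct,
               total_pct=100.0, tol=1e-9):
    """Return {name: weight_pct} respecting single-name and sector caps."""
    weights = {name: 0.0 for name in names}
    remaining = total_pct
    while remaining > tol:
        # Current usage per sector.
        sector_used = Counter()
        for name in names:
            sector_used[sector_of[name]] += weights[name]
        # Names that can still take more weight.
        open_names = [
            name for name in names
            if weights[name] < max_single_pct - tol
            and sector_used[sector_of[name]] < max_sector_pct - tol
        ]
        if not open_names:
            break  # every name is capped: the rest stays in cash
        share = remaining / len(open_names)
        open_in_sector = Counter(sector_of[name] for name in open_names)
        for name in open_names:
            sector = sector_of[name]
            # Split the sector's remaining room evenly among its open names.
            sector_room = ((max_sector_pct - sector_used[sector])
                           / open_in_sector[sector])
            add = min(share, max_single_pct - weights[name], sector_room)
            weights[name] += add
            remaining -= add
    return weights


sector_of = {"A": "IT", "B": "IT", "C": "BANK", "D": "PHARMA"}
weights = water_fill(["A", "B", "C", "D"], sector_of,
                     max_single_pct=30.0, max_sector_pct=40.0)
# Round 1: A and B are held to 20 each by the 40% IT cap, C and D get 25.
# Round 2: the residual 10 goes to C and D, lifting both to the 30% name cap.
# weights == {"A": 20.0, "B": 20.0, "C": 30.0, "D": 30.0}
```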

### Walk-forward backtest

```python
# Minimal walk-forward backtest (the full engine and ML land in v0.3.0)
from nsefast.swing.backtest import run_backtest, summary_stats
trades = run_backtest(history_df, signal_fn=lambda d: d["close"] > d["close"].shift(20),
                      holding_days=5)
summary_stats(trades)
```
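
For intuition about the signal and `holding_days` in the call above, here is a single-symbol, plain-Python sketch of a fixed-holding-period loop. It is not the `run_backtest` engine: `fixed_hold_trades` is a hypothetical name, and entering at the signal bar's close, ignoring costs, and skipping overlapping trades are all simplifying assumptions.

```python
def fixed_hold_trades(closes, lookback=20, holding_days=5):
    """Enter when close > close `lookback` bars ago; exit `holding_days` later."""
    trades = []
    i = lookback
    while i + holding_days < len(closes):
        if closes[i] > closes[i - lookback]:
            entry = closes[i]
            exit_ = closes[i + holding_days]
            trades.append({
                "entry_idx": i,
                "exit_idx": i + holding_days,
                "ret_pct": (exit_ / entry - 1.0) * 100.0,
            })
            i += holding_days    # wait for the exit before looking again
        else:
            i += 1
    return trades


# Steadily rising toy series: 100, 101, ..., 111.
closes = [100.0 + k for k in range(12)]
trades = fixed_hold_trades(closes, lookback=3, holding_days=2)
# Entries occur at bars 3, 5, 7 and 9, each held for two bars.
avg_ret = sum(t["ret_pct"] for t in trades) / len(trades)
```

Each decision uses only bars up to and including the signal bar, so no future data leaks into the entry.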

## Documentation

- [`docs/USAGE.md`](docs/USAGE.md) — full Python + CLI usage, canonical schemas, polite-use rules
- [`docs/PUBLISHING.md`](docs/PUBLISHING.md) — how to release new versions to PyPI
- [`CHANGELOG.md`](CHANGELOG.md) — version history

## Failure semantics

Every public collector returns a Polars DataFrame with its canonical
schema even on failure (invalid input, network error, malformed
payload, Polars error, robots block). Collectors **never raise**, so a
failed fetch cannot crash your pipeline; inspect the returned frame
rather than relying on exceptions.

## Tests

```bash
pytest -q     # 213 unit tests, no network calls
```

## License

MIT — see [`LICENSE`](LICENSE)
