Metadata-Version: 2.4
Name: jse-tools
Version: 0.3.0
Summary: Robust institutional-grade ingestion toolkit for JSE equity data
Author: Helmie Research
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: requests
Requires-Dist: yfinance

# jse-tools

Robust institutional-grade ingestion and validation toolkit for Johannesburg Stock Exchange (JSE) equity data.

---

## Overview

`jse-tools` is a lightweight but structured data engineering toolkit designed for quantitative equity research workflows focused on JSE-listed securities.

It provides:

- JSE ticker discovery via EOD Historical Data (EODHD) API
- Network resilience with retry logic
- Dataset integrity validation guardrails
- Historical OHLCV ingestion via Yahoo Finance
- Z-score–based statistical outlier filtering
- Structured CSV export pipeline

This package is built for:

- Quantitative researchers
- Systematic strategy developers
- Portfolio engineers
- Academic finance projects
- Institutional-style backtesting workflows

---

## Installation (TestPyPI)

```bash
pip install -i https://test.pypi.org/simple/ jse-tools
```

When released to production PyPI:

```bash
pip install jse-tools
```

---

## Quick Start

```python
from jse_tools.core import get_tickers, download_and_process, save_tickers

API_TOKEN = "YOUR_EODHD_TOKEN"

# Step 1: Fetch JSE tickers
tickers = get_tickers(API_TOKEN)

# Step 2: Download and clean historical data
download_and_process(
    tickers=tickers[:10],
    start_date="2015-01-01",
    end_date="2024-01-01",
    output_dir="Stocks"
)

# Step 3: Save consolidated ticker list
save_tickers("Stocks")
```

---

## Architecture

The package follows a layered defensive data engineering approach.

---

### 1. Resilience Layer

`safe_request(url, retries=3, delay=2)`

Provides retry logic for unstable API connections.

Mitigates:
- Temporary server failures
- Network instability
- Timeout errors

Reduces the chance that ingestion pipelines fail due to transient connectivity issues; persistent failures are still raised after the final attempt.
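The README does not show `safe_request`'s internals; a minimal sketch of such a retry wrapper, assuming it wraps `requests.get` with a fixed pause between attempts, might look like:

```python
import time
import requests

def safe_request(url, retries=3, delay=2):
    """Fetch a URL, retrying transient failures up to `retries` times.

    Sketch only: the actual jse-tools implementation may differ.
    """
    last_error = None
    for attempt in range(retries):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response
        except requests.RequestException as exc:
            last_error = exc
            if attempt < retries - 1:
                time.sleep(delay)  # fixed back-off before the next attempt
    # All attempts failed: surface the last error to the caller
    raise last_error
```

A fixed delay keeps the sketch simple; exponential back-off is a common refinement for rate-limited APIs.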

---

### 2. Reliability Layer

`verify_dataset_integrity(df, required_columns=None)`

Enforces strict dataset quality constraints:

- DataFrame must not be empty
- Required columns must exist
- No column may exceed 50% missing data
- Detects structurally corrupted datasets

Raises descriptive `ValueError` exceptions on failure.

This prevents polluted data from propagating into downstream models.
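The checks above can be sketched as follows (the exact error messages and internal details of the shipped function may differ):

```python
import pandas as pd

def verify_dataset_integrity(df, required_columns=None):
    """Raise ValueError if `df` violates the quality constraints above.

    Sketch only: illustrates the documented checks, not the exact code.
    """
    if df is None or df.empty:
        raise ValueError("Dataset is empty")
    if required_columns:
        missing = [c for c in required_columns if c not in df.columns]
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
    # Flag any column where more than half of the values are missing
    too_sparse = [c for c in df.columns if df[c].isna().mean() > 0.5]
    if too_sparse:
        raise ValueError(f"Columns exceed 50% missing data: {too_sparse}")
    return True
```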

---

### 3. Data Pipeline Functions

#### get_tickers(api_token)

Downloads JSE-listed instruments from EODHD and formats them for Yahoo Finance (.JO suffix).

Returns:
```
list[str]
```

Example output:
```
['AGL.JO', 'NPN.JO', 'SOL.JO', ...]
```
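A plausible sketch of this function, assuming it uses EODHD's public exchange-symbol-list endpoint (the exact URL and response fields used by the package are assumptions here):

```python
import requests

def get_tickers(api_token):
    """Fetch JSE symbols from EODHD and append the Yahoo `.JO` suffix.

    Sketch only: endpoint and field names are assumptions based on
    EODHD's public exchange-symbol-list API.
    """
    url = (
        "https://eodhd.com/api/exchange-symbol-list/JSE"
        f"?api_token={api_token}&fmt=json"
    )
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return [f"{item['Code']}.JO" for item in response.json()]
```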

---

#### download_and_process(tickers, start_date, end_date, output_dir="Stocks")

Downloads historical OHLCV data using `yfinance`.

Processing steps:

1. Downloads adjusted price data
2. Keeps required OHLCV columns
3. Validates dataset integrity
4. Removes statistical outliers, keeping only rows with |Z-score| < 3
5. Forward-fills and backward-fills missing values
6. Exports each ticker to CSV
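Steps 2, 5 and 6 can be sketched as a per-ticker helper (a hypothetical illustration; the packaged pipeline also runs the integrity and outlier checks described elsewhere in this README in between):

```python
import os
import pandas as pd

OHLCV = ("Open", "High", "Low", "Close", "Volume")

def clean_and_export(df, ticker, output_dir="Stocks"):
    """Keep OHLCV columns, fill gaps, and write one CSV per ticker.

    Hypothetical helper illustrating steps 2, 5 and 6 only.
    """
    cols = [c for c in OHLCV if c in df.columns]
    cleaned = df[cols].ffill().bfill()  # forward-fill, then backward-fill
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"{ticker}.csv")
    cleaned.to_csv(path)
    return path
```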

Output structure:

```
Stocks/
    AGL.JO.csv
    NPN.JO.csv
    ...
```

---

#### save_tickers(folder_path)

Scans a folder for CSV files and generates:

```
tickers.csv
```

Returns:
```
pandas.DataFrame
```
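A plausible sketch of this function (the column name and sort order are assumptions):

```python
import os
import pandas as pd

def save_tickers(folder_path):
    """Build tickers.csv from the per-ticker CSV files in `folder_path`.

    Sketch only: column name and ordering are assumptions.
    """
    names = sorted(
        f[:-len(".csv")]
        for f in os.listdir(folder_path)
        if f.endswith(".csv") and f != "tickers.csv"
    )
    df = pd.DataFrame({"Ticker": names})
    df.to_csv(os.path.join(folder_path, "tickers.csv"), index=False)
    return df
```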

---

## Statistical Filtering

Outlier removal keeps only observations satisfying:

```
|Z-score| < 3
```

The criterion is applied row-wise across the OHLCV columns: a row is dropped if any column's value lies three or more standard deviations from that column's mean.
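A minimal sketch of such a filter using `scipy.stats.zscore` (the exact implementation inside jse-tools may differ):

```python
import numpy as np
import pandas as pd
from scipy import stats

def remove_outliers(df, threshold=3.0):
    """Drop rows where any column's |Z-score| reaches the threshold.

    Sketch only; assumes numeric OHLCV columns.
    """
    z = np.abs(stats.zscore(df, nan_policy="omit"))
    keep = (z < threshold).all(axis=1)
    return df[np.asarray(keep)]
```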

This reduces:
- Data spikes
- Erroneous price prints
- Corrupted observations

Improves robustness of:

- Volatility calculations
- Factor models
- Backtesting systems
- Risk analytics

---

## Defensive Data Design Philosophy

The package follows three engineering principles:

1. Fail early on corrupted datasets
2. Retry transient failures automatically
3. Clean statistical anomalies before storage

This design reduces downstream model fragility.

---

## Dependencies

- pandas
- numpy
- scipy
- requests
- yfinance

---

## Versioning

Semantic versioning is followed:

- PATCH (0.3.x): Documentation & minor fixes
- MINOR (0.x.0): Feature additions
- MAJOR (x.0.0): Breaking changes

---

## Roadmap

Planned institutional enhancements:

- PostgreSQL export adapter
- Corporate actions reconciliation
- Market-cap & sector enrichment
- Structured metrics dashboard
- Performance analytics module
- Cloud storage backends (S3, GCS)
- CLI entrypoint
- YAML-based configuration layer

---

## License

MIT License

---

## Disclaimer

This package is for research and educational purposes only.
It does not constitute financial advice.
