Metadata-Version: 2.4
Name: srx-lib-ml
Version: 0.1.0
Summary: Reusable analytics and ML utilities for transport risk detection and streaming dashboards.
Author: SRX Labs
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=2.0
Requires-Dist: numpy>=1.24
Requires-Dist: scikit-learn>=1.3
Provides-Extra: viz
Requires-Dist: streamlit>=1.30; extra == "viz"
Provides-Extra: geo
Requires-Dist: pyproj>=3.6; extra == "geo"
Provides-Extra: dev
Requires-Dist: ruff>=0.5.0; extra == "dev"
Requires-Dist: streamlit>=1.30; extra == "dev"
Requires-Dist: pyproj>=3.6; extra == "dev"
Dynamic: license-file

# srx-lib-ml

Reusable, production-grade pandas and scikit-learn utilities extracted from multiple SRX analytics apps. The library is domain-agnostic: pass your column names, not ours. Focus areas:
- fast data loading and feature engineering for trip/order/event data
- anomaly detection and clustering using robust defaults
- risk profiling across vehicles/assets, routes, and temporal dimensions
- optional Streamlit-friendly helpers for dashboards

## Quick start
```bash
pip install -e ./srx-lib-ml
```

```python
import pandas as pd
from srx_lib_ml import features, anomaly, geo, risk, routes

# 1) Normalize your columns to a canonical schema
mapping = {
    "Start Time": "start_time",
    "Stop Time": "end_time",
    "Distance (Km)": "distance_km",
    "Avg Speed": "avg_speed_kmh",
    "StartLat": "start_lat",
    "StartLon": "start_lon",
    "StopLat": "stop_lat",
    "StopLon": "stop_lon",
}
df = features.standardize_columns(pd.read_csv("journeys.csv"), mapping)

# 2) Feature engineering + anomaly scoring
df = features.enrich_journey_frame(df)
df = anomaly.detect_isolation_forest(df)

# 3) Geo clustering (DBSCAN in degrees)
df = geo.assign_dbscan_clusters(df, lat_col="start_lat", lon_col="start_lon", label_col="start_cluster")

# 4) Route/vehicle rollups with your chosen ids
vehicle_profile = risk.vehicle_risk_profile(df, vehicle_id_col="VID", vehicle_name_col="VName", distance_col="distance_km")
route_profile = risk.route_risk_analysis(df, start_label_col="Start Location", stop_label_col="Stop Location", vehicle_id_col="VID", distance_col="distance_km")

# Procurement-style routing (haversine) with fully custom columns
orders = routes.parse_latlon_column(pd.read_csv("orders.csv"), source_col="Pickup Location", lat_col="pickup_lat", lon_col="pickup_lon")
valid = orders.dropna(subset=["pickup_lat", "pickup_lon"])
orders.loc[valid.index, "pickup_zone"] = routes.dbscan_haversine(valid, lat_col="pickup_lat", lon_col="pickup_lon")
orders = routes.add_route_pairs(orders, origin_col="pickup_zone", dest_col="dropoff_zone", route_col="route_id")
perf = routes.actor_route_performance(
    orders,
    route_col="route_id",
    actor_col="Partner",
    id_col="External Id",
    success_flag_col="Pickup Actual Time",
    distance_col="Distance (KM)",
)
```
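For orientation, the `actor_route_performance` rollup above can be approximated with plain pandas. The toy frame below reuses the illustrative column names from the snippet, and the success definition (a non-null actual pickup time counts as a successful order) is an assumption, not the library's documented behavior:

```python
import pandas as pd

# Toy orders frame using the illustrative column names from the quick start.
orders = pd.DataFrame({
    "route_id": ["A → B", "A → B", "A → B", "C → D"],
    "Partner": ["P1", "P1", "P2", "P1"],
    "External Id": [1, 2, 3, 4],
    # Assumption: a non-null actual pickup time marks a successful order.
    "Pickup Actual Time": ["2024-01-01 08:00", None, "2024-01-01 09:30", "2024-01-02 10:00"],
    "Distance (KM)": [12.0, 12.5, 11.8, 40.2],
})

perf = (
    orders.assign(success=orders["Pickup Actual Time"].notna())
    .groupby(["route_id", "Partner"], as_index=False)
    .agg(
        total_orders=("External Id", "count"),
        success_rate_pct=("success", lambda s: 100.0 * s.mean()),
        avg_distance_km=("Distance (KM)", "mean"),
    )
)
print(perf)
```

The real helper additionally filters by `min_orders` and can fold in an on-time flag; the sketch only shows the core groupby shape.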

The modules stay parameterized so they can be reused across transport, procurement, logistics, or other journey/order/event datasets; pass your own column names instead of recoding.
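Step 2 of the quick start relies on scikit-learn under the hood. A minimal standalone sketch of what an `IsolationForest`-based scoring pass looks like, assuming the column names `is_anomaly` and `anomaly_score_normalized` from the module guide and an illustrative 10% contamination rate:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "distance_km": rng.normal(20, 5, 200),
    "avg_speed_kmh": rng.normal(60, 10, 200),
})
# Inject one obvious outlier: a 500 km trip at walking speed.
df.loc[0, ["distance_km", "avg_speed_kmh"]] = [500.0, 5.0]

model = IsolationForest(contamination=0.1, random_state=42)
features = df[["distance_km", "avg_speed_kmh"]]
df["is_anomaly"] = model.fit_predict(features) == -1   # -1 = anomaly, 1 = normal
# Flip and rescale decision_function to [0, 1] so higher means more anomalous.
raw = -model.decision_function(features)
df["anomaly_score_normalized"] = (raw - raw.min()) / (raw.max() - raw.min())
```

The actual `detect_isolation_forest` adds config handling and optional enhanced features; the sketch only shows the scoring core.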

## Module guide
- `features`
  - `standardize_columns(df, mapping)`: rename columns into a canonical schema.
  - `ensure_columns(df, required)`: add missing columns as NaN.
  - `enrich_journey_frame(df, ...)`: derive durations, speed deviation, rule-based flags (long stop, slow, zero distance, high deviation).
  - `time_category(hour)`: shared time bucketer.
- `anomaly`
  - `detect_isolation_forest(df, config=None, use_enhanced_features=False, feature_override=None)`: add anomaly scores/flags.
- `geo`
  - `haversine_distance(lat1, lon1, lat2, lon2)`: km distance.
  - `assign_dbscan_clusters(df, lat_col, lon_col, label_col="cluster", config=None)`.
  - `assign_kmeans_zones(df, lat_col, lon_col, label_col="zone", config=None)`.
- `location_zones`
  - `apply_location_risk_zones(df, location_sheets, lat_col="latitude_deg", lon_col="longitude_deg", radius_km=0.5, risk_col="risk_score", start_lat_col="start_lat", start_lon_col="start_lon", stop_lat_col="stop_lat", stop_lon_col="stop_lon", zone_score_mapping=None)`.
- `risk`
  - `vehicle_risk_profile(df, vehicle_id_col="vehicle_id", vehicle_name_col="vehicle_name", risk_col="risk_score", anomaly_flag_col="is_anomaly", anomaly_score_col="anomaly_score_normalized", distance_col="distance_km", ...)`.
  - `route_risk_analysis(df, start_label_col="start_label", stop_label_col="stop_label", journey_id_col="journey_id", vehicle_id_col="vehicle_id", risk_col="risk_score", anomaly_flag_col="is_anomaly", distance_col=None, min_journeys=3)`.
- `temporal`
  - `temporal_breakdown(df, risk_col="risk_score", anomaly_flag_col="is_anomaly", journey_id_col="journey_id")`: hourly/daily/time-category aggregates.
- `routes`
  - `parse_latlon_column(df, source_col, lat_col, lon_col)`.
  - `dbscan_haversine(df, lat_col, lon_col, eps_km=5.0, min_samples=5)`.
  - `add_route_pairs(df, origin_col, dest_col, route_col="route_id")`.
  - `actor_route_performance(df, route_col, actor_col, id_col, success_flag_col, distance_col, ontime_flag_col=None, min_orders=5)`.
  - `route_complexity_breakdown(df, distance_col, origin_zone_col, dest_zone_col, id_col, ontime_flag_col=None)`.
  - `reallocation_recommendations(perf_df, route_col="route_id", actor_col="actor", success_col="success_rate_pct", total_orders_col="total_orders", min_orders=10, min_gap=15.0)`.
  - `describe_clusters(labels)`: basic cluster stats.
- `viz` (install with extra `viz` → `pip install -e "./srx-lib-ml[viz]"`; quotes keep shells like zsh from globbing the brackets)
  - `hero_metric(label, value, delta=None, help_text=None, cols=3, col_idx=0)`.
  - `hero_card(label, value, subtext=None, help_text=None, background="#0f172a", text_color="#e2e8f0", cols=3, col_idx=0)`.
  - `card_container(header, help_text=None, background="#0f172a", text_color="#e2e8f0", border_color="#1f2937", padding="16px", radius="12px", body=None)`: returns a body container for children.
  - `badge(label, color="#2563eb", text_color="#ffffff", padding="4px 10px")`: returns HTML string.
  - `alert_box(message, tone="info")`: info/success/warning/danger banner.
  - `section_header(title, description=None, divider=True)`.
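The geo helpers revolve around great-circle math. A standalone sketch of the distance `haversine_distance` computes, and of how an `eps_km` radius plausibly maps onto scikit-learn's haversine DBSCAN metric (which expects coordinates and `eps` in radians); the sample coordinates are illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Two tight pickup clusters a few hundred km apart (around Nairobi and Mombasa).
coords_deg = np.array([
    [-1.2921, 36.8219], [-1.2950, 36.8200], [-1.2900, 36.8250],
    [-4.0435, 39.6682], [-4.0400, 39.6700], [-4.0450, 39.6650],
])
# sklearn's haversine metric works in radians, so eps_km is divided by Earth's radius.
eps_km = 5.0
labels = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=2,
                metric="haversine").fit_predict(np.radians(coords_deg))
```

With a 5 km radius each city collapses into its own cluster, which is the behavior `dbscan_haversine(eps_km=5.0, ...)` advertises.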
