SafeFeat
Leakage-safe, point-in-time feature engineering for event logs.
SafeFeat is a Python library for building machine learning features from event data without data leakage. It ensures that features are computed using only information available at prediction time.
Why SafeFeat?
- Point-in-time correctness: Automatically enforces temporal ordering to prevent leakage
- Audit trails: Track which events were included, excluded, and why
- Clean API: Declarative specs for reproducible feature engineering
- Fast: Vectorized pandas operations
Quick Example
from safefeat import build_features, WindowAgg
features, report = build_features(
spine=spine,
tables={"events": events},
spec=[
WindowAgg(
table="events",
windows=["7D", "30D"],
metrics={
"*": ["count"],
"amount": ["sum", "mean"],
},
)
],
event_time_cols={"events": "event_time"},
return_report=True,
)
print(report.tables["events"])
Key Concepts
- Spine: DataFrame with entity IDs and cutoff times (when to look "as of")
- Events: Time-series data with entity ID, event time, and attributes
- WindowAgg: Count/sum/mean events in rolling time windows
- RecencyBlock: Time since last event (optionally filtered)
- Point-in-Time: Features only use data before the cutoff time
- Audit Report: Tracks joined, kept, and dropped event pairs
Installation
pip install safefeat
# With dev tools (pytest, ruff)
pip install safefeat[dev]