Metadata-Version: 2.4
Name: time-aware-imputer
Version: 1.0.0
Summary: Time-aware missing data imputation for irregular time series
Home-page: https://github.com/ontedduabhishakereddy/time-aware-imputer
Author: Abhishake Reddy O 
Author-email: Abhishake Reddy O <ontedduabhishakereddy@gmail.com>
Maintainer-email: Abhishake Reddy O <ontedduabhishakereddy@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/ontedduabhishakereddy/time-aware-imputer
Project-URL: Documentation, https://github.com/ontedduabhishakereddy/time-aware-imputer/blob/main/README.md
Project-URL: Repository, https://github.com/ontedduabhishakereddy/time-aware-imputer
Project-URL: Bug Tracker, https://github.com/ontedduabhishakereddy/time-aware-imputer/issues
Project-URL: Changelog, https://github.com/ontedduabhishakereddy/time-aware-imputer/blob/main/CHANGELOG.md
Keywords: time-series,imputation,missing-data,interpolation,data-science,machine-learning,iot,sensors
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: matplotlib>=3.4.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=4.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.12.0; extra == "docs"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# Time-Aware Missing Data Imputer

A Python library for time-series imputation that accounts for irregular sampling intervals. Traditional imputation methods treat all gaps equally. This library takes elapsed time into account, so a 1-hour gap is handled differently from a 1-minute gap.
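To see why elapsed time matters, compare pandas' position-based and time-based interpolation on a series whose second sample arrives one minute after the first, while the third arrives an hour later. This sketch uses only pandas' built-in `Series.interpolate`, not this library's API:

```python
import numpy as np
import pandas as pd

# Three readings at 00:00, 00:01 (missing), and 01:00.
index = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:01", "2024-01-01 01:00"])
s = pd.Series([0.0, np.nan, 10.0], index=index)

# 'linear' ignores the index and treats rows as equally spaced.
by_position = s.interpolate(method="linear")

# 'time' weights by elapsed time: the missing point is 1 minute into a 60-minute span.
by_time = s.interpolate(method="time")

print(by_position.iloc[1])  # 5.0
print(round(by_time.iloc[1], 4))  # 0.1667
```

Position-based filling puts the estimate halfway to the next reading. Time-based filling keeps it close to the reading taken one minute earlier.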

## Features

- **Time-Aware Imputation**: Respects temporal structure and irregular time intervals
- **Multiple Interpolation Methods**: Linear, slinear, quadratic, and cubic splines, plus PCHIP and Akima interpolation
- **Gap Analysis Tools**: Comprehensive diagnostics for understanding missing data patterns
- **Scikit-learn Compatible**: Familiar `fit`/`transform` API that works with sklearn pipelines
- **Tested and Type-Annotated**: Covered by a pytest suite, formatted with black and isort, type-checked with mypy, and linted with flake8

## Installation
```bash
pip install time-aware-imputer
```

For development:
```bash
git clone https://github.com/ontedduabhishakereddy/time-aware-imputer.git
cd time-aware-imputer
pip install -e ".[dev]"
```

## Quick Start
```python
import pandas as pd
import numpy as np
from time_aware_imputer import SplineImputer, GapAnalyzer

# Create sample data with missing values
df = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=100, freq='h'),
    'temperature': np.random.randn(100)
})
df.loc[10:15, 'temperature'] = np.nan
df.loc[50:52, 'temperature'] = np.nan

# Analyze gaps
analyzer = GapAnalyzer()
stats = analyzer.analyze(df)
print(f"Found {stats['temperature']['n_gaps']} gaps")
print(f"Missing: {stats['temperature']['missing_percentage']:.1f}%")

# Visualize gaps
fig = analyzer.plot_gaps(df)

# Impute missing values
imputer = SplineImputer(method='cubic')
df_imputed = imputer.fit_transform(df)

# Check which values were imputed
imputed_mask = imputer.get_imputed_mask()
```

## Core Modules

### 1. TimeAwareImputer (Base Class)

Base class for all imputation strategies. It provides the shared sklearn-compatible API.
```python
from time_aware_imputer import TimeAwareImputer

# All imputers inherit from this base class
# Provides common functionality:
# - Timestamp validation and parsing
# - Automatic column detection
# - Imputation tracking
```
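The base-class responsibilities listed above can be illustrated with a small standalone transformer. This is a hypothetical sketch, not the library's `TimeAwareImputer`. The class name `MinimalTimeAwareImputer` and its internals are invented for illustration, and the sketch assumes rows are sorted by timestamp. It builds on scikit-learn's `BaseEstimator` and `TransformerMixin` and fills gaps by linear interpolation over elapsed seconds with `np.interp`.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class MinimalTimeAwareImputer(BaseEstimator, TransformerMixin):
    """Hypothetical illustration; not the library's TimeAwareImputer."""

    def __init__(self, time_column="timestamp", value_columns=None):
        self.time_column = time_column
        self.value_columns = value_columns

    def fit(self, X, y=None):
        # Timestamp validation: the column must exist and parse as datetimes.
        if self.time_column not in X.columns:
            raise ValueError(f"missing time column {self.time_column!r}")
        pd.to_datetime(X[self.time_column])  # raises if values cannot be parsed
        # Automatic column detection: default to every numeric column.
        if self.value_columns is None:
            self.columns_ = list(X.select_dtypes(include="number").columns)
        else:
            self.columns_ = list(self.value_columns)
        return self

    def transform(self, X):
        X = X.copy()
        # Elapsed seconds since the first row (assumes rows are sorted by time).
        t = pd.to_datetime(X[self.time_column])
        seconds = (t - t.iloc[0]).dt.total_seconds().to_numpy()
        # Imputation tracking: record which cells are missing before filling.
        self.imputed_mask_ = X[self.columns_].isna()
        for col in self.columns_:
            values = X[col].to_numpy(dtype=float, copy=True)
            known = ~np.isnan(values)
            # np.interp weights neighbours by elapsed time and holds the edge
            # values constant outside the observed range.
            values[~known] = np.interp(seconds[~known], seconds[known], values[known])
            X[col] = values
        return X
```

Calling `MinimalTimeAwareImputer().fit_transform(df)` returns a filled copy of `df` and leaves the input unchanged.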

### 2. SplineImputer

Time-aware spline interpolation for smooth, trend-preserving imputation.
```python
from time_aware_imputer import SplineImputer

# Linear interpolation (fast, simple)
imputer = SplineImputer(method='linear')

# Cubic spline (smooth, default)
imputer = SplineImputer(method='cubic')

# PCHIP (monotonicity-preserving)
imputer = SplineImputer(method='pchip')

# Akima (local interpolation, less oscillation)
imputer = SplineImputer(method='akima')

# Fit and transform
df_imputed = imputer.fit_transform(df)
```

**Parameters:**
- `method`: Interpolation method ('linear', 'cubic', 'quadratic', 'slinear', 'pchip', 'akima')
- `fill_value`: How to handle extrapolation ('extrapolate' or a float)
- `time_column`: Name of timestamp column (default: 'timestamp')
- `value_columns`: List of columns to impute (default: all numeric columns)
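One way to make a spline time-aware is to fit it on elapsed seconds rather than row positions. The sketch below does this directly with SciPy's `CubicSpline`, `PchipInterpolator`, and `Akima1DInterpolator`. The helper `spline_fill` and its method mapping are hypothetical and do not necessarily reflect how `SplineImputer` works internally. The sketch assumes timestamps are sorted and that several non-missing values exist.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import Akima1DInterpolator, CubicSpline, PchipInterpolator

# Hypothetical mapping from method names to SciPy interpolator classes.
INTERPOLATORS = {
    "cubic": CubicSpline,
    "pchip": PchipInterpolator,
    "akima": Akima1DInterpolator,
}


def spline_fill(timestamps, values, method="cubic"):
    """Fill NaNs by fitting a spline on (elapsed seconds, value) pairs."""
    t = pd.Series(pd.to_datetime(timestamps))
    # x-axis = seconds since the first timestamp, so uneven spacing is respected.
    x = (t - t.iloc[0]).dt.total_seconds().to_numpy()
    y = np.array(values, dtype=float)  # copy so the caller's data is untouched
    known = ~np.isnan(y)
    spline = INTERPOLATORS[method](x[known], y[known])
    y[~known] = spline(x[~known])
    return y


# Irregularly spaced timestamps; the underlying signal rises 1 degree per hour.
ts = ["2024-01-01 00:00", "2024-01-01 00:15", "2024-01-01 00:30",
      "2024-01-01 01:00", "2024-01-01 01:15", "2024-01-01 02:00", "2024-01-01 02:15"]
temps = [20.0, 20.25, np.nan, 21.0, np.nan, 22.0, 22.25]
for m in INTERPOLATORS:
    print(m, spline_fill(ts, temps, method=m)[[2, 4]])
```

Because this test signal is linear in time, all three interpolators recover 20.5 and 21.25 at the missing points. On real data they differ mainly in how much they overshoot near sharp changes.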

### 3. GapAnalyzer

Comprehensive gap analysis and visualization tools.
```python
from time_aware_imputer import GapAnalyzer

analyzer = GapAnalyzer()

# Analyze gaps
stats = analyzer.analyze(df)
print(stats['temperature'])
# {
#     'n_gaps': 2,
#     'total_missing': 9,
#     'missing_percentage': 9.0,
#     'mean_gap_duration': 14400.0,  # seconds
#     'max_gap_duration': 18000.0,
#     'min_gap_duration': 10800.0
# }

# Get summary table
summary = analyzer.get_summary()

# Visualize gaps
fig = analyzer.plot_gaps(df, column='temperature')

# Missing data heatmap (for multivariate data)
fig = analyzer.plot_missing_heatmap(df)
```
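The core statistics can be approximated with plain NumPy and pandas by treating each run of consecutive missing values as one gap. The helper `gap_stats` below is hypothetical. Its duration definition (time between the observations on either side of the gap) is an assumption that may not match how `GapAnalyzer` measures `mean_gap_duration`.

```python
import numpy as np
import pandas as pd


def gap_stats(df, column, time_column="timestamp"):
    """Hypothetical helper: count gaps (runs of NaN) and measure their durations."""
    missing = df[column].isna().to_numpy()
    t = pd.to_datetime(df[time_column]).reset_index(drop=True)
    # A run starts at a NaN not preceded by a NaN and ends at a NaN not followed by one.
    starts = np.flatnonzero(missing & ~np.concatenate(([False], missing[:-1])))
    ends = np.flatnonzero(missing & ~np.concatenate((missing[1:], [False])))
    durations = []
    for s, e in zip(starts, ends):
        # Assumed duration: last observation before the gap to first observation after it
        # (clipped to the first/last missing timestamp at the edges of the series).
        left = t.iloc[s - 1] if s > 0 else t.iloc[s]
        right = t.iloc[e + 1] if e + 1 < len(t) else t.iloc[e]
        durations.append((right - left).total_seconds())
    return {
        "n_gaps": len(starts),
        "total_missing": int(missing.sum()),
        "missing_percentage": 100.0 * missing.sum() / len(missing),
        "gap_durations": durations,
    }


df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=100, freq="h"),
    "temperature": np.arange(100, dtype=float),
})
df.loc[10:15, "temperature"] = np.nan
df.loc[50:52, "temperature"] = np.nan
print(gap_stats(df, "temperature"))
```

With the gap positions from the Quick Start, this definition gives durations of 7 h and 4 h. These differ from the figures in the sample output above, so check which convention `GapAnalyzer` uses before relying on either.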

## Examples

### Example 1: IoT Sensor Data
```python
import pandas as pd
import numpy as np
from time_aware_imputer import SplineImputer, GapAnalyzer

# Simulate IoT sensor data with irregular timestamps and gaps
timestamps = pd.to_datetime([
    '2024-01-01 00:00:00',
    '2024-01-01 00:15:00',
    '2024-01-01 00:30:00',
    '2024-01-01 01:00:00',  # 30-min interval (irregular sampling)
    '2024-01-01 01:15:00',
    '2024-01-01 02:00:00',  # 45-min interval
    '2024-01-01 02:15:00',
])

temperature = [20.1, 20.3, np.nan, 21.5, np.nan, 22.0, 22.1]

df = pd.DataFrame({
    'timestamp': timestamps,
    'temperature': temperature
})

# Analyze gaps
analyzer = GapAnalyzer()
stats = analyzer.analyze(df)
print(f"Gaps: {stats['temperature']['n_gaps']}")
print(f"Mean gap duration: {stats['temperature']['mean_gap_duration']/60:.1f} minutes")

# Impute with cubic spline
imputer = SplineImputer(method='cubic')
df_imputed = imputer.fit_transform(df)
print(df_imputed)
```

### Example 2: Multiple Sensors
```python
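import numpy as np
import pandas as pd
from time_aware_imputer import SplineImputer, GapAnalyzer

# Multiple sensors (random values for illustration)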
df = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=100, freq='10min'),
    'temperature': np.random.randn(100) * 5 + 20,
    'humidity': np.random.randn(100) * 10 + 60,
    'pressure': np.random.randn(100) * 2 + 1013
})

# Introduce gaps
df.loc[20:25, 'temperature'] = np.nan
df.loc[40:43, 'humidity'] = np.nan
df.loc[70:72, 'pressure'] = np.nan

# Analyze all columns
analyzer = GapAnalyzer()
stats = analyzer.analyze(df)
summary = analyzer.get_summary()
print(summary)

# Impute all columns
imputer = SplineImputer(method='cubic')
df_imputed = imputer.fit_transform(df)
```

### Example 3: Integration with Sklearn Pipeline
```python
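import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from time_aware_imputer import SplineImputer

df = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=100, freq='10min'),
    'temperature': np.random.randn(100) * 5 + 20,
})
df.loc[20:25, 'temperature'] = np.nan

# The imputer needs the timestamp column, but StandardScaler expects numeric
# features only, so drop the timestamp after imputation.
# (Assumes SplineImputer returns a DataFrame that still contains 'timestamp'.)
pipeline = Pipeline([
    ('imputer', SplineImputer(method='cubic')),
    ('drop_time', FunctionTransformer(lambda X: X.drop(columns=['timestamp']))),
    ('scaler', StandardScaler()),
])

X_scaled = pipeline.fit_transform(df)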
```

## API Reference

### TimeAwareImputer

**Methods:**
- `fit(X, y=None)`: Fit the imputer on training data
- `transform(X)`: Transform data by imputing missing values
- `fit_transform(X, y=None)`: Fit and transform in one step
- `get_imputed_mask()`: Get boolean mask of imputed values
- `get_feature_names_out()`: Get output feature names

**Attributes:**
- `is_fitted_`: Whether the imputer has been fitted
- `feature_names_in_`: Names of features seen during fit
- `n_features_in_`: Number of features seen during fit
- `imputed_mask_`: Boolean mask of imputed values

### SplineImputer

Inherits all methods and attributes from `TimeAwareImputer`.

**Additional Attributes:**
- `interpolators_`: Dictionary of fitted interpolator objects per column

### GapAnalyzer

**Methods:**
- `analyze(data)`: Analyze gaps and return statistics
- `get_summary()`: Get summary DataFrame of gap statistics
- `plot_gaps(data, column=None)`: Visualize gaps in time series
- `plot_missing_heatmap(data)`: Create heatmap of missing patterns

**Attributes:**
- `gap_stats_`: Dictionary of gap statistics per column
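A missing-data heatmap like the one `plot_missing_heatmap` describes can be drawn with Matplotlib alone by plotting `DataFrame.isna()` as an image, one band per column. The helper `missing_heatmap` below is a hypothetical sketch, and the library's plot may use different styling or axes.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def missing_heatmap(df, time_column="timestamp"):
    """Hypothetical sketch: dark cells mark missing values (columns x rows)."""
    mask = df.drop(columns=[time_column]).isna()
    fig, ax = plt.subplots(figsize=(8, 2))
    # Transpose so each data column becomes one horizontal band.
    ax.imshow(mask.T.to_numpy(dtype=float), aspect="auto", cmap="Greys",
              interpolation="nearest")
    ax.set_yticks(range(mask.shape[1]))
    ax.set_yticklabels(mask.columns)
    ax.set_xlabel("row index")
    return fig


df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=100, freq="10min"),
    "temperature": np.random.randn(100),
    "humidity": np.random.randn(100),
})
df.loc[20:25, "temperature"] = np.nan
df.loc[40:43, "humidity"] = np.nan
fig = missing_heatmap(df)
```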

## Development

### Setup Development Environment
```bash
# Clone repository
git clone https://github.com/ontedduabhishakereddy/time-aware-imputer.git
cd time-aware-imputer

# Install with dev dependencies
pip install -e ".[dev]"
```

### Running Tests
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=time_aware_imputer --cov-report=html

# Run specific test file
pytest tests/test_spline.py

# Run specific test
pytest tests/test_spline.py::TestSplineImputer::test_fit_transform_linear
```

### Code Quality
```bash
# Format code with black
black time_aware_imputer tests

# Sort imports
isort time_aware_imputer tests

# Type checking with mypy
mypy time_aware_imputer

# Linting with flake8
flake8 time_aware_imputer tests
```

### Running All Quality Checks
```bash
# Format
black time_aware_imputer tests
isort time_aware_imputer tests

# Check
mypy time_aware_imputer
flake8 time_aware_imputer tests

# Test
pytest
```

## Requirements

- Python >= 3.8
- numpy >= 1.21.0
- pandas >= 1.3.0
- scipy >= 1.7.0
- scikit-learn >= 1.0.0
- matplotlib >= 3.4.0

## Use Cases

### IoT & Industrial Monitoring
- Sensor networks with irregular data collection
- Network failures causing data gaps
- Equipment downtime

### Medical Devices
- Continuous glucose monitoring
- Heart rate monitors
- Patient activity trackers

### Financial Markets
- High-frequency trading data
- Tick data with irregular timestamps
- Market data feed interruptions

### Environmental Monitoring
- Weather stations
- Air quality sensors
- Hydrological measurements

## Roadmap

Future enhancements planned:
- Gaussian Process imputation with uncertainty quantification
- Adaptive imputation strategy selection
- Multivariate imputation using correlations
- Seasonal pattern detection and imputation
- Real-time streaming imputation

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](https://github.com/ontedduabhishakereddy/time-aware-imputer/blob/main/LICENSE) file for details.

## Citation

If you use this library in your research, please cite:
```bibtex
@software{time_aware_imputer,
  title = {Time-Aware Missing Data Imputer},
  author = {Abhishake Reddy O},
  year = {2026},
  url = {https://github.com/ontedduabhishakereddy/time-aware-imputer}
}
```

## Acknowledgments

- Inspired by the gap between simple imputation (scikit-learn) and complex deep learning approaches
- Built on top of NumPy, SciPy, and scikit-learn
- Designed for real-world time-series challenges

## Contact

- Author: Abhishake Reddy O 
- Email: ontedduabhishakereddy@gmail.com
- GitHub: [@ontedduabhishakereddy](https://github.com/ontedduabhishakereddy)
