Metadata-Version: 2.4
Name: svd-imputer
Version: 0.1.0
Summary: SVD-based time series imputation with uncertainty estimation
Author: Rui Hugman
Maintainer: Rui Hugman
License-Expression: MIT
Project-URL: Homepage, https://github.com/rhugman/svd_imputer
Project-URL: Documentation, https://svd-imputer.readthedocs.io/
Project-URL: Repository, https://github.com/rhugman/svd_imputer.git
Project-URL: Bug Tracker, https://github.com/rhugman/svd_imputer/issues
Keywords: time-series,imputation,svd,missing-data,uncertainty,monte-carlo
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: scikit-learn>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-xdist>=3.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: bandit>=1.7.5; extra == "dev"
Requires-Dist: safety>=2.3.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=6.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.2.0; extra == "docs"
Requires-Dist: nbsphinx>=0.9.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.20.0; extra == "docs"
Requires-Dist: sphinx-copybutton>=0.5.0; extra == "docs"
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-cov>=4.0.0; extra == "test"
Requires-Dist: pytest-xdist>=3.0.0; extra == "test"
Requires-Dist: pytest-mock>=3.10.0; extra == "test"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "test"
Provides-Extra: security
Requires-Dist: bandit>=1.7.5; extra == "security"
Requires-Dist: safety>=2.3.0; extra == "security"
Requires-Dist: pip-audit>=2.6.0; extra == "security"
Provides-Extra: all
Requires-Dist: svd-imputer[dev,docs,security,test]; extra == "all"

# SVD Time Series Imputer

A Python package for time series imputation using Singular Value Decomposition (SVD) with automatic rank estimation, uncertainty quantification, and a scikit-learn compatible API.

## Table of Contents
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Usage](#usage)
- [Examples](#examples)
- [API Reference](#api-reference)
- [Requirements](#requirements)

## Installation

Install from source (development version):
```bash
pip install -e .
```

Install with development dependencies:
```bash
pip install -e ".[dev]"
```

## Quick Start

```python
import pandas as pd
import numpy as np
from svd_imputer import Imputer

# Load your time series data (with datetime index)
df = pd.read_csv("your_data.csv", index_col=0, parse_dates=True)

# Simple imputation with automatic rank estimation
imputer = Imputer(data=df, variance_threshold=0.95)
df_imputed = imputer.fit_transform()

# With uncertainty estimation  
df_imputed, uncertainty = imputer.fit_transform(return_uncertainty=True)
print(f"RMSE: {uncertainty['rmse']:.3f} ± {uncertainty['rmse_std']:.3f}")
```

> **Note**: The `Imputer` class uses a data-centric design where data is provided at initialization and preprocessed once. This ensures consistency across all analyses and eliminates redundant preprocessing operations.
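
If no CSV file is at hand, a small synthetic dataset is enough to try the Quick Start. The sketch below uses only NumPy and pandas; the column names, series length, and 10% gap rate are arbitrary assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Daily datetime index and three correlated series built from one shared signal.
index = pd.date_range("2024-01-01", periods=200, freq="D")
t = np.arange(len(index))
signal = np.sin(2 * np.pi * t / 50)
values = np.column_stack([
    signal + 0.05 * rng.standard_normal(len(t)),
    0.5 * signal + 0.05 * rng.standard_normal(len(t)),
    -1.5 * signal + 0.05 * rng.standard_normal(len(t)),
])
df = pd.DataFrame(values, index=index, columns=["site_a", "site_b", "site_c"])

# Replace roughly 10% of the entries with NaN to simulate gaps.
mask = rng.random(df.shape) < 0.10
df = df.mask(mask)
```

The resulting `df` has a datetime index and scattered missing values, so it can stand in for the DataFrame loaded from `your_data.csv`.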

## Usage

```python
from svd_imputer import Imputer

# Basic imputation (automatic rank estimation)
imputer = Imputer(data=df, variance_threshold=0.95)
df_imputed = imputer.fit_transform()

# Cross-validation optimization
imputer = Imputer(data=df, rank="auto")
imputer.fit()
print(f"Optimized rank: {imputer.rank_}")

# With uncertainty estimation
df_imputed, uncertainty = imputer.fit_transform(return_uncertainty=True)
print(f"RMSE: {uncertainty['rmse']:.3f} ± {uncertainty['rmse_std']:.3f}")

# Advanced: model diagnostics
residuals, stats = imputer.calculate_reconstruction_residuals(return_stats=True)
print(f"Reconstruction R²: {stats['r_squared']:.3f}")
```
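
To give a feel for what the `rmse` and `rmse_std` values summarize, here is a minimal sketch of Monte Carlo validation with random masking: hide some observed cells, impute, and score the result against the hidden values. It is an assumption about the general technique, not the package's implementation. The names `monte_carlo_rmse` and `column_mean_fill` are hypothetical, column-mean filling is only a stand-in for a real imputer, and temporal (block) masking is not covered.

```python
import numpy as np

def monte_carlo_rmse(X, impute, n_trials=20, mask_fraction=0.1, seed=0):
    """Hide random observed cells, impute, and score against the hidden truth."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = np.argwhere(~np.isnan(X))  # (row, col) pairs of known cells
    n_hide = max(1, int(mask_fraction * len(observed)))
    scores = []
    for _ in range(n_trials):
        # Pick a fresh random subset of observed cells to hide in this trial.
        picks = observed[rng.choice(len(observed), size=n_hide, replace=False)]
        rows, cols = picks[:, 0], picks[:, 1]
        trial = X.copy()
        trial[rows, cols] = np.nan
        filled = impute(trial)
        err = filled[rows, cols] - X[rows, cols]
        scores.append(np.sqrt(np.mean(err**2)))
    return {"rmse": float(np.mean(scores)), "rmse_std": float(np.std(scores))}

def column_mean_fill(X):
    """Stand-in imputer: replace each gap with its column mean."""
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)

rng = np.random.default_rng(1)
data = rng.standard_normal((100, 3))
data[::10, 0] = np.nan  # a few genuine gaps, never used for scoring

result = monte_carlo_rmse(data, column_mean_fill)
print(f"RMSE: {result['rmse']:.3f} ± {result['rmse_std']:.3f}")
```

Because only cells with known values are hidden, every trial has a ground truth to compare against, and the spread across trials gives the `±` term.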

## Configuration

```python
imputer = Imputer(
    data=df,                  # Input DataFrame (required)
    variance_threshold=0.95,  # Variance threshold for automatic rank estimation
    rank=None,                # None (auto-estimate), int (fixed), or "auto" (cross-validated)
    max_iters=500,            # Maximum number of SVD iterations
    tol=1e-4,                 # Convergence tolerance
    verbose=True,             # Enable logging output
)
```
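
As a rough illustration of what a variance threshold means for rank selection, the sketch below picks the smallest rank whose singular values account for the requested share of total variance. The helper `rank_from_variance` is hypothetical and works on a complete matrix; how `Imputer` handles gaps and scaling at this step is not shown here.

```python
import numpy as np

def rank_from_variance(X, variance_threshold=0.95):
    """Smallest rank whose singular values explain the requested variance share."""
    # Squared singular values are proportional to the variance each component explains.
    s = np.linalg.svd(X, compute_uv=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # First position where the cumulative share reaches the threshold (1-based rank).
    rank = int(np.searchsorted(explained, variance_threshold)) + 1
    return min(rank, len(s))

# Build a test matrix with known singular values 10, 5, 1, 0.5, 0.1.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((50, 5)))
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))
singular_values = np.array([10.0, 5.0, 1.0, 0.5, 0.1])
X = U @ np.diag(singular_values) @ V.T

rank_95 = rank_from_variance(X, variance_threshold=0.95)
rank_75 = rank_from_variance(X, variance_threshold=0.75)
```

Here the first component carries about 79% of the variance and the first two about 99%, so a 0.95 threshold selects rank 2 while 0.75 selects rank 1.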


## Examples

Complete examples are available in the `examples/` directory:
- `basic_example.ipynb` - Basic usage and quick start tutorial
- `augmented_example.ipynb` - Extended examples with data augmentation features

## How It Works

The algorithm performs iterative SVD imputation with automatic rank estimation:

1. **Preprocessing**: Data validation, standardization, and missing value handling
2. **Rank Estimation**: Variance threshold, cross-validation, or fixed rank
3. **SVD Imputation**: Iterative low-rank approximation until convergence
4. **Uncertainty Estimation**: Monte Carlo validation with temporal or random masking
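
The imputation step above can be sketched in plain NumPy. This is a minimal illustration of iterative low-rank imputation under stated assumptions, not the package's implementation: the function name `iterative_svd_impute` is hypothetical, the rank is fixed by the caller, gaps start at the column mean, and standardization is skipped.

```python
import numpy as np

def iterative_svd_impute(X, rank, max_iters=500, tol=1e-4):
    """Fill NaNs by repeatedly projecting onto a rank-`rank` approximation."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iters):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Only the missing cells are overwritten; observed values stay fixed.
        updated = np.where(missing, low_rank, X)
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:  # stop once the fill-in values stop moving
            break
    return filled

# Exactly rank-1 test matrix with a deterministic pattern of gaps.
a = np.linspace(1.0, 2.0, 60)
b = np.array([1.0, -2.0, 0.5, 3.0])
truth = np.outer(a, b)
missing = np.zeros(truth.shape, dtype=bool)
missing[::5, 1] = True
missing[2::7, 3] = True
X_gappy = np.where(missing, np.nan, truth)

imputed = iterative_svd_impute(X_gappy, rank=1)
```

Each pass replaces only the gaps with values from the current low-rank reconstruction, so the observed data act as a fixed constraint while the estimates for the gaps settle.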

## API Reference

### Main Class
`Imputer(data, variance_threshold=0.95, rank=None, max_iters=500, tol=1e-4, verbose=True)`

### Key Methods
- `fit()` / `transform()` / `fit_transform()`: Standard sklearn interface
- `estimate_uncertainty()`: Monte Carlo validation
- `calculate_reconstruction_residuals()`: Model diagnostics
- `project_data()` / `reconstruct_data()`: SVD subspace operations
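
For readers unfamiliar with SVD subspace operations, the following NumPy sketch shows the underlying idea: project data onto the leading right singular vectors to get low-dimensional scores, then map the scores back to the original space. The functions `project` and `reconstruct` are hypothetical helpers written for this illustration; they are not the signatures of `project_data()` or `reconstruct_data()`, which are not documented here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
X = X - X.mean(axis=0)  # center the columns before decomposing

U, s, Vt = np.linalg.svd(X, full_matrices=False)
rank = 2
basis = Vt[:rank]  # (rank, n_columns), rows are orthonormal

def project(data, basis):
    """Coordinates of each row of `data` in the retained subspace."""
    return data @ basis.T

def reconstruct(scores, basis):
    """Map subspace coordinates back to the original column space."""
    return scores @ basis

scores = project(X, basis)          # shape (50, 2)
X_hat = reconstruct(scores, basis)  # rank-2 approximation of X
```

Projecting `X_hat` again returns the same scores, which is a quick way to check that the basis rows are orthonormal.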

## Requirements

- Python >= 3.8
- numpy >= 1.20.0
- pandas >= 1.3.0
- scikit-learn >= 1.0.0

## Performance Notes

- **Memory**: O(n × m) for an n × m dataset, plus O(min(n,m)²) for the SVD
- **Time Complexity**: Roughly O(k × n × m × min(n,m)) when each iteration computes a dense SVD, where k is the number of SVD iterations
- **Recommended Scale**: Efficient for datasets up to ~10,000 × 100 dimensions
- **Optimization**: SVD components are cached for efficient reuse across operations

## Development Status

This package is currently in **Beta**. User beware!

## Disclaimer

**IMPORTANT**: This software is provided "as is" without warranty of any kind. The authors and contributors make no representations or warranties regarding the accuracy, completeness, or validity of the code or its results. Users are solely responsible for validating the appropriateness and correctness of this software for their specific use cases. The authors assume no responsibility or liability for any errors, omissions, or damages arising from the use of this software.

## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Citation

If you use this package in your research, please cite:

```bibtex
@software{svd_time_series_imputer,
  title={SVD Time Series Imputer: A Python Package for Missing Data Imputation},
  author={Rui Hugman},
  year={2025},
  url={https://github.com/rhugman/svd_imputer}
}
```
