Metadata-Version: 2.4
Name: svd-imputer
Version: 0.1.2
Summary: SVD-based time series imputation with uncertainty estimation
Author: Rui Hugman
Maintainer: Rui Hugman
License-Expression: MIT
Project-URL: Homepage, https://github.com/rhugman/svd_imputer
Project-URL: Documentation, https://svd-imputer.readthedocs.io/
Project-URL: Repository, https://github.com/rhugman/svd_imputer.git
Project-URL: Bug Tracker, https://github.com/rhugman/svd_imputer/issues
Keywords: time-series,imputation,svd,missing-data,uncertainty,monte-carlo
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: scikit-learn>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-xdist>=3.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: bandit>=1.7.5; extra == "dev"
Requires-Dist: safety>=2.3.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=6.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.2.0; extra == "docs"
Requires-Dist: nbsphinx>=0.9.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.20.0; extra == "docs"
Requires-Dist: sphinx-copybutton>=0.5.0; extra == "docs"
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-cov>=4.0.0; extra == "test"
Requires-Dist: pytest-xdist>=3.0.0; extra == "test"
Requires-Dist: pytest-mock>=3.10.0; extra == "test"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "test"
Provides-Extra: security
Requires-Dist: bandit>=1.7.5; extra == "security"
Requires-Dist: safety>=2.3.0; extra == "security"
Requires-Dist: pip-audit>=2.6.0; extra == "security"
Provides-Extra: all
Requires-Dist: svd-imputer[dev,docs,security,test]; extra == "all"

# SVD Time Series Imputer

[![PyPI version](https://badge.fury.io/py/svd-imputer.svg)](https://badge.fury.io/py/svd-imputer)
[![Python versions](https://img.shields.io/pypi/pyversions/svd-imputer.svg)](https://pypi.org/project/svd-imputer/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A Python package for time series imputation using Singular Value Decomposition (SVD), with automatic rank estimation, uncertainty quantification, and a scikit-learn-style API.

**📦 Now available on PyPI**: `pip install svd-imputer`

## Table of Contents
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Usage](#usage)
- [Configuration](#configuration)
- [Examples](#examples)
- [How It Works](#how-it-works)
- [API Reference](#api-reference)
- [Requirements](#requirements)
- [Performance Notes](#performance-notes)

## Installation

**PyPI (Recommended)**:
```bash
pip install svd-imputer
```

**From Source** (development version):
```bash
git clone https://github.com/rhugman/svd_imputer.git
cd svd_imputer
pip install -e .
```

**With Development Dependencies**:
```bash
pip install -e ".[dev]"
```

## Quick Start

```python
import pandas as pd
from svd_imputer import Imputer

# Load your time series data (a DataFrame with a DatetimeIndex)
df = pd.read_csv("your_data.csv", index_col=0, parse_dates=True)

# Simple imputation; the rank is estimated from the variance threshold
imputer = Imputer(data=df, variance_threshold=0.95)
df_imputed = imputer.fit_transform()

# With uncertainty estimation
df_imputed, uncertainty = imputer.fit_transform(return_uncertainty=True)
print(f"RMSE: {uncertainty['rmse']:.3f} ± {uncertainty['rmse_std']:.3f}")
```
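
To try the Quick Start without a CSV file, a synthetic DataFrame with a `DatetimeIndex` and randomly scattered gaps can stand in for `df`. The column names, the shared seasonal signal, and the 10% gap rate below are arbitrary choices for illustration, not requirements of the package.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: three correlated daily series driven by one shared signal
rng = np.random.default_rng(42)
index = pd.date_range("2020-01-01", periods=365, freq="D")
t = np.arange(len(index))
signal = np.sin(2 * np.pi * t / 365.0)

df = pd.DataFrame(
    {
        "site_a": 10 + 2.0 * signal + rng.normal(0, 0.1, len(t)),
        "site_b": 5 + 1.5 * signal + rng.normal(0, 0.1, len(t)),
        "site_c": 8 - 1.0 * signal + rng.normal(0, 0.1, len(t)),
    },
    index=index,
)

# Knock out roughly 10% of the values at random to create gaps to impute
mask = rng.random(df.shape) < 0.10
df = df.mask(mask)

print(df.isna().sum())  # number of missing values per column
```

A DataFrame built this way should be usable as the `data` argument in the Quick Start snippet.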

> **Note**: Unlike scikit-learn estimators, which receive data in `fit()`, the `Imputer` class takes the data at initialization and preprocesses it once. All subsequent methods therefore work on the same preprocessed data, and preprocessing is not repeated.

## Usage

```python
from svd_imputer import Imputer

# Basic imputation (rank estimated from the variance threshold)
imputer = Imputer(data=df, variance_threshold=0.95)
df_imputed = imputer.fit_transform()

# Rank optimized by cross-validation
imputer = Imputer(data=df, rank="auto")
imputer.fit()
print(f"Optimized rank: {imputer.rank_}")

# With uncertainty estimation
df_imputed, uncertainty = imputer.fit_transform(return_uncertainty=True)
print(f"RMSE: {uncertainty['rmse']:.3f} ± {uncertainty['rmse_std']:.3f}")

# Advanced: model diagnostics
residuals, stats = imputer.calculate_reconstruction_residuals(return_stats=True)
print(f"Reconstruction R²: {stats['r_squared']:.3f}")
```
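
Rank selection with `rank="auto"` and the RMSE ± std reported with `return_uncertainty=True` can both be understood through masking: hide a random subset of observed values, re-impute, and score the hidden values against what was actually observed. The sketch below assumes random (not temporal) masking and a plain, unstandardized iterative SVD fill; `_svd_fill`, `masked_rmse`, and `select_rank_cv` are hypothetical names, and the package's actual procedure may differ.

```python
import numpy as np


def _svd_fill(X, rank, max_iters=200, tol=1e-5):
    """Compact iterative SVD fill without standardization (hypothetical helper)."""
    missing = np.isnan(X)
    Z = np.where(missing, np.nanmean(X, axis=0), X)  # start gaps at column means
    for _ in range(max_iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Z_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        delta = np.linalg.norm(Z_low[missing] - Z[missing]) / (np.linalg.norm(Z) + 1e-12)
        Z[missing] = Z_low[missing]  # only the gaps are updated
        if delta < tol:
            break
    return Z


def masked_rmse(X, rank, n_repeats=20, holdout_frac=0.1, seed=0):
    """Monte Carlo validation: hide random observed entries, re-impute, score them.
    Returns (mean RMSE, std of RMSE) across repeats."""
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(X))  # (row, col) pairs of known values
    n_hold = max(1, int(holdout_frac * len(observed)))
    scores = []
    for _ in range(n_repeats):
        pick = observed[rng.choice(len(observed), size=n_hold, replace=False)]
        rows, cols = pick[:, 0], pick[:, 1]
        X_masked = X.copy()
        X_masked[rows, cols] = np.nan  # pretend these known values are missing
        filled = _svd_fill(X_masked, rank)
        scores.append(np.sqrt(np.mean((filled[rows, cols] - X[rows, cols]) ** 2)))
    return float(np.mean(scores)), float(np.std(scores))


def select_rank_cv(X, candidate_ranks, **kwargs):
    """Cross-validation: choose the rank with the lowest mean held-out RMSE."""
    results = {k: masked_rmse(X, k, **kwargs) for k in candidate_ranks}
    best = min(results, key=lambda k: results[k][0])
    return best, results


# Example: noisy rank-2 data with roughly 10% gaps
rng = np.random.default_rng(3)
truth = rng.normal(size=(150, 2)) @ rng.normal(size=(2, 6))
X = truth + 0.05 * rng.normal(size=truth.shape)
X[rng.random(X.shape) < 0.1] = np.nan

best_rank, results = select_rank_cv(X, candidate_ranks=[1, 2, 3, 4], n_repeats=5)
rmse, rmse_std = results[best_rank]
print(f"Selected rank: {best_rank}, RMSE: {rmse:.3f} ± {rmse_std:.3f}")
```

Reusing the same `seed` for every candidate rank means each rank is scored on identical masks, which keeps the comparison fair.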

## Configuration

```python
imputer = Imputer(
    data=df,                  # Input DataFrame (required)
    variance_threshold=0.95,  # Variance threshold for automatic rank estimation (rank=None)
    rank=None,                # None (estimate from variance_threshold), int (fixed), or "auto" (cross-validation)
    max_iters=500,            # Maximum number of SVD iterations
    tol=1e-4,                 # Convergence tolerance
    verbose=True,             # Enable logging output
)
```
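
How `variance_threshold` is turned into a rank is not spelled out here. A common rule, assumed in this sketch, is to keep the smallest number of singular components whose squared singular values account for at least that fraction of the total. `rank_from_variance_threshold` is a hypothetical helper and expects a complete (already filled and standardized) matrix.

```python
import numpy as np


def rank_from_variance_threshold(X, variance_threshold=0.95):
    """Smallest rank whose leading components explain >= variance_threshold of the
    total variance of a complete matrix X (hypothetical helper, not the package API)."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, largest first
    explained = s**2 / np.sum(s**2)         # variance fraction carried by each component
    cumulative = np.cumsum(explained)
    # first position where the cumulative fraction reaches the threshold, as a 1-based rank
    k = int(np.searchsorted(cumulative, variance_threshold)) + 1
    return min(k, len(s))                   # guard against floating-point shortfall at 1.0


# Example: a 50x4 matrix built with known singular values 10, 5, 0.1, 0.1
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(50, 4)))  # orthonormal columns
V, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # orthogonal matrix
X = U @ np.diag([10.0, 5.0, 0.1, 0.1]) @ V.T

print(rank_from_variance_threshold(X, 0.95))  # 2 (first two components carry ~99.98%)
print(rank_from_variance_threshold(X, 0.75))  # 1 (first component alone carries ~80%)
```

Raising the threshold keeps more components and therefore more detail, at the risk of fitting noise.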


## Examples

Complete examples are available in the `examples/` directory:
- `basic_example.ipynb` - Basic usage and quick start tutorial
- `augmented_example.ipynb` - Extended examples with data augmentation features

## How It Works

The algorithm performs iterative SVD imputation with automatic rank estimation:

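1. **Preprocessing**: Data validation, standardization, and missing-value handling
2. **Rank Estimation**: Variance threshold (`rank=None`), cross-validation (`rank="auto"`), or a fixed integer rank
3. **SVD Imputation**: Missing entries are repeatedly replaced with a low-rank SVD reconstruction until the change falls below `tol` or `max_iters` is reached
4. **Uncertainty Estimation**: Monte Carlo validation, in which observed values are masked (temporally or at random), re-imputed, and compared against their known values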
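
Steps 1 and 3 above can be sketched in plain NumPy as a hard-impute loop: standardize with statistics of the observed values, start the gaps at the column means, then repeatedly overwrite only the missing entries with a rank-`rank` SVD reconstruction until the update is smaller than `tol`. This is an illustrative sketch under those assumptions, not the package's implementation; `iterative_svd_impute` is a hypothetical name, and the rank is passed in rather than estimated.

```python
import numpy as np


def iterative_svd_impute(X, rank, max_iters=500, tol=1e-4):
    """Hard-impute sketch: fill NaNs in X using repeated rank-`rank` SVD reconstructions."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Step 1 (preprocessing): standardize each column with statistics of observed values
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std[std == 0] = 1.0              # avoid dividing constant columns by zero
    Z = (X - mean) / std
    Z[missing] = 0.0                 # start gaps at the column mean (0 after scaling)

    # Step 3 (SVD imputation): refit a low-rank approximation and update only the gaps
    for _ in range(max_iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Z_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        delta = np.linalg.norm(Z_low[missing] - Z[missing]) / (np.linalg.norm(Z) + 1e-12)
        Z[missing] = Z_low[missing]  # observed entries are never overwritten
        if delta < tol:              # relative change in the filled values is small
            break

    return Z * std + mean            # undo standardization


# Example: an exact rank-2 matrix with roughly 10% of entries removed
rng = np.random.default_rng(1)
truth = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
X = truth.copy()
gaps = rng.random(X.shape) < 0.1
X[gaps] = np.nan
filled = iterative_svd_impute(X, rank=2)
```

Keeping observed values fixed and updating only the gaps is what distinguishes this from simply smoothing the whole matrix with a low-rank fit.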

## API Reference

### Main Class
`Imputer(data, variance_threshold=0.95, rank=None, max_iters=500, tol=1e-4, verbose=True)`

### Key Methods
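- `fit()` / `transform()` / `fit_transform()`: scikit-learn-style interface (data is passed to the constructor, not to `fit()`)
- `estimate_uncertainty()`: Monte Carlo validation
- `calculate_reconstruction_residuals()`: Model diagnostics (residuals and summary statistics such as R²)
- `project_data()` / `reconstruct_data()`: Projection onto, and reconstruction from, the fitted SVD subspace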
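
The diagnostic and subspace methods listed above can be illustrated with right singular vectors: projecting a complete matrix onto the top-`rank` subspace and mapping it back gives a reconstruction whose residuals yield an RMSE and an R². The functions below (`fit_subspace`, `project`, `reconstruct`, `reconstruction_stats`) are hypothetical stand-ins for `project_data()`, `reconstruct_data()`, and `calculate_reconstruction_residuals()`, whose exact signatures and outputs may differ.

```python
import numpy as np


def fit_subspace(Z, rank):
    """Top-`rank` right singular vectors of a complete matrix Z, as columns."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:rank].T               # (n_features, rank), orthonormal columns


def project(Z, V):
    """Coordinates of each row of Z in the low-rank subspace."""
    return Z @ V


def reconstruct(scores, V):
    """Map subspace coordinates back to the original feature space."""
    return scores @ V.T


def reconstruction_stats(Z, V):
    """Residuals of the rank-limited reconstruction plus RMSE and R^2."""
    residuals = Z - reconstruct(project(Z, V), V)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((Z - Z.mean(axis=0)) ** 2)
    stats = {
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "r_squared": float(1.0 - ss_res / ss_tot),
    }
    return residuals, stats


# Example: centered rank-2 data reconstructed with 1 and 2 components
rng = np.random.default_rng(2)
Z = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 4))
Z = Z - Z.mean(axis=0)
_, stats1 = reconstruction_stats(Z, fit_subspace(Z, 1))
residuals2, stats2 = reconstruction_stats(Z, fit_subspace(Z, 2))
print(f"R² with 1 component: {stats1['r_squared']:.3f}")
print(f"R² with 2 components: {stats2['r_squared']:.3f}")
```

Because `Z` is exactly rank 2, two components reproduce it almost perfectly (R² ≈ 1), while one component leaves a visible residual.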

## Requirements

- Python >= 3.8
- numpy >= 1.20.0
- pandas >= 1.3.0
- scikit-learn >= 1.0.0

## Performance Notes

- **Memory**: O(n × m) for data size n×m, plus O(min(n,m)²) for SVD decomposition
- **Time Complexity**: O(k × n × m × min(n,m)), where k is the number of SVD iterations (a full thin SVD of an n×m matrix costs O(n × m × min(n,m)))
- **Recommended Scale**: Efficient for datasets up to roughly 10,000 rows × 100 columns
- **Optimization**: SVD components are cached for efficient reuse across operations

## Package Status

**Current Status**: **Published on PyPI** 🎉

This package is currently in **Beta**: the core functionality is stable and tested (86 tests passing), but the API may evolve. It is suitable for research and development use.

## Disclaimer

**IMPORTANT**: This software is provided "as is" without warranty of any kind. The authors and contributors make no representations or warranties regarding the accuracy, completeness, or validity of the code or its results. Users are solely responsible for validating the appropriateness and correctness of this software for their specific use cases. The authors assume no responsibility or liability for any errors, omissions, or damages arising from the use of this software.

## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Links

- **PyPI Package**: https://pypi.org/project/svd-imputer/
- **Source Code**: https://github.com/rhugman/svd_imputer
- **Issues**: https://github.com/rhugman/svd_imputer/issues

## Citation

If you use this package in your research, please cite:

```bibtex
@software{svd_time_series_imputer,
  title={SVD Time Series Imputer: A Python Package for Missing Data Imputation},
  author={Rui Hugman},
  year={2025},
  url={https://github.com/rhugman/svd_imputer},
  note={Available on PyPI: https://pypi.org/project/svd-imputer/}
}
```
