Metadata-Version: 2.4
Name: trashpandas
Version: 1.0.1
Summary: Persistent Pandas DataFrame storage and retrieval using a SQL database, HDF5, CSV files, or pickle files.
Author-email: Odos Matthews <odos@example.com>
License: MIT
Project-URL: Homepage, https://github.com/eddiethedean/trashpandas
Project-URL: Repository, https://github.com/eddiethedean/trashpandas
Project-URL: Issues, https://github.com/eddiethedean/trashpandas/issues
Project-URL: Documentation, https://github.com/eddiethedean/trashpandas#readme
Keywords: pandas,dataframe,storage,sql,hdf5,csv,pickle
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: SQLAlchemy>=2.0.0
Requires-Dist: pandas>=1.3.0
Provides-Extra: hdf5
Requires-Dist: h5py>=3.0.0; extra == "hdf5"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: tox>=4.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Requires-Dist: h5py>=3.0.0; extra == "dev"
Provides-Extra: all
Requires-Dist: trashpandas[dev,hdf5]; extra == "all"
Dynamic: license-file

![TrashPandas Logo](https://raw.githubusercontent.com/eddiethedean/trashpandas/main/docs/trashpanda.svg)

# TrashPandas: Persistent Pandas DataFrame Storage and Retrieval

[![PyPI Latest Release](https://img.shields.io/pypi/v/trashpandas.svg)](https://pypi.org/project/trashpandas/)
[![Tests](https://github.com/eddiethedean/trashpandas/actions/workflows/tests.yml/badge.svg)](https://github.com/eddiethedean/trashpandas/actions/workflows/tests.yml)
[![Python Support](https://img.shields.io/pypi/pyversions/trashpandas.svg)](https://pypi.org/project/trashpandas/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

## What is it?

**TrashPandas** is a Python package for persistent Pandas DataFrame storage and retrieval using SQL databases, CSV files, HDF5, or pickle files. The 1.0 release series adds SQLAlchemy 2.x support, type hints throughout, context-manager and iterator support, and a custom exception hierarchy.

## ✨ Main Features

- **Multiple Storage Backends**: SQL databases, CSV files, HDF5, and pickle files
- **Preserve Data Integrity**: Maintains indexes and data types during storage/retrieval
- **Format Conversion**: Transfer DataFrames between different storage formats
- **Modern Python Support**: Full type hints, context managers, and iterator protocol
- **Bulk Operations**: Efficient batch processing with `store_many()`, `load_many()`, `delete_many()`
- **Compression Support**: Optional compression for CSV and pickle storage
- **Comprehensive Error Handling**: Custom exception hierarchy with detailed error messages
- **SQLAlchemy 2.x**: Full support for the latest SQLAlchemy with async capabilities

## 🚀 Quick Start

### Installation

```bash
# Basic installation
pip install trashpandas

# With HDF5 support (quotes keep shells such as zsh from expanding the brackets)
pip install "trashpandas[hdf5]"

# Development dependencies
pip install "trashpandas[dev]"
```

### Basic Usage

```python
import pandas as pd
import sqlalchemy as sa
import trashpandas as tp

# Create sample data
df = pd.DataFrame({'name': ['Joe', 'Bob', 'John'], 'age': [23, 34, 44]})

# SQL Storage
with tp.SqlStorage('sqlite:///test.db') as storage:
    storage['people'] = df
    loaded_df = storage['people']
    print(f"Stored {len(storage)} tables")

# CSV Storage with compression
csv_storage = tp.CsvStorage('./data', compression='gzip')
csv_storage.store(df, 'people')

# Pickle Storage
pickle_storage = tp.PickleStorage('./pickles', compression='bz2')
pickle_storage.store(df, 'people')
```

## 📖 Example Notebooks

Check out these interactive Jupyter notebooks demonstrating TrashPandas features:

- **[Basic Usage](https://github.com/eddiethedean/trashpandas/blob/main/examples/01_basic_usage.ipynb)** - Introduction to CSV, SQL, and Pickle storage
- **[Advanced Features](https://github.com/eddiethedean/trashpandas/blob/main/examples/02_advanced_features.ipynb)** - Compression, bulk operations, and data type preservation
- **[Format Conversion](https://github.com/eddiethedean/trashpandas/blob/main/examples/03_format_conversion.ipynb)** - Converting DataFrames between different storage formats
- **[Query Capabilities](https://github.com/eddiethedean/trashpandas/blob/main/examples/04_query_capabilities.ipynb)** - Advanced SQL querying with WHERE clauses and filtering

All notebooks are fully executed with outputs included. Click the links above to view them on GitHub or open them in Jupyter Notebook/Lab.

## 📚 API Reference

### Storage Classes

#### SqlStorage
```python
# Create SQL storage
storage = tp.SqlStorage('sqlite:///test.db')
# or with existing engine
engine = sa.create_engine('sqlite:///test.db')
storage = tp.SqlStorage(engine)

# Basic operations
storage.store(df, 'table_name')
df = storage.load('table_name')
storage.delete('table_name')

# Dictionary-like interface
storage['table_name'] = df
df = storage['table_name']
del storage['table_name']

# Bulk operations
storage.store_many({'table1': df1, 'table2': df2})
results = storage.load_many(['table1', 'table2'])
storage.delete_many(['table1', 'table2'])

# Context manager
with storage:
    storage['data'] = df
```

#### CsvStorage
```python
# Basic CSV storage
storage = tp.CsvStorage('./data')

# With compression
storage = tp.CsvStorage('./data', compression='gzip')

# Operations
storage.store(df, 'table_name')
df = storage.load('table_name')
```

#### PickleStorage
```python
# Basic pickle storage
storage = tp.PickleStorage('./pickles')

# With custom extension and compression
storage = tp.PickleStorage('./pickles', file_extension='.pkl', compression='bz2')

# Operations
storage.store(df, 'table_name')
df = storage.load('table_name')
```

#### HdfStorage (Optional)
```python
# Requires: pip install trashpandas[hdf5]
storage = tp.HdfStorage('data.h5')
storage.store(df, 'table_name')
df = storage.load('table_name')
```

### Modern Features

#### Iterator Protocol
```python
storage = tp.SqlStorage('sqlite:///test.db')

# Iterate over table names
for table_name in storage:
    print(f"Table: {table_name}")

# Check if table exists
if 'my_table' in storage:
    df = storage['my_table']

# Get number of tables
print(f"Total tables: {len(storage)}")
```
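
Because iterating a storage yields table names, a format conversion can be sketched with nothing more than the `load()` and `store()` calls shown above. The helper `copy_tables` and the in-memory `FakeStorage` below are hypothetical names introduced for illustration; they are not part of TrashPandas, and the library may offer its own conversion helpers that this README does not cover.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def copy_tables(source, destination, names: Optional[Iterable[str]] = None) -> List[str]:
    """Copy tables from one storage to another and return the names copied.

    Assumes only what the README shows: iterating a storage yields table
    names, load(name) returns the stored object, store(obj, name) saves it.
    """
    copied = []
    # Materialize the names first so the source is not mutated mid-iteration.
    for name in list(source if names is None else names):
        destination.store(source.load(name), name)
        copied.append(name)
    return copied


class FakeStorage:
    """Hypothetical in-memory stand-in, used only to exercise copy_tables."""

    def __init__(self) -> None:
        self._tables: Dict[str, object] = {}

    def store(self, df, name: str) -> None:
        self._tables[name] = df

    def load(self, name: str):
        return self._tables[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._tables)


src, dst = FakeStorage(), FakeStorage()
src.store([1, 2, 3], "people")  # any object works for the stand-in
src.store([4, 5], "orders")
print(copy_tables(src, dst))  # ['people', 'orders']
```

With real storages, the same function would presumably be called as `copy_tables(tp.SqlStorage('sqlite:///test.db'), tp.CsvStorage('./data'))`, using the constructors documented above.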

#### Context Managers
```python
# Automatic resource cleanup
with tp.SqlStorage('sqlite:///test.db') as storage:
    storage['data'] = df
    # The connection is closed automatically when the block exits
```

#### Bulk Operations
```python
# Store multiple DataFrames efficiently
dataframes = {
    'users': users_df,
    'orders': orders_df,
    'products': products_df
}
storage.store_many(dataframes)

# Load multiple tables
tables = ['users', 'orders', 'products']
results = storage.load_many(tables)

# Delete multiple tables
storage.delete_many(tables)
```

#### Compression Support
```python
# CSV with compression
csv_storage = tp.CsvStorage('./data', compression='gzip')

# Pickle with compression
pickle_storage = tp.PickleStorage('./pickles', compression='bz2')

# Supported compression types: 'gzip', 'bz2', 'xz', 'zstd'
```
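
For a rough feel of the size trade-off between codecs, the sketch below compresses a synthetic CSV payload with the standard-library `gzip`, `bz2`, and `lzma` (xz) modules. It does not call TrashPandas, the data is made up, and real DataFrames will compress differently. `zstd` is left out because it is not in the standard library on the Python versions this package supports.

```python
import bz2
import gzip
import lzma

# Synthetic CSV text standing in for a stored DataFrame (hypothetical data).
rows = ["name,age"] + [f"user{i % 50},{20 + i % 40}" for i in range(5000)]
raw = "\n".join(rows).encode("utf-8")

# Compressed size in bytes for each codec, with the raw size for reference.
sizes = {
    "none": len(raw),
    "gzip": len(gzip.compress(raw)),
    "bz2": len(bz2.compress(raw)),
    "xz": len(lzma.compress(raw)),  # lzma's default container is .xz
}

for codec, size in sizes.items():
    print(f"{codec:>5}: {size:>7} bytes ({size / len(raw):.1%} of raw)")
```

Stronger compression usually costs more CPU time on every store and load, so it is worth measuring with representative data before picking a codec.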

### Error Handling

```python
from trashpandas.exceptions import TableNotFoundError, MetadataCorruptedError

try:
    df = storage.load('nonexistent_table')
except TableNotFoundError as e:
    print(f"Table not found: {e.table_name}")
except MetadataCorruptedError as e:
    print(f"Metadata corrupted: {e.details}")
```
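
A common pattern on top of these exceptions is a "load or fall back" helper. The sketch below defines a local stand-in for `TableNotFoundError` so that it runs without TrashPandas installed; in real code the class would be imported from `trashpandas.exceptions` as shown above. `load_or_default` and `EmptyStorage` are hypothetical names, and the stand-in's constructor signature is an assumption.

```python
# Stand-in so the sketch runs on its own; in real code use
# `from trashpandas.exceptions import TableNotFoundError` instead.
class TableNotFoundError(Exception):
    def __init__(self, table_name: str) -> None:
        super().__init__(f"table {table_name!r} not found")
        self.table_name = table_name  # attribute used in the example above


def load_or_default(storage, name, default=None):
    """Return storage.load(name), or `default` if the table is missing."""
    try:
        return storage.load(name)
    except TableNotFoundError:
        return default


class EmptyStorage:
    """Hypothetical storage whose load() always fails, for demonstration."""

    def load(self, name):
        raise TableNotFoundError(name)


print(load_or_default(EmptyStorage(), "people", default="no data"))  # no data
```

Catching only `TableNotFoundError` lets other failures, such as corrupted metadata, propagate instead of being silently replaced by the default.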

## 🔄 Migration from 0.x to 1.0

### Breaking Changes

1. **SQLAlchemy 2.x Required**: Update your SQLAlchemy version
   ```bash
   pip install "SQLAlchemy>=2.0.0"
   ```

2. **Path Parameters**: Storage classes now accept `pathlib.Path` objects in addition to strings; existing string paths keep working
   ```python
   # Old
   storage = tp.CsvStorage('/path/to/data')
   
   # New (still works)
   storage = tp.CsvStorage('/path/to/data')
   
   # New (recommended)
   from pathlib import Path
   storage = tp.CsvStorage(Path('/path/to/data'))
   ```

3. **Method Signatures**: Some internal methods have updated signatures; public methods such as `store()` stay backward compatible and gain optional parameters
   ```python
   # Old
   storage.store(df, 'table')
   
   # New (backward compatible)
   storage.store(df, 'table')
   storage.store(df, 'table', schema='my_schema')  # New optional parameter
   ```

### New Features

1. **Context Managers**: Use `with` statements for automatic cleanup
2. **Iterator Protocol**: Iterate over storage objects
3. **Bulk Operations**: Efficient batch processing
4. **Compression**: Optional compression for file-based storage
5. **Better Error Handling**: Comprehensive exception hierarchy

## 🛠️ Development

### Setup Development Environment

```bash
git clone https://github.com/eddiethedean/trashpandas.git
cd trashpandas
pip install -e ".[dev]"
```

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=trashpandas

# Run specific test file
pytest tests/test_sql.py
```

### Code Quality

```bash
# Linting with ruff
ruff check src tests

# Type checking with mypy
mypy src

# Format code
ruff format src tests
```

## 📋 Requirements

- Python 3.8+
- pandas >= 1.3.0
- SQLAlchemy >= 2.0.0
- h5py >= 3.0.0 (optional, for HDF5 support)

## 🤝 Contributing

Contributions are welcome! Please see our [Contributing Guide](https://github.com/eddiethedean/trashpandas/blob/main/CONTRIBUTING.md) for details.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](https://github.com/eddiethedean/trashpandas/blob/main/LICENSE) file for details.

## 🙏 Acknowledgments

- [pandas](https://pandas.pydata.org/) for the excellent DataFrame library
- [SQLAlchemy](https://www.sqlalchemy.org/) for robust database connectivity
- [h5py](https://docs.h5py.org/) for HDF5 support
- The Python community for inspiration and feedback

---

**TrashPandas** - Making DataFrame persistence simple and reliable! 🐼
