Metadata-Version: 2.4
Name: additory
Version: 0.1.1a5
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Requires-Dist: polars>=0.44.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: scipy>=1.10.0
Requires-Dist: pytest>=7.0 ; extra == 'dev'
Requires-Dist: maturin>=1.0 ; extra == 'dev'
Requires-Dist: hypothesis>=6.0.0 ; extra == 'dev'
Provides-Extra: dev
Summary: Data augmentation library with Rust-accelerated operations
Keywords: data,augmentation,polars,dataframe,synthetic,rust
Author-email: Additory Team <team@additory.dev>
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# Additory Rust Python Bindings

High-performance Rust implementations of Additory's data augmentation functions with Python bindings.

**Performance**: 2-5x faster than pure Python  
**Compatibility**: 100% API compatible with Additory v0.1.1  
**Python Support**: Python 3.8+ (abi3)

## Features

- ✅ **Zero-copy DataFrame transfer** via Apache Arrow IPC
- ✅ **Automatic backend detection** (pandas/polars)
- ✅ **Graceful fallback** to pure Python if the Rust extension is unavailable
- ✅ **Memory efficient** with minimal overhead
- ✅ **Type safe** with Rust's ownership system

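One way automatic backend detection could work is to look at the module that defines the DataFrame's class, which avoids importing pandas or polars just to run an `isinstance` check. The sketch below is an assumption about the approach, not Additory's actual implementation, and `detect_backend` is a hypothetical helper.

```python
def detect_backend(df):
    """Return "pandas" or "polars" based on the module defining type(df).

    Hypothetical helper for illustration; not part of Additory's API.
    """
    # e.g. "pandas.core.frame" -> "pandas", "polars.dataframe.frame" -> "polars"
    root_module = type(df).__module__.split(".")[0]
    if root_module in ("pandas", "polars"):
        return root_module
    raise TypeError(f"Unsupported DataFrame type: {type(df).__name__}")
```

A wrapper could use the returned name to decide which type to convert the result back to after processing.
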
## Installation

### From PyPI (when published)

```bash
pip install additory-rust
```

### From Source

```bash
# Install build dependencies
pip install maturin

# Navigate to bindings directory
cd rust-core/additory-py

# Build and install in development mode
maturin develop --release

# Or build a wheel
maturin build --release
```

## Quick Start

The Rust bindings are automatically used when available through the Additory Python wrapper:

```python
import polars as pl
from additory.functions.to import to

# Create sample data
orders = pl.DataFrame({
    "order_id": [1, 2, 3, 4],
    "product_id": [101, 102, 101, 103]
})

products = pl.DataFrame({
    "product_id": [101, 102, 103],
    "price": [10.0, 20.0, 15.0]
})

# Lookup operation (automatically uses Rust if available)
result = to(orders, from_df=products, bring="price", against="product_id")
print(result.df)
```

**Output**:
```
┌──────────┬────────────┬───────┐
│ order_id │ product_id │ price │
├──────────┼────────────┼───────┤
│ 1        │ 101        │ 10.0  │
│ 2        │ 102        │ 20.0  │
│ 3        │ 101        │ 10.0  │
│ 4        │ 103        │ 15.0  │
└──────────┴────────────┴───────┘
```

## Supported Operations

### Lookup
Join DataFrames and bring columns from reference data.

```python
result = to(df, from_df=ref, bring=["col1", "col2"], against="id")
```
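
To make the lookup semantics concrete, the pure-Python sketch below mimics what the Quick Start output suggests: every row of the left table is kept, and the requested columns are brought over where the key matches. Left-join behaviour for unmatched keys is an assumption here, and `lookup_rows` is a hypothetical helper that works on lists of dicts, not an Additory function.

```python
def lookup_rows(rows, ref_rows, bring, against):
    """Left-join style lookup on lists of dicts (illustrative only)."""
    # Index the reference rows by key; the first match wins in this sketch.
    index = {}
    for ref in ref_rows:
        index.setdefault(ref[against], ref)

    result = []
    for row in rows:
        match = index.get(row[against])
        enriched = dict(row)
        for col in bring:
            # Unmatched keys get None, mirroring a null in a left join.
            enriched[col] = match[col] if match is not None else None
        result.append(enriched)
    return result


orders = [{"order_id": 1, "product_id": 101}, {"order_id": 2, "product_id": 999}]
products = [{"product_id": 101, "price": 10.0}]
print(lookup_rows(orders, products, bring=["price"], against="product_id"))
```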

### Merge
Combine DataFrames vertically or horizontally.

```python
result = to(df1, from_df=df2, to="@merge", how="vertical")
```

### Sort
Sort DataFrame by specified columns.

```python
result = to(df, to="@sort", by="column", descending=False)
```

### Summarize
Group and aggregate data.

```python
result = to(df, to="@summarize", against="category",
            aggregations={"sales": "sum", "quantity": "mean"})
```
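
As a rough illustration of what grouping and aggregating means here, the pure-Python sketch below groups rows by one key and applies a named aggregation per column. The helper `summarize_rows` and the set of aggregation names in `AGGREGATORS` are assumptions made for this example; the aggregations Additory actually supports may differ.

```python
from collections import defaultdict
from statistics import mean

# Aggregation names assumed for this sketch; the real set may differ.
AGGREGATORS = {"sum": sum, "mean": mean, "min": min, "max": max}


def summarize_rows(rows, against, aggregations):
    """Group rows by `against` and aggregate the named columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[against]].append(row)

    summary = []
    for key, members in groups.items():
        out = {against: key}
        for col, agg_name in aggregations.items():
            out[col] = AGGREGATORS[agg_name]([m[col] for m in members])
        summary.append(out)
    return summary


rows = [
    {"category": "a", "sales": 1, "quantity": 2},
    {"category": "a", "sales": 3, "quantity": 4},
    {"category": "b", "sales": 5, "quantity": 6},
]
print(summarize_rows(rows, "category", {"sales": "sum", "quantity": "mean"}))
```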

## Performance Benchmarks

| Operation | Rows   | Rust Time | Python Time | Speedup |
|-----------|--------|-----------|-------------|---------|
| Lookup    | 1k     | 0.020s    | 0.045s      | 2.3x    |
| Lookup    | 10k    | 0.003s    | 0.012s      | 4.0x    |
| Lookup    | 100k   | 0.015s    | 0.055s      | 3.7x    |
| Sort      | 10k    | 0.002s    | 0.008s      | 4.0x    |
| Sort      | 100k   | 0.020s    | 0.080s      | 4.0x    |

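The speedup column is the Python time divided by the Rust time. The measurement method behind the table is not stated, so the timing harness below is only a sketch of how comparable numbers might be collected; `best_time` and `speedup` are hypothetical helpers.

```python
import time


def best_time(func, *args, repeats=5):
    """Return the fastest wall-clock time (seconds) over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(python_seconds, rust_seconds):
    """Speedup factor as reported in the table: Python time / Rust time."""
    return python_seconds / rust_seconds


# Example with the 10k-row lookup timings from the table above.
print(f"{speedup(0.012, 0.003):.1f}x")
```

Taking the minimum over several runs reduces noise from other processes; a mean or median would be an equally reasonable choice.
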
## Pandas Compatibility

Works seamlessly with pandas DataFrames:

```python
import pandas as pd
from additory.functions.to import to

orders_pd = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product_id": [101, 102, 101]
})

products_pd = pd.DataFrame({
    "product_id": [101, 102],
    "price": [10.0, 20.0]
})

# Automatic conversion and Rust acceleration
result = to(orders_pd, from_df=products_pd, bring="price", against="product_id")
# The result is returned as a pandas DataFrame as well
```

## Checking Rust Availability

```python
from additory.functions.to import RUST_AVAILABLE

if RUST_AVAILABLE:
    print("🦀 Rust acceleration enabled!")
    import additory_rust
    print(f"Version: {additory_rust.__version__}")
else:
    print("🐍 Using pure Python implementation")
```
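
A flag like `RUST_AVAILABLE` is commonly set by attempting the import once at module load time. The sketch below shows that pattern under the assumption that the extension module is named `additory_rust`, as in the example above; `load_rust_backend` is a hypothetical helper and not necessarily how Additory sets the flag.

```python
import importlib


def load_rust_backend(module_name="additory_rust"):
    """Try to import the Rust extension; return (module, available)."""
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        # Extension not built or not installed: fall back to pure Python.
        return None, False


_rust, RUST_AVAILABLE = load_rust_backend()
print("Rust backend available:", RUST_AVAILABLE)
```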

## Direct Rust API (Advanced)

For advanced users who want to bypass the Python wrapper:

```python
import additory_rust
import polars as pl
import io

# Convert DataFrame to Arrow IPC bytes
def df_to_bytes(df):
    buffer = io.BytesIO()
    df.write_ipc(buffer)
    return buffer.getvalue()

def bytes_to_df(data):
    buffer = io.BytesIO(data)
    return pl.read_ipc(buffer)

# Direct Rust call (`orders` and `products` are the polars DataFrames from Quick Start)
df_bytes = df_to_bytes(orders)
from_df_bytes = df_to_bytes(products)

result_bytes = additory_rust.to_lookup(
    df_bytes, from_df_bytes, ["price"], ["product_id"]
)

result_df = bytes_to_df(result_bytes)
```

## Architecture

```
Python DataFrame (pandas/polars)
        ↓
Arrow IPC Serialization
        ↓
Rust Processing (zero-copy)
        ↓
Arrow IPC Deserialization
        ↓
Python DataFrame (original type)
```

## Error Handling

All Rust errors are converted to appropriate Python exceptions:

```python
try:
    result = to(df, from_df=ref, bring="invalid_col", against="id")
except ValueError as e:
    print(e)
    # ValueError: Bring columns not found in reference DataFrame: ['invalid_col'].
    # Available columns: ['id', 'price', 'name']
```
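
The message above suggests a validation step that compares the requested columns against those available in the reference DataFrame. A minimal sketch of such a check follows; `check_bring_columns` is a hypothetical helper written for illustration, not the library's actual code.

```python
def check_bring_columns(bring, available):
    """Raise ValueError if any requested column is missing (illustrative)."""
    missing = [col for col in bring if col not in available]
    if missing:
        raise ValueError(
            f"Bring columns not found in reference DataFrame: {missing}. "
            f"Available columns: {list(available)}"
        )


try:
    check_bring_columns(["invalid_col"], ["id", "price", "name"])
except ValueError as e:
    print(e)
```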

## Development

### Building

```bash
# Debug build
maturin develop

# Release build (optimized)
maturin develop --release

# Build wheel
maturin build --release
```

### Testing

```bash
# Rust unit tests
cargo test

# Python integration tests
python test_phase4_integration.py

# Performance benchmarks
python benchmark_rust_performance.py
```

### Documentation

```bash
# Generate Rust docs
cargo doc --open

# View API documentation
cat API_DOCUMENTATION.md

# View usage examples
cat ../../USAGE_EXAMPLES.md
```

## Platform Support

| Platform | Architecture | Status        | Wheel Size |
|----------|--------------|---------------|------------|
| Linux    | x86_64       | ✅ Built      | 14 MB      |
| Linux    | aarch64      | 📝 Documented | -          |
| macOS    | x86_64       | 📝 Documented | -          |
| macOS    | aarch64      | 📝 Documented | -          |
| Windows  | x86_64       | 📝 Documented | -          |

See [MULTI_PLATFORM_BUILD_GUIDE.md](MULTI_PLATFORM_BUILD_GUIDE.md) for build instructions.

## Troubleshooting

See [TROUBLESHOOTING.md](../../TROUBLESHOOTING.md) for common issues and solutions.

## Documentation

- [API Documentation](API_DOCUMENTATION.md) - Detailed function reference
- [Usage Examples](../../USAGE_EXAMPLES.md) - Real-world examples
- [Multi-Platform Build Guide](MULTI_PLATFORM_BUILD_GUIDE.md) - Building for different platforms
- [Troubleshooting Guide](../../TROUBLESHOOTING.md) - Common issues and solutions

## Contributing

Contributions welcome! Please ensure:
- All tests pass (`cargo test` and Python tests)
- Code is formatted (`cargo fmt`)
- No clippy warnings (`cargo clippy`)
- Documentation is updated

## License

MIT License. See the LICENSE file for details.

## Version

**Current Version**: 0.2.0  
**Last Updated**: 2025-02-04  
**Python Support**: 3.8+  
**Polars Version**: 0.44+

