Metadata-Version: 2.4
Name: additory
Version: 0.1.0a3
Summary: A semantic, extensible dataframe transformation engine with expressions, lookup, and synthetic data generation support.
Author: Krishnamoorthy Sankaran
License: MIT
Project-URL: homepage, https://github.com/sekarkrishna/additory
Project-URL: documentation, https://github.com/sekarkrishna/additory/tree/main/documentation/V0.1.0
Project-URL: source, https://github.com/sekarkrishna/additory
Project-URL: issues, https://github.com/sekarkrishna/additory/issues
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5
Requires-Dist: polars>=0.20
Requires-Dist: pyarrow>=10.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.31
Requires-Dist: toml>=0.10
Requires-Dist: scipy>=1.9
Requires-Dist: numpy>=1.21
Requires-Dist: packaging>=21.0
Requires-Dist: psutil>=5.8
Provides-Extra: gpu
Requires-Dist: cudf>=24.0; extra == "gpu"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: pytest-xdist>=3.0; extra == "dev"
Requires-Dist: hypothesis>=6.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: coverage>=7.0; extra == "dev"
Dynamic: license-file

# Additory

**A semantic, extensible dataframe transformation engine with expressions, lookup, and augmentation support.**

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Version](https://img.shields.io/badge/version-0.1.0a3-orange.svg)](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/)

**Author:** Krishnamoorthy Sankaran

## 🛠️ Requirements

- **Python**: 3.9+
- **Core dependencies**: pandas, polars, pyarrow, numpy, scipy (plus pyyaml, requests, toml, packaging, and psutil, all installed automatically)
- **Optional**: cuDF (for GPU support)

## 📦 Installation

```bash
pip install additory==0.1.0a3
```

**Optional GPU support:**
```bash
pip install "additory[gpu]==0.1.0a3"  # Includes cuDF for GPU acceleration
```

**Development installation:**
```bash
pip install "additory[dev]==0.1.0a3"  # Includes testing and development tools
```

## 🎯 Core Functions

| Function | Purpose | Example |
|----------|---------|---------|
| `add.to()` | Lookup/join operations | `add.to(df1, from_df=df2, bring='col', against='key')` |
| `add.augment()` | Generate additional data | `add.augment(df, n_rows=1000)` |
| `add.scan()` | Data profiling & analysis | `add.scan(df, preset="full")` |

## 🧬 Available Expressions

Additory includes built-in health and fitness expressions, including:

- **`add.bmi()`** - Body Mass Index
- **`add.bsa()`** - Body Surface Area  
- **`add.bmr()`** - Basal Metabolic Rate
- **`add.waist_hip_ratio()`** - Waist-to-Hip Ratio
- **`add.body_fat_percentage()`** - Body Fat Percentage
- **`add.ideal_body_weight()`** - Ideal Body Weight
- **`add.blood_pressure_category()`** - BP Classification
- **`add.cholesterol_ratio()`** - Cholesterol Ratio
- **`add.age_category()`** - Age Classification
- **`add.fitness_score()`** - Overall Fitness Score

```python
import pandas as pd
import additory as add

# Health calculations
patients = pd.DataFrame({
    'weight_kg': [70, 80, 65],  # Weight in kilograms
    'height_m': [1.75, 1.80, 1.60],  # Height in meters
    'age': [25, 35, 45],
    'gender': ['M', 'F', 'M']
})

patients_bmi = add.bmi(patients)
patients_bsa = add.bsa(patients)
fitness_scores = add.fitness_score(patients)

# Chain multiple expressions
result = add.fitness_score(add.bmr(add.bmi(patients)))
```
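
For readers who want to see the arithmetic behind one of these expressions, here is a minimal plain-Python sketch of the standard BMI formula (weight in kilograms divided by height in meters squared). It is not additory's implementation; the `bmi` helper and the output field name `bmi` are assumptions made for illustration, while the input column names follow the example above.

```python
# Plain-Python sketch of the standard BMI formula; not additory's own code.
def bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index in kg/m^2."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


patients = [
    {"weight_kg": 70, "height_m": 1.75},
    {"weight_kg": 80, "height_m": 1.80},
    {"weight_kg": 65, "height_m": 1.60},
]

# Add a 'bmi' field (hypothetical name) to a copy of each record,
# rounded to one decimal place.
with_bmi = [
    {**p, "bmi": round(bmi(p["weight_kg"], p["height_m"]), 1)} for p in patients
]
print([p["bmi"] for p in with_bmi])  # [22.9, 24.7, 25.4]
```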

## 🔧 DataFrame Support

Additory accepts dataframes from several libraries:

- **pandas** - Full support
- **polars** - Full support
- **cuDF** - GPU acceleration support

```python
import polars as pl
import additory as add

# Works with polars
df_polars = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
result = add.augment(df_polars, n_rows=100)

# The dataframe type is detected and converted automatically
```
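
One common way to detect a dataframe's backend without importing every optional library is to inspect the module that defines the object's class. The sketch below shows that technique in plain Python; `detect_backend` is a hypothetical helper, and nothing here is confirmed to match how additory detects types internally.

```python
# Hypothetical helper: identify the dataframe library from the class's module,
# so optional packages such as cudf never need to be imported.
def detect_backend(df) -> str:
    """Return 'pandas', 'polars', or 'cudf' for a supported dataframe object."""
    top_level = type(df).__module__.split(".")[0]
    if top_level in {"pandas", "polars", "cudf"}:
        return top_level
    raise TypeError(f"unsupported dataframe type: {type(df).__name__}")


# Stand-in class for demonstration only; a real polars DataFrame is defined
# in the module 'polars.dataframe.frame'.
class FakeFrame:
    __module__ = "polars.dataframe.frame"


print(detect_backend(FakeFrame()))  # polars
```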

## ✨ Key Features

### 🔧 Utilities

**add.to() - Data Lookup & Joins**
Simplified syntax for bringing columns from one dataframe to another.

```python
# Simple lookup
orders_with_prices = add.to(
    orders, 
    from_df=products, 
    bring='price', 
    against='product_id'
)

# Multiple columns and keys
enriched = add.to(
    orders,
    from_df=products,
    bring=['price', 'category'],
    against=['product_id', 'region']
)
```
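
Conceptually, a lookup of this kind resembles a left join that brings back only the requested column. The following dependency-free sketch illustrates that idea on lists of dictionaries. The `lookup` function is hypothetical, and its behavior for unmatched or duplicate keys is an assumption, not documented `add.to()` behavior.

```python
# Hypothetical illustration of a left-join style lookup; not additory's code.
def lookup(rows, from_rows, bring, against):
    """Copy the `bring` field from from_rows onto rows, matched on `against`.

    Unmatched rows get None. If from_rows repeats a key, the last one wins.
    """
    index = {r[against]: r[bring] for r in from_rows}
    return [{**row, bring: index.get(row[against])} for row in rows]


orders = [
    {"order_id": 1, "product_id": "A"},
    {"order_id": 2, "product_id": "B"},
    {"order_id": 3, "product_id": "Z"},  # no matching product
]
products = [
    {"product_id": "A", "price": 9.5},
    {"product_id": "B", "price": 20.0},
]

orders_with_prices = lookup(orders, products, bring="price", against="product_id")
print([o["price"] for o in orders_with_prices])  # [9.5, 20.0, None]
```

Every row of `orders` is kept, which is the usual expectation when enriching a table rather than filtering it.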

**add.onehotencoding() - Categorical Encoding**
Convert categorical columns to one-hot encoded format.

```python
# One-hot encoding (single column)
encoded = add.onehotencoding(df, 'category')
```
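
As a rough picture of what one-hot encoding produces, the sketch below replaces a categorical field with one 0/1 indicator field per distinct value. It is plain Python rather than additory code; the `one_hot` helper and the `<column>_<value>` naming scheme are assumptions borrowed from common practice.

```python
# Hypothetical illustration of one-hot encoding on a list of records.
def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator field per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


items = [
    {"id": 1, "category": "book"},
    {"id": 2, "category": "toy"},
    {"id": 3, "category": "book"},
]

encoded = one_hot(items, "category")
print(encoded[0])  # {'id': 1, 'category_book': 1, 'category_toy': 0}
```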

**add.harmonize_units() - Unit Standardization**
Standardize units across your dataset.

```python
# Unit harmonization
standardized = add.harmonize_units(
    df, 
    value_column='temperature', 
    unit_column='unit',
    target_unit='C'
)
```
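
To make the idea of unit harmonization concrete, here is a small plain-Python sketch that converts mixed temperature readings to Celsius. The unit labels `C`, `F`, and `K`, the `harmonize_to_celsius` helper, and the choice to overwrite the unit field are all assumptions for illustration; the units additory actually supports are not listed in this README.

```python
# Hypothetical illustration of unit harmonization; not additory's code.
TO_CELSIUS = {
    "C": lambda v: v,
    "F": lambda v: (v - 32) * 5 / 9,
    "K": lambda v: v - 273.15,
}


def harmonize_to_celsius(rows, value_column, unit_column):
    """Return copies of rows with values converted to Celsius."""
    out = []
    for row in rows:
        convert = TO_CELSIUS[row[unit_column]]  # KeyError for an unknown unit
        out.append(
            {**row, value_column: convert(row[value_column]), unit_column: "C"}
        )
    return out


readings = [
    {"temperature": 100.0, "unit": "C"},
    {"temperature": 212.0, "unit": "F"},
    {"temperature": 373.15, "unit": "K"},
]

harmonized = harmonize_to_celsius(readings, "temperature", "unit")
for r in harmonized:
    print(round(r["temperature"], 2), r["unit"])
```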

### 🧮 Expressions

Pre-built calculations for health, fitness, and common metrics. Simple examples:

```python
import pandas as pd
import additory as add

# Create patient data using the column names the expressions expect
patients = pd.DataFrame({
    'weight_kg': [70, 80, 65],  # Weight in kilograms
    'height_m': [1.75, 1.80, 1.60],  # Height in meters
    'age': [25, 35, 45],
    'gender': ['M', 'F', 'M']
})

# Calculate BMI
patients_with_bmi = add.bmi(patients)

# Calculate Body Surface Area
patients_with_bsa = add.bsa(patients)

# Chain multiple expressions
result = add.fitness_score(add.bmr(add.bmi(patients)))
```

### 🔄 Augment Data Generation

`add.augment()` generates additional rows that resemble an existing dataset, or builds a dataset from scratch using inline per-column strategies.

```python
# Augment existing data (learns from patterns)
more_customers = add.augment(customers, n_rows=1000)

# Create data from scratch with strategies
new_data = add.augment("@new", n_rows=500, strategy={
    'id': 'increment:start=1',
    'name': 'choice:[John,Jane,Bob]',
    'age': 'range:18-65'
})
```
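
The strategy strings above follow a `kind:argument` pattern. As a sketch of how such strings could be interpreted, the code below parses the three kinds shown and generates rows with the standard library. This is a hypothetical reading of the syntax, not additory's parser; in particular, treating `range` as inclusive integers and the `seed` parameter are assumptions.

```python
import random


# Hypothetical parser for the strategy strings shown above.
def make_generator(spec, rng):
    """Turn one strategy string into a function of the row index."""
    kind, _, arg = spec.partition(":")
    if kind == "increment":
        # 'increment:start=1' -> 1, 2, 3, ...
        start = int(arg.split("=", 1)[1])
        return lambda i: start + i
    if kind == "choice":
        # 'choice:[John,Jane,Bob]' -> one option picked at random per row
        options = [o.strip() for o in arg.strip("[]").split(",")]
        return lambda i: rng.choice(options)
    if kind == "range":
        # 'range:18-65' -> random integer, both bounds inclusive (assumption);
        # negative bounds are not handled by this simple split
        low, high = (int(x) for x in arg.split("-"))
        return lambda i: rng.randint(low, high)
    raise ValueError(f"unknown strategy: {spec!r}")


def generate(strategy, n_rows, seed=0):
    """Build n_rows records, one field per strategy entry."""
    rng = random.Random(seed)
    generators = {col: make_generator(spec, rng) for col, spec in strategy.items()}
    return [{col: gen(i) for col, gen in generators.items()} for i in range(n_rows)]


rows = generate(
    {
        "id": "increment:start=1",
        "name": "choice:[John,Jane,Bob]",
        "age": "range:18-65",
    },
    n_rows=5,
)
print([r["id"] for r in rows])  # [1, 2, 3, 4, 5]
```

Seeding the random generator keeps the output reproducible, which is useful when synthetic data feeds tests.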

## 🧪 Examples

### E-commerce Data Pipeline
```python
import pandas as pd
import additory as add

# Start with small customer sample
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'age': [25, 35, 45],
    'region': ['North', 'South', 'East']
})

# Generate more customers
customers = add.augment(customers, n_rows=10000)

# Add customer tiers
tiers = pd.DataFrame({
    'customer_id': range(1, 4),  # Match original IDs
    'tier': ['Gold', 'Silver', 'Bronze']
})

# Use pipeline approach
result = (customers
    .pipe(add.to, from_df=tiers, bring='tier', against='customer_id')
    .pipe(add.scan, preset="quick"))

print(result.summary())
```

### Healthcare Data Analysis
```python
import additory as add

# Create patient data from scratch
strategy = {
    'patient_id': 'increment:start=1',
    'age': 'range:18-80',
    'weight_kg': 'range:50-120',  # Weight in kg
    'height_cm': 'range:150-200'  # Height in cm
}

patients = add.augment("@new", n_rows=1000, strategy=strategy)

# Convert height to meters for expressions
patients['height_m'] = patients['height_cm'] / 100

# Calculate health metrics using pipeline
result = (patients
    .pipe(add.bmi)
    .pipe(add.scan, preset="correlations"))

print(result.correlations)
```

## 📚 Documentation

- **[Function Documentation](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0/)** - Detailed guides for each function
- **[Expressions Guide](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0/expressions.html)** - Complete expressions reference

## 📄 License

MIT License - see [LICENSE](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/LICENSE) file for details.

## 📞 Support

- **Issues**: [GitHub Issues](https://github.com/sekarkrishna/additory/issues)
- **Documentation**: [Full Documentation](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0)

## 🗺️ Roadmap: v0.1.1 (February 2025)
- Enhanced documentation and tutorials
- Performance optimizations
- Additional expressions
- Advanced synthetic data patterns

---

**Made with ❤️ for data scientists, analysts, and developers who love working with data.**
