Metadata-Version: 2.4
Name: dsf-aml-sdk
Version: 2.2.0
Summary: DSF AML SDK — Automated ML Robustness & Failure Correction Framework
Home-page: https://dsfuptech.cloud
Author: Jaime Alexander Jimenez
Author-email: contacto@dsfuptech.cloud
License: Proprietary
Keywords: dsf aml ml machine-learning robustness auto-healing sdk
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: requests>=2.25.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# DSF AML SDK

**Automated ML Robustness & Training Data Generation**

Generate critical training variants from production failures and edge cases. Accelerate model retraining with automatically curated datasets.

---

## 🎯 Primary Use Cases

### 1. Production Failure Recovery

**Challenge:** ML/LLM models fail on edge cases. Manual correction is slow.

**Solution:** Generate critical variants from failures for rapid retraining.

```python
from dsf_aml_sdk import AMLSDK

sdk = AMLSDK(license_key='your_key', tier='professional')

# Production failure detected
failed_case = {'metric_a': 0.60, 'metric_b': 500, 'metric_c': 0.20}

# Generate variants around the failure
# (config follows the format described under Configuration Structure)
variants = sdk.generate_variants(
    seed=failed_case,
    config=config,
    count=20
)

# Use variants['samples'] for retraining
```

**Output:** Labeled data points similar to the failure case, used to improve model robustness.
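A minimal sketch of folding the generated variants back into an existing training set. It assumes `variants['samples']` is a list of flat feature dicts shaped like the seed; the README does not document the sample schema. `merge_variants` is a hypothetical helper, not part of the SDK.

```python
from typing import Dict, List


def merge_variants(training: List[Dict], samples: List[Dict]) -> List[Dict]:
    """Append generated samples to the training records, skipping exact duplicates.

    Hypothetical helper: assumes each record is a flat dict of hashable values.
    """
    seen = {tuple(sorted(row.items())) for row in training}
    merged = list(training)
    for sample in samples:
        key = tuple(sorted(sample.items()))
        if key not in seen:
            seen.add(key)
            merged.append(sample)
    return merged


# Example with hand-written records (not real SDK output)
training = [{'metric_a': 0.91, 'metric_b': 120, 'metric_c': 0.05}]
samples = [
    {'metric_a': 0.62, 'metric_b': 480, 'metric_c': 0.21},
    {'metric_a': 0.91, 'metric_b': 120, 'metric_c': 0.05},  # duplicate of a training row
]
print(len(merge_variants(training, samples)))  # 2
```

Deduplicating on the full record keeps repeated generation runs from silently over-weighting identical points.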

---

### 2. Preventive Dataset Curation

**Challenge:** Models trained on clean data fail on boundary cases.

**Solution:** Pre-generate datasets focused on decision boundaries.

```python
# Identify high-impact regions
seeds = sdk.identify_high_impact_regions(
    dataset=training_data,
    config=config,
    focus_percent=0.1
)

# Generate boundary variants
boundary_data = sdk.generate_boundary_variants(
    config=config,
    source_data=training_data,
    variants_per_seed=10
)

# Train with boundary_data
```

---

### 3. Training Data Generation

**Challenge:** Creating labeled datasets is expensive.

**Solution:** Generate synthetic labeled datasets at scale.

```python
# Generate labeled samples
result = sdk.generate_training_data(config, samples=1000)

# Export (Enterprise)
dataset = sdk.export_dataset()

# Train your models with generated data
```
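Before training on generated data, it is usually worth holding part of it out for validation. A minimal sketch, assuming `result['samples']` is a list of records (the schema is not documented here); `split_samples` is a hypothetical helper, not an SDK method.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar('T')


def split_samples(samples: Sequence[T], val_fraction: float = 0.2,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples and split it into (train, validation)."""
    if not 0.0 <= val_fraction < 1.0:
        raise ValueError('val_fraction must be in [0, 1)')
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder records standing in for result['samples']
records = [{'metric_a': i / 10} for i in range(10)]
train, val = split_samples(records, val_fraction=0.2)
print(len(train), len(val))  # 8 2
```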

---

## 📦 Installation

```bash
pip install dsf-aml-sdk
```

---

## 🧩 Quick Start

```python
from dsf_aml_sdk import AMLSDK

sdk = AMLSDK(license_key='your_key', tier='professional')

# Define evaluation config
config = {
    'metric_a': {
        'reference_value': 0.95,
        'params': {
            'importance': 2.5,
            'sensitivity': 2.0
        }
    },
    'metric_b': {
        'reference_value': 100,
        'params': {
            'importance': 1.8,
            'sensitivity': 1.5
        }
    }
}

# Report failure
failed_input = {'metric_a': 0.60, 'metric_b': 500}

# Generate corrections
fix = sdk.generate_variants(failed_input, config, count=20)

print(f"Generated {len(fix['samples'])} variants")
```

---

## 📊 Execution Metrics

Operations return performance metrics, for example:

```json
{
  "tier": "professional",
  "evaluations": 62,
  "threshold": 0.6698,
  "persistence": "active",
  "statistics": {
    "avg": 0.7296,
    "min": 0.5217,
    "max": 0.8467
  }
}
```

---

## 🆚 Tier Comparison

| Feature             | Community | Professional | Enterprise |
|---------------------|-----------|--------------|------------|
| Variant Generation  | Limited   | ✅           | ✅         |
| Preventive Datasets | Limited   | ✅           | ✅         |
| Batch Operations    | ❌        | ✅ (≤1000)   | ✅ (≤1000) |
| Data Export         | ❌        | ✅           | ✅         |
| Full Pipeline       | ❌        | ❌           | ✅         |

---

## 📖 Core Methods

### Variant Generation

```python
sdk.generate_variants(seed: dict, config: dict, count: int = 20) -> dict
```

Returns:
```python
{
  "status": "completed",
  "total": 20,
  "samples": [...],
  "metrics": {...}
}
```
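Because the response carries both `status` and `total`, a caller can sanity-check it before feeding `samples` into retraining. The sketch assumes exactly the fields shown above and treats any status other than `"completed"` as a failure, which is an assumption about the service's other status values. `check_variant_response` is a hypothetical helper.

```python
from typing import Any, Dict, List


def check_variant_response(response: Dict[str, Any]) -> List[Any]:
    """Return response['samples'] after basic consistency checks."""
    status = response.get('status')
    if status != 'completed':
        raise RuntimeError(f'variant generation not completed: status={status!r}')
    samples = response.get('samples', [])
    # If 'total' is absent, fall back to the sample count (no check possible)
    if len(samples) != response.get('total', len(samples)):
        raise ValueError(
            f"expected {response['total']} samples, got {len(samples)}"
        )
    return samples
```

Typical use: `samples = check_variant_response(sdk.generate_variants(seed, config, count=20))`.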

### High-Impact Region Identification

```python
sdk.identify_high_impact_regions(dataset, config, focus_percent=0.1) -> dict
sdk.generate_boundary_variants(config, source_data, **kwargs) -> dict
```

### Training Data Generation

```python
sdk.generate_training_data(config, samples=1000) -> dict
sdk.export_dataset() -> dict  # Enterprise only
```

### Evaluation

```python
# Single evaluation
result = sdk.evaluate(data, config)

# Batch evaluation
results = sdk.batch_evaluate(data_points, config)
```
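The tier table caps batch operations at 1,000 items. A minimal sketch that splits a larger list into chunks before calling `batch_evaluate`, assuming the cap applies per call and that each call returns a list of per-item results (the batch return type is not documented here). `batch_evaluate_chunked` and `StubSDK` are hypothetical; the stub only stands in for `AMLSDK` so the example runs offline.

```python
from typing import Any, Dict, Iterator, List, Sequence

MAX_BATCH = 1000  # per-call cap from the tier table (assumed to apply per request)


def chunked(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_evaluate_chunked(sdk: Any, data_points: Sequence[Dict],
                           config: Dict, size: int = MAX_BATCH) -> List[Any]:
    """Call sdk.batch_evaluate once per chunk and concatenate the results."""
    results: List[Any] = []
    for chunk in chunked(data_points, size):
        results.extend(sdk.batch_evaluate(list(chunk), config))
    return results


class StubSDK:
    """Offline stand-in for AMLSDK; records batch sizes instead of calling the service."""

    def __init__(self) -> None:
        self.calls: List[int] = []

    def batch_evaluate(self, data_points: List[Dict], config: Dict) -> List[Dict]:
        self.calls.append(len(data_points))
        return [{'score': None} for _ in data_points]


stub = StubSDK()
points = [{'metric_a': 0.9}] * 2500
out = batch_evaluate_chunked(stub, points, config={})
print(stub.calls, len(out))  # [1000, 1000, 500] 2500
```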

---

## 🔧 Configuration Structure

```python
config = {
    "feature_name": {
        "reference_value": <target_value>,
        "params": {
            "importance": <float>,    # Feature weight
            "sensitivity": <float>    # Deviation tolerance
        }
    }
}
```
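Because the structure is fixed, a config can be checked locally before any request is sent. A minimal sketch, assuming `reference_value`, `importance` and `sensitivity` must all be numeric, as the template suggests; the README states no further constraints such as allowed ranges. `validate_config` is a hypothetical helper, not an SDK function.

```python
from numbers import Real
from typing import Any, Dict, List


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an evaluation config (empty if none)."""
    problems: List[str] = []
    for feature, spec in config.items():
        if not isinstance(spec, dict):
            problems.append(f'{feature}: expected a dict')
            continue
        if not isinstance(spec.get('reference_value'), Real):
            problems.append(f'{feature}: reference_value must be numeric')
        params = spec.get('params')
        if not isinstance(params, dict):
            problems.append(f'{feature}: missing params dict')
            continue
        for key in ('importance', 'sensitivity'):
            if not isinstance(params.get(key), Real):
                problems.append(f'{feature}: params.{key} must be numeric')
    return problems


config = {
    'metric_a': {'reference_value': 0.95,
                 'params': {'importance': 2.5, 'sensitivity': 2.0}},
    'metric_b': {'reference_value': 100, 'params': {'importance': 1.8}},
}
print(validate_config(config))  # ['metric_b: params.sensitivity must be numeric']
```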

### Example Configuration

```python
config = {
    'metric_primary': {
        'reference_value': 650,
        'params': {
            'importance': 2.5,
            'sensitivity': 2.0
        }
    },
    'metric_secondary': {
        'reference_value': 60000,
        'params': {
            'importance': 2.0,
            'sensitivity': 1.8
        }
    }
}
```

---

## 🛠️ Complete Workflow

```python
import pandas as pd
from dsf_aml_sdk import AMLSDK

# Initialize
sdk = AMLSDK(license_key='your_key', tier='professional')

# Load data
df = pd.read_csv('data.csv')
data = df[['metric_a', 'metric_b', 'metric_c']].head(100).to_dict('records')

# Config
config = {
    'metric_a': {
        'reference_value': 650,
        'params': {'importance': 2.5, 'sensitivity': 2.0}
    },
    'metric_b': {
        'reference_value': 60000,
        'params': {'importance': 2.0, 'sensitivity': 1.8}
    }
}

# 1. Fix failure
failed = {'metric_a': 580, 'metric_b': 35000}
fix = sdk.generate_variants(seed=failed, config=config, count=10)

# 2. Preventive dataset
seeds = sdk.identify_high_impact_regions(data[:50], config, focus_percent=0.15)
boundary = sdk.generate_boundary_variants(config, data[:50], variants_per_seed=20)

# 3. Generate training data
result = sdk.generate_training_data(config, samples=200)

# 4. Evaluate a single record
test = data[0]
evaluation = sdk.evaluate(test, config)
```
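In the workflow above, the records carry `metric_c` while the config defines only `metric_a` and `metric_b`. A quick local check can surface such mismatches before calling the service; whether the service ignores or rejects unconfigured fields is not documented here. `feature_coverage` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple


def feature_coverage(records: List[Dict], config: Dict) -> Tuple[Set[str], Set[str]]:
    """Return (configured features missing from some record, record fields not in config)."""
    configured = set(config)
    missing: Set[str] = set()
    unconfigured: Set[str] = set()
    for row in records:
        fields = set(row)
        missing |= configured - fields
        unconfigured |= fields - configured
    return missing, unconfigured


records = [
    {'metric_a': 640, 'metric_b': 58000, 'metric_c': 0.3},
    {'metric_a': 700, 'metric_c': 0.1},
]
config = {'metric_a': {}, 'metric_b': {}}
print(feature_coverage(records, config))  # ({'metric_b'}, {'metric_c'})
```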

---

## ⚠️ Important Notes

**Client Responsibility:**  
Clients must validate model performance and compliance with applicable regulations. This SDK is a data generation tool and does not make autonomous decisions.

**Data Processing:**  
All generation logic executes server-side; the SDK exposes only the configuration interface.

**Generated Data:**  
Synthetic data is based on client-provided configurations and source datasets. Clients control all inputs and validation.

---

## 📞 Support

**Licensing:** contacto@dsfuptech.cloud  
**Technical Docs:** Available under NDA  
**Enterprise:** contacto@dsfuptech.cloud

---

## 📋 Credits

**Technology Architect:** Jaime Alexander Jimenez

---

© 2025 DSF UpTech. All rights reserved.
