Metadata-Version: 2.3
Name: satif-ai
Version: 0.2.12
Summary: AI Agents for Satif
License: MIT
Author: Syncpulse
Maintainer: Bryan Djafer
Maintainer-email: bryan.djafer@syncpulse.fr
Requires-Python: >=3.10,<3.14
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Provides-Extra: xlsx
Requires-Dist: openai-agents (>=0.0.9,<0.0.10)
Requires-Dist: satif-sdk (>=0.1.0,<1.0.0)
Requires-Dist: sdif-mcp (>=0.1.0,<1.0.0)
Description-Content-Type: text/markdown

# SATIF AI

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Version](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![Status: Experimental](https://img.shields.io/badge/Status-Experimental-orange.svg)](https://github.com/syncpulse-solutions/satif)

AI toolkit for transforming any input files into any output files.

## ⚠️ Disclaimer

**EXPERIMENTAL STATUS**: This package is in early development and not production-ready. The API may change significantly between versions.

**BLOCKING I/O**: Despite the async API, some operations may perform blocking I/O and stall the event loop. This package should be used for testing and experimental purposes only.

## Installation

```bash
pip install satif-ai
```

## Overview

SATIF AI automates the transformation of heterogeneous data sources (CSV, Excel, PDF, XML, etc.) into any desired output format in two steps:

1. **Standardization**: Ingests the source files and converts them into SDIF, a structured intermediate format.
2. **Transformation**: Applies business logic to the standardized data to generate the target output files, with transformation code generated by AI.

## Key Features

- **Any Format Support**: Process virtually any input, even challenging unstructured content (PDFs, complex Excel sheets)
- **AI-Powered Code Generation**: Automatically generate transformation code from examples and natural language instructions
- **Robust Schema Enforcement**: Handle input data drift and schema inconsistencies through configurable validation
- **SQL-Based Data Processing**: Query and manipulate all data using SQL
- **Decoupled Processing Stages**: Standardize once, transform many times with different logic

## Usage

### Basic Workflow

```python
import asyncio
from satif_ai import astandardize, atransform

async def main():
    # Step 1: Standardize input files into SDIF
    sdif_path = await astandardize(
        datasource=["data.csv", "reference.xlsx"],
        output_path="standardized.sdif",
        overwrite=True
    )

    # Step 2: Transform SDIF into desired output using AI
    await atransform(
        sdif=sdif_path,
        output_target_files="output.json",
        instructions="Extract customer IDs and purchase totals, calculate the average purchase value per customer, and output as JSON with customer_id and avg_purchase_value fields.",
        llm_model="o4-mini"  # Choose AI model based on needs
    )

if __name__ == "__main__":
    asyncio.run(main())
```
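
### Reusing One SDIF File for Several Outputs

Because the two stages are decoupled, a single standardized file can feed several transformations. The sketch below illustrates that idea under stated assumptions: `transform_many` is a hypothetical helper that is not part of `satif-ai`, and `fake_atransform` is a stand-in so the example runs without the package or an API key. The keyword names (`sdif`, `output_target_files`, `instructions`) mirror the `atransform` call shown above. Jobs run one after another rather than concurrently, which seems the safer default given the blocking I/O caveat.

```python
import asyncio
from typing import Any, Awaitable, Callable


# Hypothetical helper, not part of satif-ai: reuse one SDIF file for several outputs.
async def transform_many(
    transform: Callable[..., Awaitable[Any]],
    sdif_path: str,
    jobs: dict[str, str],
) -> list[str]:
    """Run one transformation per (output file, instructions) pair, in order."""
    written = []
    for output_file, instructions in jobs.items():
        # Keyword names mirror the atransform call in the Basic Workflow example.
        await transform(
            sdif=sdif_path,
            output_target_files=output_file,
            instructions=instructions,
        )
        written.append(output_file)
    return written


# Stand-in for satif_ai.atransform; it only records the arguments it receives.
calls: list[dict[str, Any]] = []


async def fake_atransform(**kwargs: Any) -> None:
    calls.append(kwargs)


jobs = {
    "averages.json": "Output the average purchase value per customer as JSON.",
    "totals.csv": "Output the total purchase amount per customer as CSV.",
}
outputs = asyncio.run(transform_many(fake_atransform, "standardized.sdif", jobs))
print(outputs)
```

In real use, `fake_atransform` would be replaced by `atransform` imported from `satif_ai`, and `"standardized.sdif"` by the path returned from `astandardize`.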

## Architecture

```
┌─────────────────┐     ┌───────────────────────┐     ┌─────────────────┐
│  Source Files   │────▶│ Standardization Layer │────▶│   SDIF File     │
│ CSV/Excel/PDF/  │     │                       │     │ (SQLite-based)  │
│ XML/JSON/etc.   │     └───────────────────────┘     └────────┬────────┘
└─────────────────┘                                            │
                                                               │
┌─────────────────┐     ┌───────────────────────┐              │
│  Output Files   │◀────│  Transformation Layer │◀─────────────┘
│ Any format      │     │  (AI-generated code)  │
└─────────────────┘     └───────────────────────┘
```

SDIF (Standardized Data Interoperable Format) is the intermediate SQLite-based format that:

- Stores structured tables alongside JSON objects and binary media
- Maintains rich metadata about data origins and relationships
- Provides direct SQL queryability for complex transformations
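
As a rough illustration of the SQL queryability described above, the sketch below uses Python's standard `sqlite3` module. It relies on assumptions: an in-memory database stands in for an SDIF file, and the `purchases` table and its columns are hypothetical. Real table names depend on the standardized source files, and this sketch assumes an SDIF file opens as a regular SQLite database.

```python
import sqlite3

# Stand-in for an SDIF file: an in-memory SQLite database.
# With a real file this would be sqlite3.connect("standardized.sdif"),
# assuming the file opens as a regular SQLite database.
conn = sqlite3.connect(":memory:")

# Hypothetical table and columns, used only for illustration.
conn.execute("CREATE TABLE purchases (customer_id TEXT, total REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("c1", 10.0), ("c1", 30.0), ("c2", 5.0)],
)

# List the tables present, as one might when exploring an unfamiliar file.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Average purchase value per customer, computed directly in SQL.
averages = conn.execute(
    "SELECT customer_id, AVG(total) FROM purchases "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()

conn.close()
print(tables)
print(averages)
```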

## Documentation

For detailed documentation, examples, and advanced features, visit [SATIF Documentation](https://satif.io/docs).

## Contributing

Contributions are welcome! Whether it's bug reports, feature requests, or code contributions, please feel free to get involved.

### Contribution Workflow

1. **Fork the repository** on GitHub.
2. **Clone your fork** locally:

   ```bash
   git clone https://github.com/<your-username>/satif.git
   cd satif/libs/ai
   ```
3. **Create a new branch** for your feature or bug fix:

   ```bash
   git checkout -b feature/your-feature-name
   ```

   or

   ```bash
   git checkout -b fix/your-bug-fix-name
   ```
4. **Set up the development environment** by installing the project dependencies:

   ```bash
   make install  # or poetry install
   ```
5. **Make your changes.** Ensure your code follows the project's style guidelines.
6. **Format and lint your code:**

   ```bash
   make format
   make lint
   ```
7. **Run type checks:**

   ```bash
   make typecheck
   ```
8. **Run tests** to ensure your changes don't break existing functionality:

   ```bash
   make test
   ```

   To also generate a coverage report:

   ```bash
   make coverage
   ```
9. **Commit your changes** with a clear and descriptive commit message.
10. **Push your changes** to your fork on GitHub:

    ```bash
    git push origin feature/your-feature-name
    ```
11. **Submit a Pull Request (PR)** to the `main` branch of the original `syncpulse-solutions/satif` repository.

## License

This project is licensed under the MIT License.

Maintainer: Bryan Djafer (bryan.djafer@syncpulse.fr)

