# Dross - ML Pipeline Framework

![Python](https://img.shields.io/badge/python-3.12+-blue)
![License](https://img.shields.io/badge/license-MIT-green)

**Dross** is a reusable ML pipeline framework for Kaggle and data science projects, built on a medallion architecture with MLflow tracking and Unity Catalog integration.

## Features

- **Medallion Architecture**: Organize data into Bronze (raw) → Silver (cleaned) → Gold (prepared) layers
- **MLflow Integration**: Track experiments, log models, and manage runs
- **Unity Catalog**: Centralized data governance through Unity Catalog (UC)
- **Flexible Models**: Abstract base class for custom model implementations
- **Feature Extraction**: Built-in TF-IDF vectorization utilities
- **Minimal CLI**: Configuration validation and schema documentation
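
To illustrate the abstract-base-class pattern behind the Flexible Models feature, here is a minimal standalone sketch. The names `BaseModel`, `fit`, `predict`, and `MajorityClassModel` are hypothetical and are not taken from Dross's API; consult the API reference for the actual base class and the methods it requires.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Sequence


class BaseModel(ABC):
    """Hypothetical base class; Dross's real interface may differ."""

    @abstractmethod
    def fit(self, X: Sequence[str], y: Sequence[int]) -> "BaseModel":
        """Learn from training examples and return self."""

    @abstractmethod
    def predict(self, X: Sequence[str]) -> list[int]:
        """Return one predicted label per input example."""


class MajorityClassModel(BaseModel):
    """Baseline that always predicts the most frequent training label."""

    def __init__(self) -> None:
        self.majority_: int | None = None

    def fit(self, X: Sequence[str], y: Sequence[int]) -> "MajorityClassModel":
        # Remember the single most common label seen during training
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Sequence[str]) -> list[int]:
        if self.majority_ is None:
            raise RuntimeError("call fit() before predict()")
        return [self.majority_] * len(X)


model = MajorityClassModel().fit(["a", "b", "c"], [1, 0, 1])
predictions = model.predict(["d", "e"])
```

Because `fit` and `predict` are abstract, instantiating `BaseModel` directly raises `TypeError`, so a subclass that forgets a method fails when it is constructed rather than partway through training.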

## Installation

```bash
pip install dross
```

## Quick Start

```python
import asyncio

from dross.data import MedallionPipeline
from dross.tracking import ExperimentTracker, UCClient


async def main() -> None:
    # NOTE: cfg, storage_base, raw_csv, schema, source, target, and
    # transform_func are placeholders; define them for your project.

    # Setup
    uc = UCClient(server="http://localhost:8080")
    pipeline = MedallionPipeline(cfg.unity_catalog, uc, storage_base)

    # Ingest raw data to Bronze
    await pipeline.ingest(raw_csv, columns=schema)

    # Clean and transform to Silver
    await pipeline.clean(source, target, transform_func)

    # Prepare for training in Gold
    await pipeline.prepare(source, target)

    # Track your experiment
    tracker = ExperimentTracker(experiment_name="my-experiment")
    tracker.start_run(run_name="run-1")
    tracker.log_metrics({"accuracy": 0.95})
    tracker.end_run()


# The pipeline methods are awaited, so they must run inside an event loop
asyncio.run(main())
```
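
To make the Bronze → Silver → Gold flow concrete, the following sketch walks a tiny in-memory CSV through the three layers using only the standard library. It does not use Dross: the function names, the sample data, and the cleaning rules are illustrative assumptions, not the framework's implementation.

```python
import csv
import io

# Hypothetical raw input: one row has padded text, one has no text at all
RAW_CSV = "id,text,label\n1, Hello World ,1\n2,,0\n3,Goodbye,0\n"


def ingest(raw: str) -> list[dict[str, str]]:
    """Bronze: load rows exactly as received, with no cleaning."""
    return list(csv.DictReader(io.StringIO(raw)))


def clean(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Silver: strip whitespace and drop rows whose text is empty."""
    cleaned = []
    for row in rows:
        text = row["text"].strip()
        if text:
            cleaned.append({**row, "text": text})
    return cleaned


def prepare(rows: list[dict[str, str]]) -> tuple[list[str], list[int]]:
    """Gold: produce model-ready lowercase texts and integer labels."""
    texts = [row["text"].lower() for row in rows]
    labels = [int(row["label"]) for row in rows]
    return texts, labels


bronze = ingest(RAW_CSV)
silver = clean(bronze)
texts, labels = prepare(silver)
```

Keeping each layer as a separate step means the raw data is never modified, so the Silver and Gold outputs can be regenerated from Bronze whenever the cleaning or preparation rules change.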

## Documentation

- [README](README.md) - Complete overview and API reference
- [AGENTS.md](AGENTS.md) - Development workflow and Make targets
- [CONTRIBUTING.md](CONTRIBUTING.md) - Contribution guidelines

## Development

```bash
make sync       # Setup dependencies
make fmt        # Format code
make lint       # Lint code
make typecheck  # Type check
make test       # Run tests
make qa         # Run all checks
```

## License

MIT License. See the `LICENSE` file for details.

## Contributing

Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
