Metadata-Version: 2.4
Name: isage-finetune
Version: 0.1.0.5
Summary: SAGE Fine-tuning Framework - Trainers and data loaders for LLM fine-tuning
Author-email: IntelliStream Team <shuhao_zhang@hust.edu.cn>
License-Expression: MIT
Project-URL: Homepage, https://github.com/intellistream/sage-finetune
Project-URL: Repository, https://github.com/intellistream/sage-finetune
Keywords: finetune,training,LoRA,LLM,AI
Requires-Python: ==3.11.*
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0.0
Requires-Dist: typing-extensions>=4.0.0
Requires-Dist: isage-libs>=0.2.4.22
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Dynamic: license-file

# sage-finetune

Fine-tuning implementations for the SAGE AI data processing framework.

## Installation

```bash
pip install isage-finetune
```

For LoRA training, install the `peft` extra (quoted so that shells such as zsh do not try to expand the brackets):
```bash
pip install "isage-finetune[peft]"
```

## Features

- **LoRA Trainer**: Parameter-efficient fine-tuning with Low-Rank Adaptation
- **Mock Trainer**: Testing trainer for pipeline validation
- **JSON/JSONL Loader**: Flexible data loading for instruction and chat formats

## Quick Start

```python
from sage_finetune import MockTrainer, JSONDatasetLoader

# Load training data
loader = JSONDatasetLoader()
train_data = loader.load("train.jsonl")

# Train (mock for testing)
trainer = MockTrainer()
result = trainer.train(train_data)
print(f"Loss: {result['train_loss']:.4f}")
```

### LoRA Fine-tuning

```python
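from sage_finetune import JSONDatasetLoader, LoRAConfig, LoRATrainer

# Load the training data (same loader as in the Quick Start above)
train_dataset = JSONDatasetLoader().load("train.jsonl")

# Configure LoRA: r is the adapter rank, lora_alpha the scaling factor
trainer = LoRATrainer(
    model_name="gpt2",
    lora_config=LoRAConfig(r=8, lora_alpha=16),
)

# Fine-tune, then save the trained model to disk
result = trainer.train(train_dataset)
trainer.save_model("./my_lora_model")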
```

## Data Formats

### Instruction Format
```json
{"instruction": "Summarize this text", "input": "Long text...", "output": "Summary..."}
```

### Chat Format
```json
{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi"}]}
```
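
To try the loader without real data, a small `train.jsonl` can be generated with the standard library alone. The sketch below writes one record of each format above, reads the file back, and classifies each record by its keys. The helpers `detect_format`, `write_jsonl`, and `read_jsonl` are hypothetical names introduced for illustration; they are not part of `sage_finetune`, and the package's own loader may validate records differently.

```python
import json
import tempfile
from pathlib import Path

# Sample records copied from the two formats shown above.
instruction_record = {
    "instruction": "Summarize this text",
    "input": "Long text...",
    "output": "Summary...",
}
chat_record = {
    "messages": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi"},
    ]
}


def detect_format(record: dict) -> str:
    """Classify a record by its keys (hypothetical helper, not a sage_finetune API)."""
    if isinstance(record.get("messages"), list):
        return "chat"
    if "instruction" in record and "output" in record:
        return "instruction"
    return "unknown"


def write_jsonl(path: Path, records: list[dict]) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read a JSONL file back, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "train.jsonl"
    write_jsonl(path, [instruction_record, chat_record])
    loaded = read_jsonl(path)

formats = [detect_format(record) for record in loaded]
print(formats)  # ['instruction', 'chat']
```

Each line of a `.jsonl` file must be a complete JSON object on its own, so records should not be pretty-printed across multiple lines.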

## Integration with SAGE

When SAGE is installed, components auto-register with the framework:

```python
from sage.libs.finetune import create_trainer

trainer = create_trainer("lora", model_name="gpt2")
```
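
Name-based factories such as `create_trainer("lora", ...)` are commonly backed by a registry that maps a string key to a class. The following is a generic, self-contained sketch of that pattern, not SAGE's actual implementation: `register_trainer`, `_TRAINERS`, `DemoLoRATrainer`, and this local `create_trainer` are all hypothetical stand-ins defined here for illustration.

```python
from typing import Callable

# Hypothetical registry mapping a trainer name to a factory.
# SAGE's real registration mechanism may differ.
_TRAINERS: dict[str, Callable[..., object]] = {}


def register_trainer(name: str):
    """Decorator that records a trainer class under the given name."""
    def decorator(factory):
        _TRAINERS[name] = factory
        return factory
    return decorator


@register_trainer("lora")
class DemoLoRATrainer:
    """Stand-in class; it only stores the keyword arguments it receives."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


def create_trainer(name: str, **kwargs):
    """Look up a registered trainer by name and instantiate it."""
    try:
        factory = _TRAINERS[name]
    except KeyError:
        known = ", ".join(sorted(_TRAINERS)) or "none"
        raise ValueError(f"Unknown trainer {name!r}; registered: {known}") from None
    return factory(**kwargs)


trainer = create_trainer("lora", model_name="gpt2")
print(type(trainer).__name__, trainer.kwargs)
```

In this pattern, registration happens as a side effect of importing the module that defines the decorated class, which is one way a package can make its components available to a host framework once both are installed.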

## License

MIT (see the `LICENSE` file)
