Metadata-Version: 2.4
Name: distilabel-steps-library
Version: 0.1.1
Summary: A bunch of steps for distilabel.
License: WTFPL
Requires-Python: >=3.12
Requires-Dist: distilabel>=1.4.1
Description-Content-Type: text/markdown

# Distilabel Steps Library

> **Note**: This README was automatically generated by Claude-3.5-Sonnet. If you spot any errors or confusing sections, please file an issue or submit a PR!

A collection of utility steps for processing and manipulating data in distilabel pipelines.

## Installation

```bash
pip install distilabel-steps-library
```

## Available Steps

### Chat Processing Steps

#### FormatPlaintextChatTranscript

Formats chat messages as a plaintext transcript, with each message rendered as `<role>: <content>` on its own line.

**Input Columns:**

- `messages` (List[Dict[str, str]]): List of message dictionaries with 'role' and 'content' keys

**Output Columns:**

- `transcript` (str): Plaintext representation of the chat messages

**Example:**

```python
from distilabel_steps_library.chat import FormatPlaintextChatTranscript

format_transcript = FormatPlaintextChatTranscript()
result = next(
    format_transcript.process([{
        "messages": [
            {"role": "user", "content": "What's 2+2?"},
            {"role": "assistant", "content": "4"}
        ]
    }])
)
# Result includes: "transcript": "user: What's 2+2?\nassistant: 4"
```
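
To make the transcript format concrete, here is a minimal plain-Python sketch of the same transformation. The function name `format_plaintext_transcript` is hypothetical and not part of this library; the step itself may handle edge cases differently.

```python
# Illustrative sketch only; not the library's implementation.
def format_plaintext_transcript(messages):
    """Render each message as '<role>: <content>' and join with newlines."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


transcript = format_plaintext_transcript([
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "4"},
])
print(transcript)
# user: What's 2+2?
# assistant: 4
```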

#### FlipMessageRoles

Flips the roles in chat messages between 'user' and 'assistant' while preserving system messages.

**Input Columns:**

- `messages` (List[Dict[str, str]]): List of message dictionaries

**Output Columns:**

- `flipped_messages` (List[Dict[str, str]]): Messages with swapped roles

**Example:**

```python
from distilabel_steps_library.chat import FlipMessageRoles

flip_roles = FlipMessageRoles()
result = next(
    flip_roles.process([{
        "messages": [
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hi"}
        ]
    }])
)
# Result includes 'flipped_messages' with the roles swapped:
# "Hello" is now from the assistant and "Hi" is now from the user
```
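
The role swap described above can be sketched in plain Python as follows. This is an illustration under the assumption that any role other than `user` or `assistant` (such as `system`) passes through unchanged; `flip_message_roles` and `ROLE_SWAP` are hypothetical names, not library exports.

```python
# Illustrative sketch only; not the library's implementation.
ROLE_SWAP = {"user": "assistant", "assistant": "user"}


def flip_message_roles(messages):
    """Return new message dicts with user/assistant swapped.

    Roles missing from ROLE_SWAP (e.g. 'system') are kept as they are.
    """
    return [
        {**message, "role": ROLE_SWAP.get(message["role"], message["role"])}
        for message in messages
    ]


flipped = flip_message_roles([
    {"role": "system", "content": "Be helpful"},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
])
```

Building new dictionaries rather than mutating the input keeps the original `messages` column intact, which matches the step writing to a separate `flipped_messages` column.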

#### InsertMessage

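Inserts a new message into the chat messages at a specified index. The index and role are set when the step is constructed (`index` and `role` in the example below), while the message text is read from the `content` column.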

**Input Columns:**

- `messages` (List[Dict[str, str]]): List of message dictionaries
- `content` (str): Content for the message to be inserted

**Output Columns:**

- `messages` (List[Dict[str, str]]): Modified list with the new message

**Example:**

```python
from distilabel_steps_library.chat import InsertMessage

insert = InsertMessage(index=0, role="system")
result = next(
    insert.process([{
        "messages": [
            {"role": "user", "content": "Hi"}
        ],
        "content": "Be helpful"
    }])
)
# 'messages' now starts with {"role": "system", "content": "Be helpful"}
```
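
As a rough illustration of the insertion logic, the sketch below uses `list.insert` on a copy of the messages. The function `insert_message` and its default arguments are hypothetical; how the real step treats out-of-range indices is not documented here.

```python
# Illustrative sketch only; not the library's implementation.
def insert_message(messages, content, index=0, role="system"):
    """Return a copy of `messages` with a new message inserted at `index`."""
    updated = list(messages)  # copy so the caller's list is not mutated
    updated.insert(index, {"role": role, "content": content})
    return updated


with_system = insert_message(
    [{"role": "user", "content": "Hi"}],
    content="Be helpful",
    index=0,
    role="system",
)
```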

### Data Cleaning Steps

#### DropEmpty

Filters out rows containing empty values in specified columns.

**Input Columns:**

- Any columns specified in the `columns` parameter (or all columns if none specified)

**Output Columns:**

- All input columns (for non-empty rows)

**Example:**

```python
from distilabel_steps_library import DropEmpty

# Drop rows with empty values in specific columns
drop_step = DropEmpty(columns=["instruction", "response"])
result = next(
    drop_step.process([
        {"instruction": "Task", "response": ""},  # Will be dropped
        {"instruction": "Task 2", "response": "Answer"}  # Will be kept
    ])
)
```
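
The filtering rule can be sketched in plain Python as shown below. This README does not define "empty", so the sketch assumes it means `None`, an empty string, or an empty list, tuple, or dict, and it treats a missing column as empty; the step's actual definition may differ. `is_empty` and `drop_empty` are hypothetical names.

```python
# Illustrative sketch only; not the library's implementation.
def is_empty(value):
    """Assumed definition of 'empty': None or a zero-length str/list/tuple/dict."""
    if value is None:
        return True
    return isinstance(value, (str, list, tuple, dict)) and len(value) == 0


def drop_empty(rows, columns=None):
    """Keep only rows whose checked columns are all non-empty.

    If `columns` is None, every column present in the row is checked.
    """
    kept = []
    for row in rows:
        to_check = columns if columns is not None else list(row)
        if not any(is_empty(row.get(column)) for column in to_check):
            kept.append(row)
    return kept


rows = [
    {"instruction": "Task", "response": ""},
    {"instruction": "Task 2", "response": "Answer"},
]
kept_rows = drop_empty(rows, columns=["instruction", "response"])
```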

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

WTFPL.
