Metadata-Version: 2.3
Name: open-parl
Version: 0.1.0
Summary: PARL (Parallel-Agent Reinforcement Learning) - A training paradigm for coordinating multiple agents in parallel workflows
License: Apache-2.0
Keywords: reinforcement-learning,multi-agent,parallel-computing,machine-learning,pytorch,artificial-intelligence,deep-learning,agent-coordination
Author: The Swarm Corporation
Author-email: support@swarms.com
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: numpy (>=1.24.0)
Requires-Dist: torch (>=2.0.0)
Project-URL: Documentation, https://github.com/The-Swarm-Corporation/PARL
Project-URL: Homepage, https://github.com/The-Swarm-Corporation/PARL
Project-URL: Repository, https://github.com/The-Swarm-Corporation/PARL
Description-Content-Type: text/markdown

# PARL: Parallel-Agent Reinforcement Learning

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-ee4c2c.svg)](https://pytorch.org/)

Official open-source implementation of **PARL (Parallel-Agent Reinforcement Learning)**, a novel training paradigm that enables AI models to decompose complex tasks into parallel subtasks and coordinate multiple agents simultaneously.

## Overview

PARL is a training methodology that addresses the critical challenge of **serial collapse** in multi-agent systems, where models default to sequential execution despite having parallel computational capacity. By combining staged reward shaping with a latency-oriented evaluation metric, PARL trains models to efficiently orchestrate up to 100 subagents across 1,500+ coordinated steps.

### Key Features

- **Staged Reward Shaping**: Dynamic reward annealing that encourages parallelism early in training and gradually shifts focus toward task success
- **Instantiation Reward**: Incentivizes subagent creation and concurrent execution (see the sketch after this list)
- **Critical Steps Metric**: Latency-oriented evaluation inspired by parallel computation's critical path concept
- **Differentiable Components**: Fully compatible with gradient-based optimization
- **Orchestrator-Subagent Architecture**: Trainable coordinator with frozen execution agents
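
As a concrete illustration of the instantiation reward, here is a minimal sketch that scores each episode by the fraction of the subagent budget it actually used. The ratio form and the `max_subagents` budget of 100 are assumptions drawn from the Quick Start example below, not the library's exact shaping.

```python
import torch

def instantiation_reward(num_subagents: torch.Tensor,
                         max_subagents: int = 100) -> torch.Tensor:
    # Illustrative form: reward parallelism by the fraction of the
    # subagent budget used, clamped to [0, 1].
    return torch.clamp(num_subagents.float() / max_subagents, 0.0, 1.0)

# Episodes spawning 25, 30, and 40 subagents out of a budget of 100
print(instantiation_reward(torch.tensor([25, 30, 40])))
# tensor([0.2500, 0.3000, 0.4000])
```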

## Architecture

```
┌─────────────────────────────────────────────┐
│         Orchestrator Agent                  │
│  (Trainable Central Coordinator)            │
│  - Decomposes tasks into subtasks           │
│  - Manages parallel execution               │
│  - Coordinates subagent workflows           │
└──────────────┬──────────────────────────────┘
               │
               ├──────────┬──────────┬──────────┐
               │          │          │          │
          ┌────▼───┐ ┌────▼───┐ ┌────▼───┐ ┌────▼───┐
          │Subagent│ │Subagent│ │Subagent│ │Subagent│
          │   1    │ │   2    │ │   3    │ │  ...N  │
          └────────┘ └────────┘ └────────┘ └────────┘
           (Frozen)   (Frozen)   (Frozen)   (Frozen)
```
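
Conceptually, the orchestrator fans subtasks out to frozen subagents that run concurrently. The following framework-agnostic sketch illustrates the pattern with Python's standard library; `run_subagent` and `orchestrate` are illustrative names, not part of the package API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def run_subagent(subtask: str) -> str:
    # Frozen subagent: executes its subtask; its weights are never updated.
    return f"result of {subtask!r}"

def orchestrate(task: str, num_subagents: int = 4) -> List[str]:
    # Trainable orchestrator: decomposes the task into subtasks and
    # fans them out to run in parallel.
    subtasks = [f"{task} / part {i}" for i in range(num_subagents)]
    with ThreadPoolExecutor(max_workers=num_subagents) as pool:
        return list(pool.map(run_subagent, subtasks))

print(orchestrate("summarize corpus"))
```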

## Reward Function

PARL implements a two-component reward structure:

```
R_t = λ_aux(e) · r_parallel + (1 - λ_aux(e)) · (𝟙[success] · Q(τ))
```

Where:
- `λ_aux(e)`: Auxiliary reward weight at training stage `e`, annealed from 0.1 → 0.0 over training
- `r_parallel`: Instantiation reward encouraging parallelism
- `𝟙[success]`: Binary success indicator
- `Q(τ)`: End-to-end task quality metric
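
The sketch below works this formula through numerically, assuming the linear annealing schedule implied by the 0.1 → 0.0 range and the defaults listed in the API reference; it is an illustration, not the package's internal code.

```python
import torch

def lambda_aux(step: int, total_steps: int = 10000,
               lam_init: float = 0.1, lam_final: float = 0.0) -> float:
    # Assumed linear anneal from lam_init to lam_final over training.
    frac = min(step / total_steps, 1.0)
    return lam_init + (lam_final - lam_init) * frac

# R_t = λ_aux · r_parallel + (1 - λ_aux) · (1[success] · Q(τ))
r_parallel = torch.tensor([0.25, 0.30, 0.40])  # parallelism incentive
success    = torch.tensor([1.0, 1.0, 0.0])     # binary success indicator
quality    = torch.tensor([0.9, 0.7, 0.8])     # task quality Q(τ)

lam = lambda_aux(step=5000)                    # 0.05 at the halfway point
reward = lam * r_parallel + (1 - lam) * success * quality
print(reward)  # tensor([0.8675, 0.6800, 0.0200])
```

Note how the parallelism term dominates early in training (large λ_aux) and task success dominates late, which is the staged shaping described above.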

### Critical Steps Metric

Instead of counting total steps, PARL uses a latency-oriented metric:

```
CriticalSteps = Σ_t (S_main^(t) + max_i S_sub,i^(t))
```

At each step `t`, only the orchestrator's own cost plus the slowest parallel subagent branch counts toward latency, so the metric reflects wall-clock execution time rather than total work performed.
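
As a standalone sketch, this reduction maps directly onto tensor operations. The shapes mirror the Quick Start example below; this is illustrative, not the `CriticalStepsMetric` source.

```python
import torch

main_steps = torch.ones(3, 5) * 0.1  # (episodes, timesteps): orchestrator cost
sub_steps  = torch.rand(3, 5, 10)    # (episodes, timesteps, subagents)

# Per timestep, latency is the orchestrator cost plus the SLOWEST subagent
# (parallel branches overlap); summing over timesteps gives the critical path.
critical_steps = (main_steps + sub_steps.max(dim=-1).values).sum(dim=-1)
print(critical_steps.shape)  # torch.Size([3]) -- one value per episode
```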

## Installation

### Using Poetry (Recommended)

```bash
# Clone the repository
git clone https://github.com/The-Swarm-Corporation/PARL.git
cd PARL

# Install dependencies with Poetry
poetry install

# Activate the virtual environment
poetry shell
```

### Using pip

```bash
# Clone the repository
git clone https://github.com/The-Swarm-Corporation/PARL.git
cd PARL

# Install dependencies
pip install -r requirements.txt
```

### From PyPI

```bash
pip install open-parl
```

## Quick Start

```python
import torch
from parl import PARLReward, CriticalStepsMetric

# Initialize the reward function
reward_fn = PARLReward(
    lambda_init=0.1,
    lambda_final=0.0,
    total_training_steps=10000,
    device='cuda' if torch.cuda.is_available() else 'cpu'
)

# Prepare episode data
num_subagents = torch.tensor([25, 30, 40])  # Number of subagents per episode
trajectory_features = torch.randn(3, 64)     # Trajectory features
success = torch.tensor([1.0, 1.0, 0.0])      # Success indicators
training_step = 5000                          # Current training step

# Compute rewards
rewards = reward_fn.compute_full_reward(
    num_subagents=num_subagents,
    trajectory_features=trajectory_features,
    success=success,
    training_step=training_step,
    max_subagents=100
)

print(f"Total Reward: {rewards['total_reward']}")
print(f"Lambda (λ_aux): {rewards['lambda_aux']:.4f}")
print(f"Parallelism Component: {rewards['instantiation_component']}")
print(f"Task Success Component: {rewards['task_component']}")

# Evaluate using Critical Steps metric
critical_steps_metric = CriticalStepsMetric()

main_steps = torch.ones(3, 5) * 0.1  # Orchestration overhead
sub_steps = torch.rand(3, 5, 10)      # Subagent steps

critical_steps = critical_steps_metric(main_steps, sub_steps)
print(f"Critical Steps: {critical_steps}")
```

## API Reference

### `PARLReward`

Main reward function implementing staged reward shaping.

**Parameters:**
- `lambda_init` (float): Initial auxiliary reward weight (default: 0.1)
- `lambda_final` (float): Final auxiliary reward weight (default: 0.0)
- `total_training_steps` (int): Total training steps for annealing (default: 10000)
- `device` (str): Device for computation ('cpu' or 'cuda')

**Methods:**
- `compute_full_reward()`: Compute all reward components
- `compute_instantiation_reward()`: Calculate parallelism incentive
- `compute_task_quality()`: Calculate task success quality
- `anneal_lambda()`: Get current λ_aux value

### `CriticalStepsMetric`

Latency-oriented evaluation metric for parallel execution.

**Parameters:**
- `orchestration_overhead` (float): Overhead for orchestrator coordination (default: 0.1)

**Methods:**
- `forward()`: Compute critical steps for parallel workflows

## Experiments

Run the example training simulation:

```bash
python -m parl.main
```

This will demonstrate reward evolution across training stages and critical steps computation.

## Testing

Run the comprehensive test suite:

```bash
# Using pytest
pytest tests/ -v

# With coverage report
pytest tests/ --cov=parl --cov-report=html

# Run specific test file
pytest tests/test_parl.py -v
```

## Research Paper

This implementation is based on the technical report:

> **"PARL: Parallel-Agent Reinforcement Learning for Large Language Models"**
> Kimi AI Research Team, 2026

For technical details and experimental results, see: [Kimi K2.5 Technical Report](https://www.kimi.com/blog/kimi-k2-5.html)

## Citation

If you use PARL in your research, please cite:

```bibtex
@article{parl2026,
  title={PARL: Parallel-Agent Reinforcement Learning for Large Language Models},
  author={Kimi AI Research Team},
  journal={Technical Report},
  year={2026},
  url={https://www.kimi.com/blog/kimi-k2-5.html}
}
```

## Project Structure

```
PARL/
├── parl/
│   ├── __init__.py         # Package initialization
│   └── main.py             # Core PARL implementation
├── tests/
│   └── test_parl.py        # Comprehensive test suite
├── pyproject.toml          # Poetry configuration
├── README.md               # This file
├── LICENSE                 # Apache 2.0 License
└── .gitignore              # Git ignore rules
```

## Requirements

- Python >= 3.8
- PyTorch >= 2.0.0
- NumPy >= 1.24.0

## Contributing

We welcome contributions! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

Please ensure your code passes all tests and follows PEP 8 style guidelines.


## License

This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Inspired by the Kimi K2.5 technical report
- Built on PyTorch's efficient tensor operations
- Thanks to the open-source ML community

## Contact

- **Repository**: [github.com/The-Swarm-Corporation/PARL](https://github.com/The-Swarm-Corporation/PARL)
- **Issues**: [github.com/The-Swarm-Corporation/PARL/issues](https://github.com/The-Swarm-Corporation/PARL/issues)

---

**Made with ⚡ by The Swarm Corporation**

