Metadata-Version: 2.4
Name: IL-Datasets
Version: 1.0.0
Summary: A package for creating datasets for IL and IRL, training agents and benchmarking
Author-email: Nathan Gavenski <nathangavenski@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/NathanGavenski/IL-Datasets
Project-URL: Bug Tracker, https://github.com/NathanGavenski/IL-Datasets/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9.0
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: huggingface_sb3
Requires-Dist: stable_baselines3
Requires-Dist: sb3_contrib
Requires-Dist: tqdm
Requires-Dist: psutil
Requires-Dist: shimmy>=0.2.1
Requires-Dist: datasets==3.4.1
Requires-Dist: pandas
Requires-Dist: torchvision
Provides-Extra: benchmark
Requires-Dist: tensorboard-wrapper; extra == "benchmark"
Requires-Dist: tensorboard; extra == "benchmark"
Requires-Dist: tabulate; extra == "benchmark"
Requires-Dist: pyyaml; extra == "benchmark"
Requires-Dist: gym; extra == "benchmark"
Requires-Dist: gymnasium; extra == "benchmark"
Requires-Dist: gymnasium-robotics; extra == "benchmark"
Requires-Dist: mujoco-py; extra == "benchmark"
Requires-Dist: gymnasium[classic-control]; extra == "benchmark"
Dynamic: license-file

# IL Datasets

![cover](./assets/IL-Datasets.png)

Hi, welcome to Imitation Learning (IL) Datasets.
Something that always bothered me was how difficult it was to find good expert weights, how much work it took to create a dataset for different state-of-the-art methods, and having to re-run every method because there were no common datasets.
For these reasons, I've created this repository to make it easier for researchers to create datasets using experts from [Hugging Face](https://huggingface.co/models?pipeline_tag=reinforcement-learning).
IL-Datasets provides teacher weights for different environments, a multithreaded solution for creating datasets faster, ready-made datasets for a set of environments (check the bottom of this document for the environments already released), and a benchmark of common imitation learning methods.
We hope that these features lower the barrier to learning about and researching imitation learning.

**This project is under development. If you are interested in helping, feel free to contact [me](https://nathangavenski.github.io/).**

## Requirements

The project supports Python versions `3.9`~`3.11`.
All requirements for the `imitation_datasets` package are listed in [requirements.txt](https://github.com/NathanGavenski/IL-Datasets/blob/main/requirements/requirements.txt); they are installed together with `IL-Datasets`.
To use the `benchmark` package, install both the `imitation_datasets` requirements and those listed in [benchmark.txt](https://github.com/NathanGavenski/IL-Datasets/blob/main/requirements/benchmark.txt).
Development requirements are listed in [dev.txt](https://github.com/NathanGavenski/IL-Datasets/blob/main/requirements/dev.txt). We do not recommend installing these dependencies outside development: they use an outdated gym version (`v0.21.0`) to test the `GymWrapper` class.

## Install

IL-Datasets does not install its **PyTorch** and **Gym** dependencies, so it does not force users onto any specific versions.
We test IL-Datasets using `pytorch@latest`, `gymnasium@latest` and `gym@v0.21.0`.
If you run into an issue with a different version, please open an issue so we can take a look.

The package is available on PyPi:
```bash
# Stable version
pip install il-datasets
```

But if you prefer, you can install it from the source.
```bash
git clone https://github.com/NathanGavenski/IL-Datasets.git
cd IL-Datasets
pip install -e .
```

## Docker image

If you want to try IL-Datasets in Docker, this project provides a `Dockerfile`.
Currently, the file is configured for the AAMAS demonstration, which means it launches the notebooks created to exemplify each part of the package (dataset creation, training assistance and benchmarking).

To build and run the docker image:
```bash
docker build -t ildatasets:latest .
docker run -p 127.0.0.1:8888:8888 ildatasets:latest
```

## How does it work?

This project uses multithreading, which should accelerate dataset creation.
It consists of one ``Controller`` class, which requires two functions to work: (i) an ``enjoy`` function (for the agent to play and record an episode); and (ii) a ``collate`` function (for putting all episodes together).

---

The ``enjoy`` function receives three parameters and returns one:
```python
"""
Args:
   path (str): Where the episode will be recorded.
   experiment (Context): A class for recording information (so you can keep the console clear instead of using `print`).
   expert (Policy): A model based on the [StableBaselines3](https://stable-baselines3.readthedocs.io/en/master/) `BaseAlgorithm`.

Returns:
   status (bool): Whether it was successful or not
"""
```

Note: to use the model you can call ``predict``; the policy class already exposes it in the correct form (i.e., the way StableBaselines3 uses it).
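To make the flow concrete, here is a minimal sketch of what a custom ``enjoy`` function could look like. `StubEnv` and `StubExpert` are hypothetical stand-ins (in practice you would step a Gymnasium environment and call a StableBaselines3 policy via `predict`); only the record-and-save loop is the point.

```python
import numpy as np

class StubEnv:
    """Hypothetical stand-in for a Gymnasium environment (illustration only)."""
    def __init__(self, length=5):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(4)

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return np.full(4, self.t, dtype=float), 1.0, done

class StubExpert:
    """Hypothetical expert exposing the `predict` interface described above."""
    def predict(self, obs, state=None, deterministic=True):
        return 0, state  # always takes action 0

def enjoy(path, experiment=None, expert=None):
    """Play one episode, record it to `path`, and return True on success."""
    env = StubEnv()
    expert = expert or StubExpert()
    states, actions, rewards = [], [], []
    obs, done = env.reset(), False
    while not done:
        action, _ = expert.predict(obs, deterministic=True)
        states.append(obs)
        actions.append(action)
        obs, reward, done = env.step(action)
        rewards.append(reward)
    np.savez(path, obs=np.array(states), actions=np.array(actions),
             rewards=np.array(rewards))
    return True

success = enjoy("/tmp/episode_0.npz")
```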

---

The ``collate`` function receives two parameters and returns one:

```python
"""
Args:
   path (str): Where the final dataset should be saved.
   episodes (list[str]): A list of paths, one for each episode file.

Returns:
   status (bool): Whether it was successful or not
"""
```
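A hedged sketch of a possible ``collate`` implementation, assuming each episode file stores `obs`, `actions`, and `rewards` arrays. The field names mirror the default dataset layout; this is an illustration, not the package's actual code.

```python
import numpy as np

def collate(path, episodes):
    """Merge per-episode .npz files into a single baseline-style dataset."""
    obs, actions, rewards, starts, returns = [], [], [], [], []
    for episode in episodes:
        data = np.load(episode)
        obs.append(data["obs"])
        actions.append(data["actions"])
        rewards.append(data["rewards"])
        start = np.zeros(len(data["rewards"]), dtype=bool)
        start[0] = True  # mark the first step of each episode
        starts.append(start)
        returns.append(data["rewards"].sum())
    np.savez(
        path,
        obs=np.concatenate(obs),
        actions=np.concatenate(actions),
        rewards=np.concatenate(rewards),
        episode_starts=np.concatenate(starts),
        episode_returns=np.array(returns),
    )
    return True
```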

## Default functions

The `imitation_datasets` package also comes with a set of default functions, so you don't need to implement an `enjoy` and a `collate` function in every project.
The resulting dataset will be a `NpzFile` with the following data:
```python
"""
Data:
   obs (list[list[float]]): gym environment observation. Size [steps, observation space].
   actions (list[float]): agent action. Size [steps, actions] (1 if single action, n if multiple actions).
   rewards (list[float]): reward for taking the action at the observation (i.e., r(obs, action)). Size [steps, ].
   episode_returns (list[float]): accumulated reward for each episode. Size [number of episodes, ].
   episode_starts (list[bool]): whether the episode started at the current observation. Size [steps, ].
"""
```
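Since all episodes are flattened into single arrays, `episode_starts` is what lets you recover per-episode slices. A small self-contained illustration with made-up data:

```python
import numpy as np

# Hypothetical dataset with two episodes of 3 and 2 steps.
dataset = {
    "obs": np.arange(10).reshape(5, 2).astype(float),
    "actions": np.array([0, 1, 0, 1, 0]),
    "rewards": np.ones(5),
    "episode_returns": np.array([3.0, 2.0]),
    "episode_starts": np.array([True, False, False, True, False]),
}

# Episode boundaries are the indices where episode_starts is True.
boundaries = np.flatnonzero(dataset["episode_starts"])
# Split the flat obs array back into one array per episode.
episodes = np.split(dataset["obs"], boundaries[1:])
lengths = [len(ep) for ep in episodes]
```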

A small functional example of how to use the given functions:
```python
# python <script> --game cartpole --threads 4 --episodes 1000 --mode all
from imitation_datasets.functions import baseline_enjoy, baseline_collate
from imitation_datasets.controller import Controller
from imitation_datasets.args import get_args

args = get_args()
controller = Controller(baseline_enjoy, baseline_collate, args.episodes, args.threads)
controller.start(args)
```

This script will use the pre-registered `CartPole-v1` environment with the HuggingFace weights and create a `teacher.npz` dataset file at `./dataset/cartpole/teacher.npz`.

## Registered environments

IL-Datasets comes with some already registered weights from HuggingFace.
To check which environments are already registered, check under the `src.imitation_datasets.registers` folder.

## Registering new experts

If you would like to add new experts locally, you can use the [Experts](./utils/experts.py) class. It uses the following structure:

```python
"""
Args:
   identifier (str): Name for calling the expert (e.g., cartpole).
   Policy (Policy): a dataclass with:
      name (str): Gym environment name
      repo_id (str): HuggingFace repo identification
      filename (str): HuggingFace weights file name
      threshold (float): How much reward should the episode accumulate to be considered good
      algo (BaseAlgorithm): The class from StableBaselines3
"""
```

If you are not using StableBaselines3, you can load a `Policy` without calling the `load()` function (which downloads weights from HuggingFace).
In that case, the expert must have a `predict` function that receives:

```python
"""
Args:
   obs (Tensor): The current environment state.
   state (Tensor): The model's internal state.
   deterministic (bool): Whether to act deterministically (no exploration).
"""
```
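For illustration, here is a minimal non-StableBaselines expert satisfying that `predict` interface, together with a `Policy`-like dataclass mirroring the fields above. Both classes are hypothetical sketches, not the package's own definitions.

```python
import random
from dataclasses import dataclass

@dataclass
class Policy:
    """Mirrors the fields described above (illustration only)."""
    name: str          # Gym environment name
    repo_id: str       # HuggingFace repo identification
    filename: str      # HuggingFace weights file name
    threshold: float   # minimum episode return to be considered good
    algo: object = None  # left as None when not using StableBaselines3

class RandomExpert:
    """Hypothetical custom expert exposing the required `predict` interface."""
    def __init__(self, n_actions):
        self.n_actions = n_actions

    def predict(self, obs, state=None, deterministic=False):
        # Deterministic: always take action 0; otherwise explore randomly.
        action = 0 if deterministic else random.randrange(self.n_actions)
        return action, state

expert = RandomExpert(n_actions=2)
action, state = expert.predict(obs=[0.0, 0.0], deterministic=True)
```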

## Datasets

IL-Datasets also comes with a default PyTorch dataset, called `BaselineDataset`. It follows the pattern set by the `baseline_collate` function and also supports HuggingFace datasets created by the `baseline_to_huggingface` function.
The dataset list for benchmarking is under development, so to check all new versions, you can visit our collection on [HuggingFace](https://huggingface.co/collections/NathanGavenski/imitation-learning-datasets-6542982072defaf65937432d).

To use the Baseline dataset, you can use a local file:
```python
from imitation_datasets.dataset import BaselineDataset
BaselineDataset("./dataset/cartpole/teacher.npz")
```

Or a HuggingFace path:
```python
from imitation_datasets.dataset import BaselineDataset
BaselineDataset("NathanGavenski/CartPole-v1", source="huggingface")
```

Finally, the dataset supports loading only a subset of episodes and splitting into `train` and `eval` sets.
```python
from imitation_datasets.dataset import BaselineDataset
dataset_train = BaselineDataset("NathanGavenski/CartPole-v1", source="huggingface", n_episodes=100)
dataset_eval = BaselineDataset("NathanGavenski/CartPole-v1", source="huggingface", n_episodes=100, split="eval")
```
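When splitting a flat imitation dataset, it is natural to split at episode boundaries rather than at raw steps, so no partial episode leaks across train and eval. A sketch of how such an episode-level split can be computed from `episode_starts` (illustration only, not the library's actual logic):

```python
import numpy as np

# Three episodes of lengths 2, 3, and 1, flattened into 6 steps.
episode_starts = np.array([True, False, True, False, False, True])

boundaries = np.flatnonzero(episode_starts)  # step index where each episode begins
n_eval = 1                                   # hold out the last episode for eval
eval_from = boundaries[-n_eval]              # first step of the held-out episodes

steps = np.arange(len(episode_starts))
train_idx = steps[:eval_from]  # steps of the first two episodes
eval_idx = steps[eval_from:]   # steps of the last episode
```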

## Benchmark

Last but not least, IL-Datasets comes with its own benchmarking.
It uses IL methods on already published datasets to provide consistent results for researchers who also use our datasets.
Currently, we support:
   
| Algorithm | Implementation | Benchmark |
| --- | --- | :---: |
| Behavioural Cloning | [`benchmark.methods.bc`](./src/benchmark/methods/bc.py) | ✅ |
| Behavioural Cloning from Observation | [`benchmark.methods.bco`](./src/benchmark/methods/bco.py) | ✅ |
| Augmented Behavioural Cloning from Observation | [`benchmark.methods.abco`](./src/benchmark/methods/abco.py) | ✅ |
| Imitating Unknown Policies via Exploration | [`benchmark.methods.iupe`](./src/benchmark/methods/iupe.py) | ✅ |

However, our plan is to implement more state-of-the-art methods.

You can check the current benchmark results at [benchmark_results.md](https://github.com/NathanGavenski/IL-Datasets/blob/main/benchmark_results.md).

---
## This repository is under development

Here is a list of the upcoming releases:

- [ ] Benchmark methods
   - [x] Behavioural Cloning
   - [x] Behavioural Cloning from Observation
   - [ ] Imitating Latent Policies from Observation
   - [x] Augmented Behavioural Cloning from Observation
   - [x] Imitating Unknown Policies via Exploration
   - [ ] Generative Adversarial Imitation Learning
   - [ ] Generative Adversarial Imitation Learning from Observation
   - [ ] Off-Policy Imitation Learning from Observations
   - [ ] Model-Based Imitation Learning From Observation Alone
   - [ ] Self-Supervised Adversarial Imitation Learning
- [ ] Benchmark environments
   - [x] CartPole-v1
   - [x] MountainCar-v0
   - [x] Acrobot-v1
   - [x] LunarLander-v2
   - [ ] Ant-v4
   - [x] Hopper-v4
   - [x] HalfCheetah-v4
   - [ ] Walker-v4
   - [ ] Humanoid-v4
   - [ ] Swimmer-v4

Although there are many environments and methods still to go, this repository will be considered done once the documentation and the installation of the benchmarks are complete. We don't have a release plan for new environments and methods yet.

## Cite

If you used IL-Datasets in your research and would like to cite us:
```bibtex
@inproceedings{gavenski2024ildatasets,
   author = {Gavenski, Nathan and Luck, Michael and Rodrigues, Odinaldo},
   title = {Imitation Learning Datasets: A Toolkit For Creating Datasets, Training Agents and Benchmarking},
   year = {2024},
   isbn = {9798400704864},
   publisher = {International Foundation for Autonomous Agents and Multiagent Systems},
   address = {Richland, SC},
   abstract = {Imitation learning field requires expert data to train agents in a task. Most often, this learning approach suffers from the absence of available data, which results in techniques being tested on its dataset. Creating datasets is a cumbersome process requiring researchers to train expert agents from scratch, record their interactions and test each benchmark method with newly created data. Moreover, creating new datasets for each new technique results in a lack of consistency in the evaluation process since each dataset can drastically vary in state and action distribution. In response, this work aims to address these issues by creating Imitation Learning Datasets, a toolkit that allows for: (i) curated expert policies with multithreaded support for faster dataset creation; (ii) readily available datasets and techniques with precise measurements; and (iii) sharing implementations of common imitation learning techniques. Demonstration link: https://nathangavenski.github.io/#/il-datasets-video},
   booktitle = {Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems},
   pages = {2800–2802},
   numpages = {3},
   keywords = {benchmarking, dataset, imitation learning},
   location = {Auckland, New Zealand},
   series = {AAMAS '24}
}
```

## If you like this repository, be sure to check out my other projects:

### Development-based
- [An easy to use Wrapper for Tensorboard](https://github.com/NathanGavenski/Tensorboard-Wrapper)
- [A watcher for python to facilitate development of small projects](https://github.com/NathanGavenski/python-watcher)

### Academic
- [Explorative Imitation Learning: A Path Signature Approach for Continuous Environments (ECAI)](https://arxiv.org/abs/2407.04856)
- [Self-Supervised Adversarial Imitation Learning (IJCNN)](https://arxiv.org/pdf/2304.10914.pdf)
- [How Resilient are Imitation Learning Methods to Sub-Optimal Experts? (BRACIS)](https://link.springer.com/chapter/10.1007/978-3-031-21689-3_32)
- [Self-supervised imitation learning from observation (MSc dissertation)](https://repositorio.pucrs.br/dspace/bitstream/10923/17536/1/000500266-Texto%2Bcompleto-0.pdf)
- [Imitating Unknown Policies via Exploration (BMVC)](https://arxiv.org/pdf/2008.05660.pdf)
- [Augmented behavioral cloning from observation (IJCNN)](https://arxiv.org/pdf/2004.13529.pdf)

