Metadata-Version: 2.4
Name: gdm_robotics
Version: 1.0.2
Summary: Google DeepMind Robotics Interfaces and utilities.
Keywords: RL,Reinforcement Learning,AI,ML
Author-email: Google DeepMind <robotics+oss@google.com>
Maintainer-email: Claudio Fantacci <cfantacci@google.com>, Francesco Romano <fraromano@google.com>
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-Expression: Apache-2.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
License-File: LICENSE
Requires-Dist: numpy >= 2.0, < 3.0
Requires-Dist: absl-py
Requires-Dist: dm-env
Requires-Dist: typing-extensions
Requires-Dist: pytest
Requires-Dist: gymnasium >= 1.0.0
Project-URL: Issues, https://github.com/google-deepmind/gdm_robotics/issues
Project-URL: Repository, https://github.com/google-deepmind/gdm_robotics.git

# `gdm_robotics`: The Google DeepMind Robotics interfaces

This package describes a set of interfaces for Python reinforcement learning
(RL) environments. It consists of the following core components:

*   `gdm_robotics.interfaces.Environment`: An abstract base class for RL environments.
*   `gdm_robotics.interfaces.Policy`: An abstract base class for Agent policies.
*   `gdm_robotics.interfaces.EpisodicLogger`: An abstract base class for loggers for Agent/Environment interaction.
*   `gdm_robotics.runtime.RunLoop`: A concrete RunLoop class that runs a policy against an environment and logs their interaction.

## The core classes

### The Environment interface

The `Environment` interface is very similar to (and indeed, it inherits from)
the [DeepMind Environment interface](https://github.com/google-deepmind/dm_env).
It provides an abstraction for any "controllable" system, as seen from the
perspective of an RL agent.

An `Environment` mainly exposes two methods:

- `reset`, which resets the environment to a known state and returns the
    "timestep" representing it, and
- `step`, which applies a given action and returns a new "timestep".

A `Timestep` is a tuple grouping:

- `observation`: the actual observations provided by the environment.
- `reward`: the reward associated with this specific step.
- `discount`: the discount associated with this specific step.
- `step_type`: a value specifying whether this is the `Timestep` returned by
  `reset` (`step_type == FIRST`) or the last timestep of the episode
  (`step_type == LAST`). For a last timestep, users need to check the discount
  to tell a termination (zero-like `discount`) from a truncation (`discount`
  different from zero).
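
The termination/truncation check described above can be sketched as follows.
This is a minimal, self-contained illustration: `StepType` and `TimeStep` are
stand-in types that mirror the fields listed above (the `MID` value follows the
`dm_env` convention), and `classify_last_step` is a hypothetical helper, not
part of `gdm_robotics`. It also assumes a scalar `discount`.

```python
import enum
from typing import Any, NamedTuple


class StepType(enum.IntEnum):
  """Stand-in for the step types (assumption, mirrors dm_env)."""
  FIRST = 0
  MID = 1
  LAST = 2


class TimeStep(NamedTuple):
  """Stand-in tuple grouping the four fields listed above."""
  step_type: StepType
  reward: Any
  discount: Any
  observation: Any


def classify_last_step(timestep: TimeStep) -> str:
  """Hypothetical helper: tells a termination from a truncation."""
  if timestep.step_type != StepType.LAST:
    return "ongoing"
  # A zero discount marks a termination; any other value is a truncation.
  return "termination" if timestep.discount == 0 else "truncation"


terminated = TimeStep(StepType.LAST, reward=1.0, discount=0.0, observation=None)
truncated = TimeStep(StepType.LAST, reward=0.0, discount=1.0, observation=None)
print(classify_last_step(terminated))  # termination
print(classify_last_step(truncated))   # truncation
```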

Additionally, an `Environment` returns specifications (in the form of
`dm_env.specs.Array` objects) describing the accepted actions (`action_spec`)
and the timestep (`timestep_spec`).

This `Environment` abstract class provides typing support and is stricter than
the `dm_env` equivalent.

### The Policy interface

The `Policy` interface provides an abstraction for Agent policies. The `Policy`
interface assumes a stateless `Policy`, in the sense that the state should be
explicitly provided to the policy when calling its methods.

Note that it is nevertheless possible to have implicit state and make the class
stateful.

A `Policy` should implement:

- `initial_state`: returning the initial state of the `Policy`.
- `step`: given a `Timestep` and a `Policy` state, generate the next action (and
 return the next `Policy` state).

Like the `Environment`, a `Policy` also provides specifications, by implementing
the `step_spec` method.
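
The explicit-state convention can be sketched as follows. This is a minimal
illustration, not the actual base class: `CountingPolicy` and the stand-in
`TimeStep` are hypothetical, and the `(action, next_state)` return order of
`step` is an assumption based on the description above.

```python
from typing import NamedTuple


class TimeStep(NamedTuple):
  """Stand-in for the timestep tuple (assumption, observation only)."""
  observation: float


class CountingPolicy:
  """Hypothetical policy whose only state is a step counter."""

  def initial_state(self) -> int:
    return 0

  def step(self, timestep: TimeStep, state: int) -> tuple[float, int]:
    # The action depends only on the inputs; the next state is returned to
    # the caller instead of being stored on the object.
    action = -timestep.observation
    return action, state + 1


policy = CountingPolicy()
state = policy.initial_state()
for observation in (0.5, -1.0, 2.0):
  action, state = policy.step(TimeStep(observation), state)
print(action, state)  # -2.0 3
```

Because the caller threads the state through every call, the same policy object
can be reused across episodes without any hidden carry-over.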

### The Logger interface

The `EpisodicLogger` class describes the interface for a logger responsible for
logging the interaction between a `Policy` and an `Environment` during a single
episode. As such it exposes:

- `reset`: This logs a "reset" `Timestep`, i.e. the first timestep of the
 episode.
- `record_action_and_next_timestep`: This records a `Policy`'s action and the
 timestep that has been generated by applying this action to the `Environment`.
- `write`: Marks an episode as terminated, triggering a flush (depending on the
 implementation).
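
The three methods above can be sketched with an in-memory logger. This is a
minimal illustration: `InMemoryLogger` is hypothetical, it does not inherit
from the real `EpisodicLogger`, and the argument names are assumptions.

```python
from typing import Any


class InMemoryLogger:
  """Hypothetical logger that buffers one episode and flushes it on write."""

  def __init__(self) -> None:
    self._buffer: list[Any] = []
    self.episodes: list[list[Any]] = []

  def reset(self, timestep: Any) -> None:
    # Start a fresh episode with its first timestep.
    self._buffer = [("reset", timestep)]

  def record_action_and_next_timestep(
      self, action: Any, next_timestep: Any
  ) -> None:
    self._buffer.append((action, next_timestep))

  def write(self) -> None:
    # Flush the buffered episode and clear the buffer.
    self.episodes.append(self._buffer)
    self._buffer = []


logger = InMemoryLogger()
logger.reset("t0")
logger.record_action_and_next_timestep("a0", "t1")
logger.record_action_and_next_timestep("a1", "t2")
logger.write()
print(len(logger.episodes[0]))  # 3
```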

### The Runloop class

The `Runloop` is a concrete class that is responsible for running possibly
multiple episodes of a `Policy` interacting with an `Environment`.

The `Runloop` requires at least an `Environment`, a `Policy` and a collection of
`EpisodicLogger`s.

When calling `run` (or `run_single_episode`) the `Runloop` will take care of
correctly stepping `Policy` and `Environment` and logging the generated data.
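
The stepping order can be sketched with stand-in classes. This is a simplified
illustration of the idea, not the actual `Runloop` implementation: every class
and the function `run_single_episode_sketch` are hypothetical, and the method
signatures are assumptions based on the descriptions above.

```python
import enum
from typing import Any, NamedTuple


class StepType(enum.IntEnum):
  FIRST = 0
  MID = 1
  LAST = 2


class TimeStep(NamedTuple):
  step_type: StepType
  reward: float
  discount: float
  observation: int


class CountdownEnv:
  """Toy environment: the episode ends once the counter reaches zero."""

  def __init__(self, start: int = 3) -> None:
    self._start = start
    self._counter = start

  def reset(self) -> TimeStep:
    self._counter = self._start
    return TimeStep(StepType.FIRST, 0.0, 1.0, self._counter)

  def step(self, action: int) -> TimeStep:
    self._counter -= action
    if self._counter <= 0:
      return TimeStep(StepType.LAST, 1.0, 0.0, self._counter)
    return TimeStep(StepType.MID, 0.0, 1.0, self._counter)


class DecrementPolicy:
  def initial_state(self) -> None:
    return None

  def step(self, timestep: TimeStep, state: None) -> tuple[int, None]:
    return 1, state


class ListLogger:
  def __init__(self) -> None:
    self.records: list[Any] = []
    self.episodes_written = 0

  def reset(self, timestep: TimeStep) -> None:
    self.records = [timestep]

  def record_action_and_next_timestep(self, action, next_timestep) -> None:
    self.records.append((action, next_timestep))

  def write(self) -> None:
    self.episodes_written += 1


def run_single_episode_sketch(env, policy, loggers) -> int:
  """Steps policy and environment until a LAST timestep; returns step count."""
  timestep = env.reset()
  state = policy.initial_state()
  for logger in loggers:
    logger.reset(timestep)
  steps = 0
  while timestep.step_type != StepType.LAST:
    action, state = policy.step(timestep, state)
    timestep = env.step(action)
    for logger in loggers:
      logger.record_action_and_next_timestep(action, timestep)
    steps += 1
  for logger in loggers:
    logger.write()
  return steps


logger = ListLogger()
num_steps = run_single_episode_sketch(CountdownEnv(3), DecrementPolicy(), [logger])
print(num_steps, logger.episodes_written)  # 3 1
```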

#### Runloop customisation

The `Runloop` can be customised by passing different options.

1. Signal handlers. By default the `Runloop` does not intercept `SIGINT`, and it
is the responsibility of the caller to handle it. In some cases it might be
beneficial to let the `Runloop` handle it instead; to do so, pass
`handle_sigint=True` to the `Runloop` initializer.
2. Reset options for the `Environment`. If your `Environment` accepts reset
options (for example because it wraps a Gymnasium environment), you might want
to provide options at reset time. To do so, pass a callable as the
`reset_options_provider` argument of the initializer; it is called before every
episode reset.
3. More complex customisation can be done by using the
`RunloopRuntimeOperations` class. The `Runloop` initializer accepts a collection
of `RunloopRuntimeOperations` objects.
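
As an illustration of the second option, a reset options provider could look
like the sketch below. The exact signature expected by `reset_options_provider`
is not stated here, so the zero-argument callable returning a dictionary is an
assumption, and `make_seed_cycling_provider` is a hypothetical helper; check
the `Runloop` initializer for the actual contract.

```python
import itertools
from typing import Any, Callable


def make_seed_cycling_provider(seeds: list[int]) -> Callable[[], dict[str, Any]]:
  """Hypothetical factory: the returned callable yields new options per call."""
  seed_iter = itertools.cycle(seeds)

  def provider() -> dict[str, Any]:
    # Called once before each episode reset (assumed zero-argument signature).
    return {"seed": next(seed_iter)}

  return provider


provider = make_seed_cycling_provider([1, 2])
print([provider() for _ in range(3)])
# [{'seed': 1}, {'seed': 2}, {'seed': 1}]
```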


## Adapters

Environments from common RL libraries, such as `dm_env.Environment` and
`gymnasium.Env`, can be exposed as `gdm_robotics.interfaces.Environment`s by
using the environment wrappers provided in the `adapters` sub-package. For
example, to wrap a `dm_env.Environment` object:

```py
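import dm_env
from gdm_robotics.adapters import dm_env_to_gdmr_env_wrapper

# Any existing dm_env environment instance.
original_env: dm_env.Environment = ...
# Expose it through the gdm_robotics Environment interface.
env = dm_env_to_gdmr_env_wrapper.DmEnvToGdmrEnvWrapper(original_env)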
```

## Installation

`gdm_robotics` can be installed from PyPI using `pip`:

```bash
pip install gdm_robotics
```

## Licence and Disclaimer

Copyright 2025 Google LLC

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you
may not use this file except in compliance with the Apache 2.0 license. You may
obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0
International License (CC-BY). You may obtain a copy of the CC-BY license at:
https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and
materials distributed here under the Apache 2.0 or CC-BY licenses are
distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
either express or implied. See the licenses for the specific language governing 
permissions and limitations under those licenses.

This is not an official Google product.

