eta_utility.eta_x.envs.base_env_sim module

class eta_utility.eta_x.envs.base_env_sim.BaseEnvSim(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, model_parameters: Mapping[str, Any] | None = None, sim_steps_per_sample: int | str = 1, **kwargs: Any)[source]

Bases: BaseEnv, ABC

Base class for environments based on FMU simulation models. A minimal subclass sketch is shown after the parameter list below.

Parameters:
  • env_id – Identification for the environment, useful when creating multiple environments.

  • config_run – Configuration of the optimization run.

  • seed – Random seed to use for generating random numbers in this environment (default: None / create random seed).

  • verbose – Verbosity to use for logging.

  • callback – Callback which is called after each episode.

  • scenario_time_begin – Beginning time of the scenario.

  • scenario_time_end – Ending time of the scenario.

  • episode_duration – Duration of the episode in seconds.

  • sampling_time – Duration of a single time sample / time step in seconds.

  • model_parameters – Parameters for the mathematical model.

  • sim_steps_per_sample – Number of simulation steps to perform during every sample.

  • kwargs – Other keyword arguments (for subclasses).
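
A minimal subclass sketch (the class name PlantEnv, the FMU name plant_model and the setup comments are illustrative assumptions, not part of eta_utility):

    from typing import Any

    from eta_utility.eta_x.envs import BaseEnvSim


    class PlantEnv(BaseEnvSim):
        """Hypothetical FMU environment, shown for illustration only."""

        @property
        def fmu_name(self) -> str:
            # plant_model.fmu is assumed to sit next to this environment file.
            return "plant_model"

        def __init__(self, env_id: int, config_run: Any, verbose: int = 2,
                     callback: Any = None, **kwargs: Any) -> None:
            super().__init__(env_id, config_run, verbose, callback, **kwargs)
            # state_config, action_space and observation_space would be set up
            # here before the FMU simulator is used.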

abstract property fmu_name: str

Name of the FMU file.

sim_steps_per_sample: int

Number of simulation steps to be taken for each sample. This must be a divisor of sampling_time. For example, with sampling_time = 30 and sim_steps_per_sample = 3, each sample advances the FMU in three simulation steps of 10 seconds each.

path_fmu: pathlib.Path

Path to the FMU file. The FMU is expected to be placed in the same folder as the environment file.

model_parameters: Mapping[str, int | float] | None

Configuration of the FMU model parameters which need to be set when the model is initialized.

simulator: FMUSimulator

Instance of the FMU simulator. This can be used to access the eta_utility.FMUSimulator interface directly.

simulate(state: Mapping[str, float]) tuple[dict[str, float], bool, float][source]

Perform a simulator step and return data as specified by the is_ext_observation parameter of the state_config.

Parameters:

state – State of the environment before the simulation.

Returns:

Output of the simulation, a boolean indicating whether all simulation steps were successful, and the time elapsed during the simulation.
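
As a sketch of calling simulate() directly (env is assumed to be an initialized instance of the PlantEnv sketch above; the input name u is a placeholder resolved through state_config):

    # Keys of the state mapping are resolved through the environment's state_config.
    observations, success, elapsed = env.simulate({"u": 0.5})
    if not success:
        raise RuntimeError(f"FMU simulation failed after {elapsed:.3f} s.")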

step(action: np.ndarray) StepResult[source]

Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, reward, terminated, truncated, info.

This also updates self.state and self.state_log to store current state information.

Note

This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to manipulate actions (discretization, policy shaping, …), do this before calling this function. If you need to manipulate observations and rewards, do this after calling this function. A sketch of such an extension is shown after the list of return values below.

Parameters:

action – Actions to perform in the environment.

Returns:

The return value represents the state of the environment after the step was performed.

  • observations: A numpy array (np.ndarray) with new observation values as defined by the observation space, containing floating point or integer values.

  • reward: The value of the reward function. This is just one floating point value.

  • terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by eta_x.

  • truncated: Boolean value specifying whether a truncation condition outside the scope of the MDP is satisfied.

    Typically, this is a time limit, but it could also be used to indicate an agent physically going out of bounds. It can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call the reset function.

  • info: Additional information about the state of the environment. Its contents may be used for logging purposes in the future but currently do not serve a specific purpose.
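
A sketch of such an extension, building on the hypothetical PlantEnv above and assuming a continuous (Box) action space; the observation name T and the setpoint are arbitrary placeholders:

    import numpy as np


    class RewardedPlantEnv(PlantEnv):
        def step(self, action: np.ndarray):
            # Manipulate actions before calling the base implementation ...
            action = np.clip(action, self.action_space.low, self.action_space.high)
            observations, reward, terminated, truncated, info = super().step(action)
            # ... and derive the reward from the updated state afterwards.
            reward = -abs(self.state["T"] - 295.0)  # placeholder reward term
            return observations, reward, terminated, truncated, info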

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[ObservationType, dict[str, Any]][source]

Resets the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment takes care of seeding your custom environment automatically.

For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.

Note

Don’t forget to store and reset the episode_timer.

Parameters:
  • seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)

  • options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().
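
A sketch of a custom reset() that follows the seeding recommendation above (the disturbance logic is purely illustrative; np_random is the environment's PRNG mentioned above):

    from typing import Any


    class SeededPlantEnv(PlantEnv):
        def reset(self, *, seed: int | None = None,
                  options: dict[str, Any] | None = None):
            # Seed the PRNG first, as recommended above.
            observations, info = super().reset(seed=seed, options=options)
            # Environment-specific re-initialization can follow, e.g. drawing a
            # random disturbance from the now-seeded generator (illustration only).
            self.data["disturbance"] = float(self.np_random.uniform(-1.0, 1.0))
            return observations, info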

close() None[source]

Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

Default behavior for the Simulation environment is to close the FMU object.
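
A typical usage pattern might look like this (sketch; env is an initialized instance of the PlantEnv example above):

    # Always release the FMU, even if the episode loop fails.
    try:
        env.reset()
        for _ in range(env.n_episode_steps):
            env.step(env.action_space.sample())
    finally:
        env.close()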

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]
verbose: int

Verbosity level used for logging.

config_run: ConfigOptRun

Information about the optimization run, including relevant paths. For example, it defines path_results and path_scenarios.

path_results: pathlib.Path

Path for storing results.

path_scenarios: pathlib.Path | None

Path for the scenario data.

path_env: pathlib.Path

Path of the environment file.

callback: Callable | None

Callback can be used for logging and plotting.

env_id: int

ID of the environment (useful for vectorized environments).

run_name: str

Name of the current optimization run.

n_episodes: int

Number of completed episodes.

n_steps: int

Current step of the model (number of completed steps) in the current episode.

n_steps_longtime: int

Current step of the model (total over all episodes).

episode_duration: float

Duration of one episode in seconds.

sampling_time: float

Sampling time (interval between optimization time steps) in seconds.

n_episode_steps: int

Number of time steps (of width sampling_time) in each episode.

scenario_duration: float

Duration of the scenario for each episode (total duration of the time series data imported from CSV).

scenario_time_begin: datetime

Beginning time of the scenario.

scenario_time_end: datetime

Ending time of the scenario (should be in the format %Y-%m-%d %H:%M).

timeseries: pd.DataFrame

The time series DataFrame contains all time series scenario data. It can be filled by the import_scenario method.

ts_current: pd.DataFrame

Data frame containing the currently valid range of time series data.

state_config: StateConfig | None

Configuration to describe what the environment state looks like.

episode_timer: float

Episode timer (stores the start time of the episode).

state: dict[str, float]

Current state of the environment.

additional_state: dict[str, float] | None

Additional state information to append to the state during stepping and reset.

state_log: list[dict[str, float]]

Log of the environment state.

state_log_longtime: list[list[dict[str, float]]]

Log of the environment state over multiple episodes.

data: dict[str, Any]

Some specific current environment settings / other data, apart from state.

data_log: list[dict[str, Any]]

Log of specific environment settings / other data, apart from state for the episode.

data_log_longtime: list[list[dict[str, Any]]]

Log of specific environment settings / other data, apart from state, over multiple episodes.