Metadata-Version: 2.4
Name: storage-node-env
Version: 0.14.0
Summary: Gymnasium environments for simulating energy nodes with battery energy storage systems
Author-email: Leonardo Guiducci <leonardo.guiducci@unisi.it>
License: MIT
Project-URL: Homepage, https://github.com/unisi-lab305/storage-node-environment
Project-URL: Repository, https://github.com/unisi-lab305/storage-node-environment
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: gymnasium>=0.29.0
Requires-Dist: holidays>=0.35
Requires-Dist: numpy>=1.24.0
Requires-Dist: pandas>=2.0.0
Provides-Extra: dev
Requires-Dist: pre-commit>=3.5.0; extra == "dev"
Requires-Dist: pylint>=3.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Provides-Extra: rl
Requires-Dist: sb3-contrib>=2.0.0; extra == "rl"
Requires-Dist: stable-baselines3>=2.0.0; extra == "rl"
Requires-Dist: tensorboard>=2.14.0; extra == "rl"
Provides-Extra: visualization
Requires-Dist: lttb>=0.3.0; extra == "visualization"
Requires-Dist: matplotlib>=3.5.0; extra == "visualization"
Requires-Dist: rich>=14.0.0; extra == "visualization"
Requires-Dist: seaborn>=0.12.0; extra == "visualization"
Requires-Dist: tqdm>=4.66.0; extra == "visualization"
Provides-Extra: all
Requires-Dist: pre-commit>=3.5.0; extra == "all"
Requires-Dist: pylint>=3.0.0; extra == "all"
Requires-Dist: pytest-cov>=4.0.0; extra == "all"
Requires-Dist: pytest>=7.4.0; extra == "all"
Requires-Dist: ruff>=0.1.0; extra == "all"
Requires-Dist: sb3-contrib>=2.0.0; extra == "all"
Requires-Dist: stable-baselines3>=2.0.0; extra == "all"
Requires-Dist: tensorboard>=2.14.0; extra == "all"
Requires-Dist: lttb>=0.3.0; extra == "all"
Requires-Dist: matplotlib>=3.5.0; extra == "all"
Requires-Dist: rich>=14.0.0; extra == "all"
Requires-Dist: seaborn>=0.12.0; extra == "all"
Requires-Dist: tqdm>=4.66.0; extra == "all"
Dynamic: license-file

# StorageNode Environment

Gymnasium environment for simulating an energy node with a battery energy storage system (BESS). It provides physics-based battery modeling, parameterized from commercial datasheets, for reinforcement learning applications.

[![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![CI Tests](https://github.com/unisi-lab305/storage-node-environment/workflows/CI%20Tests/badge.svg)

## Features

- **Gymnasium-compatible environment** registered as `storage_node_env/EnergyStorage-v0`
- **Physics-based battery modeling** with commercial datasheet parameters
- **Two energy node types**: Producer (production only) and Prosumer (production + consumption)
- **Modular reward system** for different optimization objectives (`self_consumption`, `economic`, `self_consumption_delta`)
- **Rule-based controllers** for baseline comparison
- **Flexible observation space** with optional preprocessing and cyclical encoding

## Installation

### From Source (Development Mode)

```bash
git clone https://github.com/unisi-lab305/storage-node-environment.git
cd storage-node-environment
pip install -e .
```

### From PyPI (When Published)

```bash
pip install storage-node-env
```

The environment is automatically registered with Gymnasium on import and can be instantiated using `gym.make()`.

## Quick Start

### Method 1: Using gym.make() (Recommended)

```python
import gymnasium as gym
import storage_node_env  # Trigger environment registration

# Battery configuration
battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

# Create environment
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)

# Run simulation
obs, info = env.reset(seed=42)
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        break

env.close()
```

### Method 2: Direct Import (Backward Compatible)

```python
from storage_node_env.gym import EnergyStorageEnv

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

env = EnergyStorageEnv(
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)

obs, info = env.reset()
# ... same usage as above
```

**Note:** The `gym.make()` approach is recommended as it follows standard Gymnasium conventions and ensures compatibility with Gymnasium ecosystem tools.

## Environment Parameters

| Parameter | Type | Default | Required | Description |
|-----------|------|---------|----------|-------------|
| `node_type` | `str` | - | **Yes** | Type of energy node: `'producer'` or `'prosumer'` |
| `csv_path` | `str` | - | **Yes** | Path to CSV file with historical data |
| `battery_config` | `dict[str, float]` | - | **Yes** | Dictionary with battery parameters (see below) |
| `delta_t` | `float` | - | **Yes** | Timestep duration in hours (e.g., 1.0, 0.25) |
| `lookback_n` | `int` | `2` | No | Number of historical timesteps in observation buffer |
| `num_actions` | `int` | `21` | No | Number of discrete actions (must be odd) |
| `use_preprocessing` | `bool` | `False` | No | Enable observation preprocessing (cyclical encoding, normalization) |
| `add_holiday` | `bool` | `True` | No | Add Italian holiday feature (requires `use_preprocessing=True`) |
| `reward_settings` | `dict \| None` | `None` | No | Reward configuration (see Reward System section) |
| `use_action_masking` | `bool` | `True` | No | Enable physics-based action masking via `action_masks()`. Use with `MaskablePPO` from sb3-contrib. Set to `False` for standard `gym.Env` compatibility |
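`use_preprocessing=True` enables cyclical encoding of time features. The exact feature set is not documented here, so the following is a minimal sketch that assumes hour-of-day, day-of-week and month are each mapped to a (sin, cos) pair. `cyclical` and `encode_timestamp` are hypothetical names, not the package's API. Encoding time this way keeps 23:00 and 00:00 close together, which a raw hour integer does not. The holiday flag (`add_holiday`) is a separate binary feature and is left out of this sketch.

```python
import math
from datetime import datetime


def cyclical(value: float, period: float) -> tuple[float, float]:
    """Map a periodic quantity onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encode_timestamp(ts: datetime) -> dict[str, float]:
    """Hypothetical cyclical time features (not the package's actual feature names)."""
    hour_sin, hour_cos = cyclical(ts.hour + ts.minute / 60.0, 24.0)
    dow_sin, dow_cos = cyclical(ts.weekday(), 7.0)       # Monday = 0
    month_sin, month_cos = cyclical(ts.month - 1, 12.0)  # January = 0
    return {
        'hour_sin': hour_sin, 'hour_cos': hour_cos,
        'dow_sin': dow_sin, 'dow_cos': dow_cos,
        'month_sin': month_sin, 'month_cos': month_cos,
    }


features = encode_timestamp(datetime(2024, 1, 15, 6, 0))
print(features)
```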

### CSV Data Requirements

The CSV file must contain a `datetime` column and node-specific columns:

**For Producer Nodes:**

- `datetime`: Timestamp (e.g., `'2024-01-15 00:00:00'`)
- `production`: Power produced in kW
- `buy_price`: Grid purchase price in €/kWh
- `sell_price`: Grid selling price in €/kWh

**For Prosumer Nodes:**

- `datetime`: Timestamp
- `production`: Power produced in kW
- `consumption`: Power consumed by loads in kW
- `buy_price`: Grid purchase price in €/kWh
- `sell_price`: Grid selling price in €/kWh

**Important:** The `delta_t` parameter must match the frequency of your CSV data (e.g., `delta_t=1.0` for hourly data, `delta_t=0.25` for 15-minute data).
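Before creating the environment, you can check that a CSV has the required columns and that its sampling interval matches `delta_t`. This is a minimal sketch using pandas (already a dependency). `REQUIRED_COLUMNS`, `infer_delta_t` and `check_csv` are hypothetical helpers, not part of the package, and the interval is assumed to be the median spacing of the `datetime` column.

```python
import io

import pandas as pd

# Required columns per node type, as listed above
REQUIRED_COLUMNS = {
    'producer': {'datetime', 'production', 'buy_price', 'sell_price'},
    'prosumer': {'datetime', 'production', 'consumption', 'buy_price', 'sell_price'},
}


def infer_delta_t(df: pd.DataFrame) -> float:
    """Median spacing of the datetime column, in hours."""
    timestamps = pd.to_datetime(df['datetime'])
    step = timestamps.diff().dropna().median()
    return step / pd.Timedelta(hours=1)


def check_csv(df: pd.DataFrame, node_type: str, delta_t: float) -> None:
    """Raise ValueError if columns are missing or delta_t disagrees with the data."""
    missing = REQUIRED_COLUMNS[node_type] - set(df.columns)
    if missing:
        raise ValueError(f'{node_type} CSV is missing columns: {sorted(missing)}')
    inferred = infer_delta_t(df)
    if abs(inferred - delta_t) > 1e-9:
        raise ValueError(f'delta_t={delta_t} but the data spacing is {inferred} h')


# Illustrative inline 15-minute data; for a real file use pd.read_csv(csv_path)
sample = io.StringIO(
    'datetime,production,consumption,buy_price,sell_price\n'
    '2024-01-15 00:00:00,0.0,0.4,0.30,0.10\n'
    '2024-01-15 00:15:00,0.0,0.3,0.30,0.10\n'
    '2024-01-15 00:30:00,0.1,0.3,0.30,0.10\n'
)
df = pd.read_csv(sample)
check_csv(df, 'prosumer', delta_t=0.25)  # passes silently
```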

## Battery Configuration

The `battery_config` dictionary contains physical parameters for battery simulation based on commercial datasheets.

### Parameters

| Parameter | Type | Required | Valid Range | Units | Description |
|-----------|------|----------|-------------|-------|-------------|
| `capacity` | `float` | **Yes** | > 0 | kWh | Nominal capacity (C_nom) |
| `dod_max` | `float` | **Yes** | 0 < x ≤ 100 | % | Maximum depth of discharge |
| `power_charge_max` | `float` | **Yes** | > 0 | kW | Maximum charging power |
| `power_discharge_max` | `float` | **Yes** | > 0 | kW | Maximum discharging power |
| `efficiency_charge` | `float` | **Yes** | 0 < x ≤ 1 | - | Charging efficiency (e.g., 0.95 for 95%) |
| `efficiency_discharge` | `float` | **Yes** | 0 < x ≤ 1 | - | Discharging efficiency (e.g., 0.95 for 95%) |
| `alpha` | `float` | No | 0 ≤ x < 1 | - | Parasitic loss coefficient (default: 0.0) |
| `soc_initial` | `float \| None` | No | C_min ≤ x ≤ C_max | kWh | Initial state of charge (default: 50% capacity) |
| `allow_arbitrage` | `bool` | No | `True` / `False` | - | If `False`, charging is capped at current PV production each timestep — battery cannot charge from the grid. Compatible with all reward types and controllers. (default: `True`) |

### Physical Meaning

- **Capacity**: Total energy storage when fully charged
- **DoD (Depth of Discharge)**: Usable capacity percentage (e.g., 90% DoD means 90% of nominal capacity is usable)
- **Power limits**: C-rate constraints from battery datasheet (separate for charge/discharge)
- **Efficiency**: One-way energy losses during charging and discharging, specified separately for each direction (the round-trip efficiency is their product, e.g., 0.95 × 0.95 ≈ 0.90)
- **Alpha**: Standby consumption per timestep (e.g., 0.001 = 0.1% loss per timestep)
- **SoC initial**: Starting energy level in kWh (if `None`, starts at 50% of nominal capacity)

### Power Convention

- **Positive power** = charging (the battery absorbs energy from local production or the grid)
- **Negative power** = discharging (the battery releases energy to local loads or the grid)
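With this sign convention, the power exchanged with the grid follows from a simple balance. The sketch below is minimal and assumes two things: grid power is `consumption + battery_power - production` (positive = import), and a step's cost is imported energy at `buy_price` minus exported energy at `sell_price`. `grid_exchange` and `step_cost` are hypothetical names, and the package's own `net_cost` may include other terms. For producer nodes, pass `consumption=0`.

```python
def grid_exchange(production: float, consumption: float, battery_power: float) -> float:
    """Grid power in kW: positive = import, negative = export.

    battery_power follows the convention above: > 0 charging, < 0 discharging.
    """
    return consumption + battery_power - production


def step_cost(grid_power: float, buy_price: float, sell_price: float, delta_t: float) -> float:
    """Hypothetical per-step cost in € (negative = net revenue)."""
    energy = grid_power * delta_t  # kWh exchanged over the timestep
    if energy >= 0:
        return energy * buy_price  # imported energy is paid at buy_price
    return energy * sell_price     # exported energy earns sell_price


# Prosumer at midday: 3 kW PV, 1 kW load, battery charging at 1.5 kW
p_grid = grid_exchange(production=3.0, consumption=1.0, battery_power=1.5)
cost = step_cost(p_grid, buy_price=0.30, sell_price=0.10, delta_t=1.0)
print(p_grid, cost)
```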

### Example Configuration

**Typical values based on ZCS AZZURRO HV ZBT 5K battery:**

```python
battery_config = {
    'capacity': 5.12,                    # 5.12 kWh nominal capacity
    'dod_max': 90,                       # 90% depth of discharge
    'power_charge_max': 2.5,             # 2.5 kW maximum charging power
    'power_discharge_max': 2.5,          # 2.5 kW maximum discharging power
    'efficiency_charge': 0.95,           # 95% charging efficiency
    'efficiency_discharge': 0.95,        # 95% discharging efficiency
    'alpha': 0.0,                        # No parasitic losses (optional)
    'soc_initial': 2.56                  # Start at 50% SoC (optional)
}
```

**Derived parameters (computed automatically):**

- `C_min = (1 - dod_max/100) × capacity` → Minimum usable SoC (kWh)
- `C_max = capacity` → Maximum usable SoC (kWh)
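For the ZCS example above, this gives C_min = (1 - 0.9) × 5.12 = 0.512 kWh, which leaves 4.608 kWh of usable energy between C_min and C_max. The sketch below computes the derived values and checks `soc_initial` against them. `derived_battery_params` is a hypothetical helper, and the round-trip figure assumes the two one-way efficiencies simply multiply.

```python
def derived_battery_params(config: dict) -> dict:
    """Hypothetical helper: quantities derived from a battery_config dict."""
    capacity = config['capacity']
    c_min = (1 - config['dod_max'] / 100) * capacity  # kWh, lowest allowed SoC
    c_max = capacity                                  # kWh, highest allowed SoC
    soc_initial = config.get('soc_initial')
    if soc_initial is None:
        soc_initial = 0.5 * capacity  # documented default: 50% of nominal capacity
    if not c_min <= soc_initial <= c_max:
        raise ValueError(f'soc_initial={soc_initial} kWh is outside [{c_min}, {c_max}]')
    return {
        'c_min': c_min,
        'c_max': c_max,
        'usable_energy': c_max - c_min,
        'soc_initial': soc_initial,
        # Assumes the two one-way efficiencies multiply over a full cycle
        'round_trip_efficiency': config['efficiency_charge'] * config['efficiency_discharge'],
    }


battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95,
}
print(derived_battery_params(battery_config))
```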

## Reward System

The environment provides a **modular reward system** supporting different optimization objectives through configurable reward calculators.

### Available Reward Types

| Reward Type | Description | Best For | Suitable Node Types |
| --- | --- | --- | --- |
| `'self_consumption'` | Maximize local energy consumption, minimize grid dependency | Prosumer nodes optimizing grid independence | `['prosumer']` |
| `'economic'` | Maximize profit / minimize cost based on net economic outcome | Economic optimization, price-responsive agents | `['producer', 'prosumer']` |
| `'self_consumption_delta'` | Dense, zero-centred signal: improvement over no-battery baseline. Positive when battery helps, negative when it hurts | Prosumers where natural self-consumption is already high (sparse gradient with `'self_consumption'`) | `['prosumer']` |

### Configuration Structure

```python
reward_settings = {
    'type': str,                 # Required: 'self_consumption', 'economic', or 'self_consumption_delta'
    'weights': dict[str, float], # Optional: weight coefficients
    'normalize': bool            # Optional: normalize rewards (default: False)
}
```

### Weight Parameters

| Weight Key | Default | Description |
|------------|---------|-------------|
| `'main'` | `1.0` | Weight for main reward component |
| `'violation_penalty'` | `0.5` | Weight for power constraint violation penalty (normalised to ≈ [0, 1]) |
| `'storage_usage_penalty'` | `0.0` | Weight for battery usage/wear penalty (disabled by default) |

### Reward Composition

The total reward is a weighted linear combination:

```text
total_reward = (weights['main'] × R_main)
             - (weights['violation_penalty'] × P_violation)
             - (weights['storage_usage_penalty'] × P_usage)
```

Where:

- `R_main`: Main reward component (implementation-specific, typically ∈ [0, 1] or ≈ [-1, 1])
- `P_violation`: Power constraint violation normalised by battery max charge power (`|kW| / P_cha_max`) → ≈ [0, 1]
- `P_usage`: Battery usage penalty (absolute SoC change in percentage points)

### Configuration Examples

#### 1. Default (Automatic Selection)

If `reward_settings=None`, the environment automatically selects:

- **Prosumer nodes** → `'self_consumption'`
- **Producer nodes** → `'economic'`

```python
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
    # No reward_settings → uses 'self_consumption' by default
)
```

#### 2. Minimal Configuration

Specify only the reward type, use default weights:

```python
reward_settings = {
    'type': 'economic'
    # 'weights' will use defaults from registry
    # 'normalize' will default to False
}
```

#### 3. Balanced Strategy

Moderate optimization with constraint awareness:

```python
reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 0.5,
        'storage_usage_penalty': 0.1
    },
    'normalize': False
}
```

#### 4. Aggressive Optimization

High main weight, low penalties (may violate constraints):

```python
reward_settings = {
    'type': 'economic',
    'weights': {
        'main': 10.0,              # Strong economic signal
        'violation_penalty': 0.1,   # Allow some violations
        'storage_usage_penalty': 0.01  # Minimal wear penalty
    }
}
```

#### 5. Conservative Strategy

High penalties for strict constraint adherence:

```python
reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 5.0,    # Strict constraint adherence
        'storage_usage_penalty': 1.0  # Discourage battery cycling
    }
}
```

#### 6. Dense Delta Reward

Zero-centred per-step signal based on improvement over the idle-battery baseline:

```python
reward_settings = {
    'type': 'self_consumption_delta',
    'weights': {
        'main': 1.0,
        'violation_penalty': 0.5,
        'storage_usage_penalty': 0.0
    }
}

env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    reward_settings=reward_settings
)
```

### Choosing Reward Type

| Node Type | Primary Goal | Recommended Reward |
| --- | --- | --- |
| Prosumer | Minimize grid dependency | `'self_consumption'` |
| Prosumer | Minimize costs | `'economic'` |
| Prosumer | Dense training signal (high baseline self-consumption) | `'self_consumption_delta'` |
| Producer | Maximize profit | `'economic'` |

### Reward Normalization

By default, rewards are **raw (unnormalized)** for interpretability and Stable-Baselines3 compatibility.

#### Option 1: SB3 VecNormalize (Recommended)

```python
import gymnasium as gym
import storage_node_env  # Trigger environment registration
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

env = gym.make('storage_node_env/EnergyStorage-v0', ...)  # same arguments as above
vec_env = DummyVecEnv([lambda: env])
vec_env = VecNormalize(
    vec_env,
    norm_obs=False,      # Disable observation normalization
    norm_reward=True,    # Enable reward normalization
    clip_reward=10.0,
    gamma=0.99
)
```
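For intuition about what `norm_reward=True` does: in SB3, `VecNormalize` keeps a running discounted return per environment and tracks its variance. It divides each reward by the running standard deviation (without subtracting a mean), then clips the result to `±clip_reward`. The sketch below is a simplified re-implementation for illustration only, and the class names are our own. Use SB3's `VecNormalize` for actual training.

```python
import numpy as np


class RunningVariance:
    """Running mean/variance with a parallel (batch) update."""

    def __init__(self, epsilon: float = 1e-4):
        self.mean, self.var, self.count = 0.0, 1.0, epsilon

    def update(self, x: np.ndarray) -> None:
        batch_mean, batch_var, batch_count = float(np.mean(x)), float(np.var(x)), x.size
        delta = batch_mean - self.mean
        total = self.count + batch_count
        m2 = (self.var * self.count + batch_var * batch_count
              + delta**2 * self.count * batch_count / total)
        self.mean = self.mean + delta * batch_count / total
        self.var, self.count = m2 / total, total


class RewardScaler:
    """Scale rewards by the std of the running discounted return, then clip."""

    def __init__(self, n_envs: int, gamma: float = 0.99, clip_reward: float = 10.0):
        self.returns = np.zeros(n_envs)
        self.stats = RunningVariance()
        self.gamma, self.clip_reward, self.epsilon = gamma, clip_reward, 1e-8

    def __call__(self, rewards: np.ndarray, dones: np.ndarray) -> np.ndarray:
        self.returns = self.returns * self.gamma + rewards
        self.stats.update(self.returns)
        scaled = rewards / np.sqrt(self.stats.var + self.epsilon)
        self.returns[dones] = 0.0  # a finished episode restarts its return
        return np.clip(scaled, -self.clip_reward, self.clip_reward)


scaler = RewardScaler(n_envs=2)
print(scaler(np.array([1.0, -1.0]), np.array([False, True])))
```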

#### Option 2: Built-in Normalization

```python
reward_settings = {
    'type': 'self_consumption',
    'normalize': True  # Enable built-in normalization
}
```

## Action Masking

Action masking prevents the RL agent from sampling physically infeasible actions (e.g., charging at full power when the battery is already at maximum SoC). Instead of penalising violations after the fact, infeasible actions are excluded from the sampling space at each step, giving the agent a cleaner learning signal.

Action masking is **enabled by default** (`use_action_masking=True`). To disable it, pass `use_action_masking=False` to the environment constructor.

Requires [`sb3-contrib`](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib), included in the `[rl]` optional extras:

```bash
pip install "storage-node-env[rl]"  # quotes stop shells such as zsh from expanding the brackets
```

### Usage with MaskablePPO

```python
from typing import cast
import gymnasium as gym
import storage_node_env
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker
from storage_node_env.gym import EnergyStorageEnv

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

# Create environment (use_action_masking=True is the default)
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    use_preprocessing=True
)

# Wrap with ActionMasker — tells MaskablePPO how to retrieve the mask
def mask_fn(env):
    return cast(EnergyStorageEnv, env.unwrapped).action_masks()

env = ActionMasker(env, mask_fn)

# Train with MaskablePPO
model = MaskablePPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=200_000)
model.save('maskable_ppo_prosumer')
env.close()
```

### Disabling action masking

```python
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    use_action_masking=False   # action_masks() returns all-True
)
```

### How the mask is computed

At each timestep the mask queries the battery's dynamic bounds for the current SoC:

- Actions requesting power above `P_upper` (battery full or C-rate limit) are masked **False**.
- Actions requesting power below `P_lower` (battery empty or C-rate limit) are masked **False**.
- When `allow_arbitrage=False`, `P_upper` is additionally capped at current PV production.
- The neutral action (zero power) is always **True**.
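The rules above can be sketched as follows. This is an illustrative reconstruction, not the package's code. It assumes the discrete actions map linearly onto `[-P_dis_max, +P_cha_max]`, with the neutral action at the middle index (consistent with `num_actions` being odd). It derives `P_upper`/`P_lower` from the SoC headroom using the one-way efficiencies and ignores `alpha`. All function names are hypothetical.

```python
import numpy as np


def action_powers(num_actions: int, p_cha_max: float, p_dis_max: float) -> np.ndarray:
    """Assumed mapping: index 0 = full discharge, middle = 0 kW, last = full charge."""
    fractions = np.linspace(-1.0, 1.0, num_actions)
    return np.where(fractions >= 0, fractions * p_cha_max, fractions * p_dis_max)


def power_bounds(soc, c_min, c_max, p_cha_max, p_dis_max,
                 eta_cha, eta_dis, delta_t, production=None):
    """Feasible (P_upper, P_lower) in kW for the current SoC in kWh."""
    # Charging at P kW stores eta_cha * P * delta_t kWh
    p_upper = min(p_cha_max, max(0.0, (c_max - soc) / (eta_cha * delta_t)))
    # Discharging at P kW removes P * delta_t / eta_dis kWh
    p_lower = -min(p_dis_max, max(0.0, (soc - c_min) * eta_dis / delta_t))
    if production is not None:  # allow_arbitrage=False: charge only from PV
        p_upper = min(p_upper, max(0.0, production))
    return p_upper, p_lower


def action_mask(powers: np.ndarray, p_upper: float, p_lower: float,
                tol: float = 1e-9) -> np.ndarray:
    """True where the action's power lies within [P_lower, P_upper]."""
    mask = (powers <= p_upper + tol) & (powers >= p_lower - tol)
    mask[len(powers) // 2] = True  # neutral action is always allowed
    return mask


powers = action_powers(21, p_cha_max=2.5, p_dis_max=2.5)
bounds = power_bounds(soc=2.56, c_min=0.512, c_max=5.12, p_cha_max=2.5, p_dis_max=2.5,
                      eta_cha=0.95, eta_dis=0.95, delta_t=1.0)
print(action_mask(powers, *bounds).astype(int))
```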

## Rule-Based Controllers

The environment includes **rule-based controllers** that serve as baselines for comparing reinforcement learning agents. These controllers implement fixed decision rules.

**Two usage patterns:**

1. **Direct node evaluation** (recommended for standalone RBC testing): Use controllers with energy node classes (Battery + Producer/Prosumer)
2. **Gymnasium environment evaluation** (v0.4.0+, for RBC vs RL comparison): Use `get_controller_observation()` method to evaluate controllers on Gymnasium environments

### Available Controllers

| Controller | Policy | Use Case | Parameters |
|------------|--------|----------|------------|
| `NaiveController` | Always neutral action (no battery control) | Baseline to measure value of any control strategy | `num_actions` |
| `PriceBasedController` | Energy arbitrage based on electricity prices (charge at low prices, discharge at high prices) | Producer nodes or prosumers with time-of-use tariffs | `num_actions`, `window_size`, `charge_action_pct`, `discharge_action_pct` |
| `SelfConsumptionController` | Maximize local self-consumption (charge during excess production, discharge during deficit) | Prosumer nodes optimizing for grid independence | `num_actions`, `balance_threshold` |
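The one-line policies in the table can be made concrete with a hypothetical reading of two of them. This sketch is not the package's implementation. It assumes an odd, symmetric action grid with the neutral action at the middle index. Self-consumption actions are scaled by the surplus or deficit (`energy_balance`) relative to the battery's power limits. Price-based arbitrage compares the current `buy_price` with a rolling mean. `self_consumption_action` and `RollingPriceRule` are illustrative names, and neither looks at the SoC bounds (`upper_bound`, `lower_bound`) in the observation.

```python
from collections import deque


def self_consumption_action(obs: dict, num_actions: int = 21, balance_threshold: float = 0.5,
                            p_cha_max: float = 2.5, p_dis_max: float = 2.5) -> int:
    """Charge on surplus, discharge on deficit, stay idle inside the threshold band."""
    neutral = num_actions // 2
    balance = obs['energy_balance']  # production - consumption, kW
    if abs(balance) < balance_threshold:
        return neutral
    if balance > 0:
        fraction = min(balance / p_cha_max, 1.0)
        return neutral + round(fraction * neutral)
    fraction = min(-balance / p_dis_max, 1.0)
    return neutral - round(fraction * neutral)


class RollingPriceRule:
    """Charge below the rolling mean price, discharge above it."""

    def __init__(self, num_actions: int = 21, window_size: int = 168,
                 charge_action_pct: float = 75.0, discharge_action_pct: float = 25.0):
        self.neutral = num_actions // 2
        # Percentages read as positions along the action index range (assumption)
        self.charge_action = round(charge_action_pct / 100 * (num_actions - 1))
        self.discharge_action = round(discharge_action_pct / 100 * (num_actions - 1))
        self.prices = deque(maxlen=window_size)

    def reset(self) -> None:
        self.prices.clear()

    def choose_action(self, obs: dict) -> int:
        price = obs['buy_price']
        history_mean = sum(self.prices) / len(self.prices) if self.prices else None
        self.prices.append(price)
        if history_mean is None or price == history_mean:
            return self.neutral
        return self.charge_action if price < history_mean else self.discharge_action


print(self_consumption_action({'energy_balance': 1.0}))
rule = RollingPriceRule()
print([rule.choose_action({'buy_price': p}) for p in (0.20, 0.10, 0.30)])
```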

### Usage Example

In the direct pattern, controllers drive a Node class (Producer/Prosumer) without the Gymnasium wrapper:

```python
from storage_node_env.core import Prosumer, Battery
from storage_node_env.gym.controllers import SelfConsumptionController

# Create battery and node
battery = Battery(
    capacity=30.0,
    dod_max=90,
    power_charge_max=10.0,
    power_discharge_max=10.0,
    efficiency_charge=0.95,
    efficiency_discharge=0.95
)

node = Prosumer(
    csv_path='dataset/1h/prosumer_test_data.csv',
    delta_t=1.0,
    num_actions=21
)
node.set_storage(battery)
node.reset()

# Create controller
controller = SelfConsumptionController(num_actions=21, balance_threshold=0.5)

# Evaluation loop
total_cost = 0.0
for _ in range(len(node.data) - 2):
    # Get current data and the battery's dynamic bounds (upper, lower) in percent
    current_row = node.data.iloc[node.time_step]
    bounds = battery.get_bounds_percent(node.delta_t)

    # Build observation dictionary for controller
    observation = {
        'production': current_row['production'],
        'consumption': current_row['consumption'],
        'buy_price': current_row['buy_price'],
        'sell_price': current_row['sell_price'],
        'energy_balance': current_row['production'] - current_row['consumption'],
        'final_soc': battery.soc_percent,
        'upper_bound': bounds[0],
        'lower_bound': bounds[1]
    }

    # Get action from controller
    action = controller.choose_action(observation, {})

    # Step node
    node_results = node.step(action)
    total_cost += node_results['net_cost']

    # Advance time
    node.advance_time()

print(f'Total cost: {total_cost:.4f} €')
```

### Evaluating Controllers on Gymnasium Environment (v0.4.0+)

To compare rule-based controllers against RL agents on the same environment, build the controller input with `get_controller_observation()`:

```python
from typing import cast
import gymnasium as gym
from storage_node_env.gym import EnergyStorageEnv
from storage_node_env.gym.controllers import SelfConsumptionController

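# Battery configuration (same values as in Quick Start)
battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

# Create environment
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)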

# Access unwrapped environment for custom methods
gym_env = cast(EnergyStorageEnv, env.unwrapped)

# Create controller
controller = SelfConsumptionController(num_actions=21)

# Evaluation loop
obs, info = env.reset(seed=42)
total_cost = 0.0

while True:
    # Get controller observation from unwrapped environment
    controller_obs = gym_env.get_controller_observation()
    action = controller.choose_action(controller_obs, {})

    obs, reward, terminated, truncated, info = env.step(action)
    total_cost += info['net_cost']

    if terminated or truncated:
        break

print(f'Total cost: {total_cost:.4f} €')
env.close()
```

**Benefits:**

- ✅ RBC and RL agents see identical data
- ✅ Works with Gym wrappers (VecEnv, Monitor)
- ✅ Type-safe API with `get_controller_observation()`

### Instantiation Examples

```python
from storage_node_env.gym.controllers import (
    NaiveController,
    PriceBasedController,
    SelfConsumptionController
)

# 1. Naive controller (baseline)
naive = NaiveController(num_actions=21)

# 2. Price-based controller (energy arbitrage)
price_based = PriceBasedController(
    num_actions=21,
    window_size=168,           # 1 week rolling window (hourly data)
    charge_action_pct=75.0,    # 75% of the action range ≈ 50% charge power
    discharge_action_pct=25.0  # 25% of the action range ≈ 50% discharge power
)
price_based.reset()  # Reset before each episode

# 3. Self-consumption controller
self_consumption = SelfConsumptionController(
    num_actions=21,
    balance_threshold=0.5  # Minimum 0.5 kW imbalance to act
)
```

### Utility Functions

```python
from storage_node_env.gym.controllers import list_controllers, print_controllers

# List available controllers
controllers_info = list_controllers()
# Returns: {'NaiveController': 'description...', 'PriceBasedController': ...}

# Print formatted information
print_controllers()
```

## Complete Examples

### Example 1: Prosumer with Preprocessing

```python
import gymnasium as gym
import storage_node_env

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 0.5,
        'storage_usage_penalty': 0.0
    }
}

env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    lookback_n=2,
    use_preprocessing=True,    # Enable cyclical encoding
    add_holiday=True,          # Add holiday feature
    reward_settings=reward_settings
)

obs, info = env.reset(seed=42)
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f'Step {step+1}: reward={reward:.4f}, net_cost={info["net_cost"]:.4f} €')

    if terminated or truncated:
        break

env.close()
```

### Example 2: Producer with Energy Arbitrage

```python
import gymnasium as gym
import storage_node_env

battery_config = {
    'capacity': 30.0,
    'dod_max': 90,
    'power_charge_max': 10.0,
    'power_discharge_max': 10.0,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

reward_settings = {
    'type': 'economic',
    'weights': {
        'main': 100.0,             # Amplify economic signal
        'violation_penalty': 10.0,
        'storage_usage_penalty': 1.0
    }
}

env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='producer',
    csv_path='dataset/1h/producer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    reward_settings=reward_settings
)

obs, info = env.reset()
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f'Step {step+1}: reward={reward:.4f}, net_profit={info["net_profit"]:.4f} €')

    if terminated or truncated:
        break

env.close()
```

### Example 3: Training with Stable-Baselines3

```python
import gymnasium as gym
import storage_node_env
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

# Create environment
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    use_preprocessing=True
)

# Wrap in vectorized environment and normalize rewards
vec_env = DummyVecEnv([lambda: env])
vec_env = VecNormalize(vec_env, norm_obs=False, norm_reward=True, clip_reward=10.0)

# Train PPO agent
model = PPO('MlpPolicy', vec_env, verbose=1)
model.learn(total_timesteps=100_000)

# Save model and normalization statistics (needed to reload for evaluation)
model.save('ppo_prosumer')
vec_env.save('vec_normalize.pkl')
vec_env.close()
```

### Example 4: Parallel Training with SubprocVecEnv

`SubprocVecEnv` spawns each environment in a separate subprocess, enabling true CPU parallelism. Each worker runs an independent episode — all reading the same CSV but with different random seeds — so data collection scales with the number of available cores.

```python
import gymnasium as gym
import storage_node_env
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv, VecNormalize
from stable_baselines3.common.utils import set_random_seed

def make_env(csv_path: str, battery_config: dict, rank: int, seed: int = 0):
    """Factory that creates one environment instance for a subprocess worker."""
    def _init():
        env = gym.make(
            'storage_node_env/EnergyStorage-v0',
            node_type='prosumer',
            csv_path=csv_path,
            battery_config=battery_config,
            delta_t=1.0,
            use_preprocessing=True,
            reward_settings={'type': 'self_consumption_delta'}
        )
        env.reset(seed=seed + rank)
        return env
    set_random_seed(seed + rank)
    return _init

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

N_ENVS = 4  # number of parallel workers (tune to CPU core count)
CSV_PATH = 'dataset/1h/prosumer_test_data.csv'

# The __main__ guard is required when worker processes are started with
# 'spawn' or 'forkserver' (e.g. the default on Windows and macOS)
if __name__ == '__main__':
    # Create vectorised environment with one subprocess per worker
    vec_env = SubprocVecEnv(
        [make_env(CSV_PATH, battery_config, rank=i) for i in range(N_ENVS)]
    )

    # Normalise rewards online across all workers
    vec_env = VecNormalize(vec_env, norm_obs=False, norm_reward=True, clip_reward=10.0)

    # Train: each rollout gathers n_steps × N_ENVS = 2048 transitions
    model = PPO('MlpPolicy', vec_env, verbose=1, n_steps=512, batch_size=128)
    model.learn(total_timesteps=500_000)

    model.save('ppo_prosumer_parallel')
    vec_env.save('vec_normalize.pkl')  # keep normalisation statistics for evaluation
    vec_env.close()
```

**When to use `SubprocVecEnv` vs `DummyVecEnv`:**

| | `DummyVecEnv` | `SubprocVecEnv` |
| --- | --- | --- |
| Parallelism | Sequential (single process) | True multiprocess (one core per env) |
| Overhead | Minimal | IPC serialisation per step |
| Best for | Debugging, fast envs, < 4 cores | Long rollouts, many cores, slow envs |
| Usage | Takes a list of env factories | Same constructor, so a drop-in swap; needs an `if __name__ == '__main__':` guard with `spawn`/`forkserver` start methods |

### Example 5: Multiple Independent Environments for Evaluation

Run several independent evaluation episodes in parallel and aggregate metrics:

```python
import numpy as np
import gymnasium as gym
import storage_node_env
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_eval_env(csv_path: str, battery_config: dict, rank: int):
    def _init():
        return gym.make(
            'storage_node_env/EnergyStorage-v0',
            node_type='prosumer',
            csv_path=csv_path,
            battery_config=battery_config,
            delta_t=1.0,
            use_preprocessing=True,
        )
    return _init

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

N_EVAL = 4

if __name__ == '__main__':  # required with 'spawn'/'forkserver' start methods
    vec_env = SubprocVecEnv(
        [make_eval_env('dataset/1h/prosumer_test_data.csv', battery_config, i)
         for i in range(N_EVAL)]
    )

    obs = vec_env.reset()
    episode_costs = np.zeros(N_EVAL)
    done_flags = np.zeros(N_EVAL, dtype=bool)

    while not done_flags.all():
        # One action per worker; with a trained model use: actions, _ = model.predict(obs)
        actions = np.array([vec_env.action_space.sample() for _ in range(N_EVAL)])
        obs, rewards, dones, infos = vec_env.step(actions)
        for i, info in enumerate(infos):
            if not done_flags[i]:
                episode_costs[i] += info.get('net_cost', 0.0)
        # Workers auto-reset when done; stop accumulating after each finishes one episode
        done_flags |= dones

    print(f'Mean episode cost across {N_EVAL} workers: {episode_costs.mean():.4f} €')
    vec_env.close()
```

## Project Structure

```txt
storage_node_env/
├── core/                      # Core simulation components
│   ├── base/                  # Abstract base classes
│   ├── storage/               # Battery implementation
│   └── nodes/                 # Energy node implementations (Producer, Prosumer)
├── gym/                       # Gymnasium integration
│   ├── energy_storage_env.py  # Main environment class
│   ├── utils.py               # Observation building utilities
│   ├── preprocessing/         # Feature encoding and preprocessing
│   ├── rewards/               # Modular reward system
│   └── controllers/           # Rule-based baseline controllers
└── __init__.py                # Package initialization and version info
```

## Documentation

- **[REWARD_SYSTEM.md](storage_node_env/gym/rewards/REWARD_SYSTEM.md)**: Detailed reward system documentation
- **[CONTROLLERS.md](storage_node_env/gym/controllers/CONTROLLERS.md)**: Detailed rule-based controller documentation

## Repository

- **GitHub**: [https://github.com/unisi-lab305/storage-node-environment](https://github.com/unisi-lab305/storage-node-environment)
- **License**: MIT

## Citation

If you use this environment in your research, please cite:

```bibtex
@software{storage_node_env,
  title = {Storage Node Environment: Gymnasium Environment for Battery Energy Storage Systems},
  author = {Leonardo Guiducci},
  email = {leonardo.guiducci@unisi.it},
  year = {2025},
  url = {https://github.com/unisi-lab305/storage-node-environment}
}
```

## Contributing

Contributions are welcome! Please see [CLAUDE.md](CLAUDE.md) for development guidelines and coding standards.

## License

This project is licensed under the MIT License - see the LICENSE file for details.
