This tutorial explains how you can let an agent learn in an environment using the command line interface. It assumes that you have already installed the MMLF successfully.
Let's assume you just want to test the TD Lambda agent in the Mountain Car environment. Starting this is essentially a one-liner on the command line:
run_mmlf --config mountain_car/world_td_lambda_exploration.yaml
This will start the MMLF and execute the world defined in the file world_td_lambda_exploration.yaml.
Note
If you installed the MMLF locally, you might have to write "./run_mmlf --config mountain_car/world_td_lambda_exploration.yaml" instead on Unix-based operating systems.
Note
If this is your very first run of the MMLF, it will create the so-called "rw-directory" for your user. The rw-directory is the place where the MMLF stores the configuration of all worlds, the log files, etc. By default, this is the .mmlf directory in your home directory (under MS Windows, $HOME corresponds to the user profile directory %USERPROFILE%). If you want to change the configuration of a world, this is the place to do it (and not /etc/mmlf). If you want to use a different directory as rw-directory, you can specify it with the option "--rwpath". For instance,

run_mmlf --config mountain_car/world_td_lambda_exploration.yaml --rwpath /home/someuser/Temp/mmlfrw

would use the directory /home/someuser/Temp/mmlfrw as rw-directory. Note that this directory must be specified in every invocation of the MMLF; the MMLF does not remember the path!
Once the MMLF rw-directory is created, the world will be started. Some information is printed out, for instance that the agent receives the state and action space from the environment. Then the agent starts to perform in the environment, and the environment prints out some information about how the agent performs:
2011-02-15 11:12:39,700 FrameworkLog INFO Using MMLF RW area /home/jmetzen/.mmlf
2011-02-15 11:12:39,700 FrameworkLog INFO Loading world from config file mountain_car/world_td_lambda_exploration.yaml.
2011-02-15 11:12:40,642 AgentLog INFO TDLambdaAgent got new state-space:
StateSpace:
    position : ContinuousDimension(LimitType: soft, Limits: (-1.200,0.600))
    velocity : ContinuousDimension(LimitType: soft, Limits: (-0.070,0.070))
2011-02-15 11:12:40,646 AgentLog INFO TDLambdaAgent got new action-space:
ActionSpace:
    thrust : DiscreteDimension(Values: ['left', 'right', 'none'])
2011-02-15 11:12:59,130 EnvironmentLog INFO No goal reached but 500 steps expired!
2011-02-15 11:13:13,181 EnvironmentLog INFO No goal reached but 500 steps expired!
2011-02-15 11:13:22,174 EnvironmentLog INFO No goal reached but 500 steps expired!
2011-02-15 11:13:28,678 EnvironmentLog INFO Goal reached after 202 steps!
2011-02-15 11:13:28,797 EnvironmentLog INFO Goal reached after 6 steps!
....
This shows that the agent was not able to reach the goal during the first episodes, but over time it finds its way to the goal more frequently and faster. You can observe the performance of the agent for some time and see whether it improves. You can stop the world with Ctrl+C.
Once you have stopped the learning, you can take a look at the MMLF rw-directory (the one created during your first run of the MMLF). There are now two subdirectories: config and logs. The logs directory contains information about the run you just conducted. Among other things, the length of each episode is stored in a separate log file that can be used for later analysis of the agent's performance. To learn more about this, take a look at the Logging page.
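If you later want to analyze such a run yourself, a few lines of Python are enough to plot the logged episode lengths. This is only a minimal sketch: it assumes the episode lengths are stored as one value per line in a plain-text file somewhere below the logs directory; the actual file name and location are described on the Logging page, so the path used below is merely a placeholder.

# Minimal sketch: plot logged episode lengths with matplotlib.
# The path is a placeholder; look up the real file of your run in the logs directory.
import matplotlib.pyplot as plt

log_file = "/home/someuser/.mmlf/logs/path/to/episode_length_logfile"  # placeholder
with open(log_file) as f:
    # Assumption: one episode per line, with the episode length as the last token.
    episode_lengths = [float(line.split()[-1]) for line in f if line.strip()]

plt.plot(episode_lengths)
plt.xlabel("Episode")
plt.ylabel("Episode length (steps)")
plt.show()

For now, however, we focus on the config directory.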
The config directory contains configuration files for all worlds contained in the MMLF. Let's start with the world configuration file we just used for our first MMLF run. It is located in config/mountain_car. If we open world_td_lambda_exploration.yaml, we see the following:
worldPackage : mountain_car
environment:
    moduleName : "mcar_env"
    configDict:
        maxStepsPerEpisode : 500
        accelerationFactor : 0.001
        maxGoalVelocity : 0.07
        positionNoise : 0.0
        velocityNoise : 0.0
agent:
    moduleName : "td_lambda_agent"
    configDict:
        gamma : 1.0
        epsilon : 0.01
        lambda : 0.95
        minTraceValue : 0.5
        defaultStateDimDiscretizations : 9
        defaultActionDimDiscretizations : 7
        function_approximator :
            name : 'CMAC'
            number_of_tilings : 10
            learning_rate : 0.5
            update_rule : 'exaggerator'
            default : 0.0
monitor:
    policyLogFrequency : 25
    stateActionValuesLogging:
        logFrequency : 25
        stateDims : None
        actions : None
        rasterPoints : 50
    functionOverStateSpaceLogging:
        logFrequency : 25
        stateDims : None
        rasterPoints : 50
Basically, this file specifies where the Python modules for the agent and the environment are located and how the parameters of agent and environment are set. Furthermore, it specifies which information a module called "Monitor" periodically stores in the log directory (see Monitor for more details on that). The config directory contains a multitude of world specification files, for instance world_dps.yaml in the directory mountain_car:
worldPackage : mountain_car
environment:
    moduleName : "mcar_env"
    configDict:
        maxStepsPerEpisode : 500
        accelerationFactor : 0.001
        maxGoalVelocity : 0.07
        positionNoise : 0.0
        velocityNoise : 0.0
agent:
    moduleName : "dps_agent"
    configDict:
        policy_search :
            method: 'fixed_parametrization'
            policy:
                type: 'linear'
                numOfDuplications: 1
                bias: True
            optimizer:
                name: 'evolution_strategy'
                sigma: 1.0
                populationSize : 5
                evalsPerIndividual: 10
                numChildren: 10
monitor:
    policyLogFrequency : 1
    functionOverStateSpaceLogging:
        logFrequency : 25
        stateDims : None
        rasterPoints : 50
As you can see, the environment sections of the two configuration files are identical, but the agents differ. This world can be started with a command similar to the first one:
run_mmlf --config mountain_car/world_dps.yaml
This will start a different learning agent (one using a Direct Policy Search algorithm for learning) and let it learn the mountain car task.
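By the way, these world configuration files are plain YAML, so you can also inspect them programmatically. The following is only a minimal sketch; it assumes that the PyYAML package is installed and that the rw-directory is at its default location (shown here for a user someuser). The MMLF itself parses these files for you when you start a world; this is only useful for your own scripts or analyses.

# Minimal sketch: load a world configuration file and look at its parts.
# Assumes PyYAML is installed; adapt the path if you used --rwpath.
import yaml

config_path = "/home/someuser/.mmlf/config/mountain_car/world_td_lambda_exploration.yaml"
with open(config_path) as f:
    world = yaml.safe_load(f)

print(world["worldPackage"])                    # mountain_car
print(world["agent"]["moduleName"])             # td_lambda_agent
print(world["agent"]["configDict"]["epsilon"])  # 0.01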
If you are more interested in further experiments with the td_lambda_agent, you can stick with world_td_lambda_exploration.yaml and edit its agent part. For the purposes of this tutorial, the interesting part of this configuration file is the configDict of the agent, since it contains the parameter values used by the agent. For instance, we see that the agent uses a discount factor gamma of 1.0 and follows an epsilon-greedy policy (epsilon is 0.01). If you are not familiar with these concepts, you might want to take a look at the excellent book by Sutton and Barto. You can now modify the parameters and see how they influence the learning performance. For instance, set epsilon to 0.0 to get an agent that always acts greedily, and store the file as "world_td_lambda_no_exploration.yaml". To start this world, simply run
run_mmlf --config mountain_car/world_td_lambda_no_exploration.yaml
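For reference, the agent section of world_td_lambda_no_exploration.yaml would differ from the file shown above only in the value of epsilon:

agent:
    moduleName : "td_lambda_agent"
    configDict:
        gamma : 1.0
        epsilon : 0.0        # 0.01 in the original file; the agent now always acts greedily
        lambda : 0.95
        minTraceValue : 0.5
        # ... remaining parameters (including function_approximator) unchanged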
You can now use the basic features of the MMLF. Other worlds are started in the same way; for instance,
run_mmlf --config single_pole_balancing/world_dps.yaml
will start the single-pole balancing scenario with the DPS agent.
See also