Quick start (command line interface)

This tutorial explains how you can let an agent learn in a given environment using the command line interface. It assumes that you have already installed the MMLF successfully.

Let's assume you just want to test the TD Lambda agent in the Mountain Car environment. Starting this is essentially a one-liner on the command line:

run_mmlf --config mountain_car/world_td_lambda_exploration.yaml

This will start the MMLF and execute the world defined in the file world_td_lambda_exploration.yaml

Note

If you installed the MMLF locally, you might have to write "./run_mmlf --config mountain_car/world_td_lambda_exploration.yaml" instead on Unix-based operating systems.

Note

If this is your very first run of the MMLF, the MMLF will create the so-called "rw-directory" for your user. The rw-directory is the place where the MMLF stores the configuration of all worlds, the log files, etc. By default, this is the .mmlf directory in your home directory (under MS Windows, $HOME corresponds to $USERPROFILE). If you want to change the configuration of a world, this is the place to do it (and not /etc/mmlf). If you want to use a different directory as rw-directory, you can specify it with the option "--rwpath"; for instance, "run_mmlf --config mountain_car/world_td_lambda_exploration.yaml --rwpath /home/someuser/Temp/mmlfrw" would use the directory "/home/someuser/Temp/mmlfrw" as rw-directory. This directory must be specified in every invocation of the MMLF; the MMLF does not remember this path!

Once the MMLF rw-directory is created, the world will be started. Some information is printed out, for instance that the agent receives information about the state and action space from the environment. Then the agent starts to act in the environment, and the environment prints out some information about how the agent performs:

2011-02-15 11:12:39,700 FrameworkLog         INFO     Using MMLF RW area /home/jmetzen/.mmlf
2011-02-15 11:12:39,700 FrameworkLog         INFO     Loading world from config file mountain_car/world_td_lambda_exploration.yaml.
2011-02-15 11:12:40,642 AgentLog             INFO     TDLambdaAgent got new state-space:
        StateSpace:
                position        : ContinuousDimension(LimitType: soft, Limits: (-1.200,0.600))
                velocity        : ContinuousDimension(LimitType: soft, Limits: (-0.070,0.070))
2011-02-15 11:12:40,646 AgentLog             INFO     TDLambdaAgent got new action-space:
        ActionSpace:
                thrust          : DiscreteDimension(Values: ['left', 'right', 'none'])
2011-02-15 11:12:59,130 EnvironmentLog       INFO     No goal reached but 500 steps expired!
2011-02-15 11:13:13,181 EnvironmentLog       INFO     No goal reached but 500 steps expired!
2011-02-15 11:13:22,174 EnvironmentLog       INFO     No goal reached but 500 steps expired!
2011-02-15 11:13:28,678 EnvironmentLog       INFO     Goal reached after 202 steps!
2011-02-15 11:13:28,797 EnvironmentLog       INFO     Goal reached after 6 steps!

....

This shows that the agent was not able to reach the goal during the first episodes, but over time it finds its way to the goal more frequently and more quickly. You can observe the agent's performance for a while and see whether it improves. You can stop the world with Ctrl+C.

Once you have stopped the learning, you can take a look at the MMLF rw-directory (the one created during your first run of the MMLF). It now contains two subdirectories: config and logs. The logs directory contains information about the run you just conducted. Among other things, the length of each episode is stored in a separate log file that can be used for later analysis of the agent's performance. To learn more about this, take a look at the Logging page. For now, we focus on the config directory.
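
For orientation, the rw-directory might now look roughly like this (a sketch; the exact files depend on the worlds shipped with your MMLF version and on the runs you have performed):

.mmlf/
    config/
        mountain_car/
            world_td_lambda_exploration.yaml
            world_dps.yaml
            ...
        single_pole_balancing/
            world_dps.yaml
            ...
    logs/
        ...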

The config directory contains configuration files for all worlds shipped with the MMLF. Let's start with the world configuration file we just used for our first MMLF run. It is located in config/mountain_car. If we open world_td_lambda_exploration.yaml, we see the following:

worldPackage : mountain_car
environment:
    moduleName : "mcar_env"
    configDict: 
        maxStepsPerEpisode : 500    
        accelerationFactor : 0.001
        maxGoalVelocity : 0.07
        positionNoise : 0.0
        velocityNoise : 0.0
agent:
    moduleName : "td_lambda_agent"
    configDict:
        gamma : 1.0
        epsilon : 0.01
        lambda : 0.95
        minTraceValue : 0.5
        defaultStateDimDiscretizations : 9
        defaultActionDimDiscretizations : 7
        function_approximator : 
            name : 'CMAC'
            number_of_tilings : 10
            learning_rate : 0.5
            update_rule : 'exaggerator'
            default : 0.0
monitor:
    policyLogFrequency : 25
    stateActionValuesLogging:
        logFrequency : 25
        stateDims: None
        actions : None
        rasterPoints : 50
    functionOverStateSpaceLogging:
        logFrequency : 25
        stateDims : None
        rasterPoints : 50

Basically, this file specifies where the Python modules for the agent and the environment are located and how the parameters of the agent and the environment are set. Furthermore, it specifies which information a module called "Monitor" will periodically store in the log directory (see Monitor for more details). The config directory contains a multitude of world specification files, for instance world_dps.yaml in the directory mountain_car:

worldPackage : mountain_car
environment:
    moduleName : "mcar_env"
    configDict: 
        maxStepsPerEpisode : 500    
        accelerationFactor : 0.001
        maxGoalVelocity : 0.07
        positionNoise : 0.0
        velocityNoise : 0.0
agent:
    moduleName : "dps_agent"
    configDict:
        policy_search : 
            method: 'fixed_parametrization'
            policy: 
                type: 'linear'
                numOfDuplications: 1
                bias: True
            optimizer: 
                name: 'evolution_strategy'
                sigma:  1.0
                populationSize : 5
                evalsPerIndividual: 10
                numChildren: 10
monitor:
    policyLogFrequency : 1
    functionOverStateSpaceLogging:
        logFrequency : 25
        stateDims : None
        rasterPoints : 50

As you can see, the environments in the two configuration files are identical, but the agents differ. This world can be started with a command similar to the first one, namely

run_mmlf  --config mountain_car/world_dps.yaml

This will start a different learning agent (one using a Direct Policy Search algorithm for learning) and let it learn the mountain car task.
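
More generally, every world configuration file follows the same basic structure. The following skeleton is distilled from the two examples above; the available configDict entries depend on the chosen agent and environment modules:

worldPackage : <name of the world package, e.g. mountain_car>
environment:
    moduleName : <Python module of the environment, e.g. "mcar_env">
    configDict:
        <environment parameters>
agent:
    moduleName : <Python module of the agent, e.g. "td_lambda_agent">
    configDict:
        <agent parameters>
monitor:
    <logging settings, see the Monitor documentation>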

If you are more interested in further experiments with the td_lambda_agent, you can stick with the original world_td_lambda_exploration.yaml and edit its agent part. For the purposes of this tutorial, the interesting part of this configuration file is the agent's configDict, since it contains the parameter values used by the agent. For instance, we see that the agent uses a discount factor gamma of 1.0 and follows an epsilon-greedy policy (epsilon is 0.01). If you are not familiar with these concepts, you might want to take a look at the excellent book by Sutton and Barto. You can now modify the parameters and see how the learning performance is influenced. For instance, set epsilon to 0.0 to get an agent that always acts greedily and store the file as "world_td_lambda_no_exploration.yaml"; the modified agent section is sketched below.
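
The agent part of such a modified configuration file might look as follows (only epsilon is changed with respect to the original file; all other values are kept):

agent:
    moduleName : "td_lambda_agent"
    configDict:
        gamma : 1.0
        epsilon : 0.0
        lambda : 0.95
        minTraceValue : 0.5
        defaultStateDimDiscretizations : 9
        defaultActionDimDiscretizations : 7
        function_approximator :
            name : 'CMAC'
            number_of_tilings : 10
            learning_rate : 0.5
            update_rule : 'exaggerator'
            default : 0.0

To start the world with the new configuration file, simply run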

run_mmlf  --config mountain_car/world_td_lambda_no_exploration.yaml

You can now use the basic features of the MMLF. Other worlds are started in the same way; for instance,

run_mmlf  --config single_pole_balancing/world_dps.yaml

will start the single-pole-balancing scenario with the DPS agent enabled.

See also

Tutorial Quick start (graphical user interface)
Learn how to use the MMLF’s graphical user interface
Tutorial Writing an agent
Learn how to write your own MMLF agent
Tutorial Writing an environment
Learn how to write your own MMLF environment
Learn more about Experiments
Learn how to do a serious benchmarking and statistical comparison of the performance of different agents/environments
