MDP Toolbox for MATLAB

mdp_eval_policy_TD_0

Evaluates a policy using the TD(0) algorithm.

Syntax

Vpolicy = mdp_eval_policy_TD_0 (P, R, discount, policy)
Vpolicy = mdp_eval_policy_TD_0 (P, R, discount, policy, N)

Description

mdp_eval_policy_TD_0 evaluates the value function associated with a policy using the TD(0) algorithm (Reinforcement Learning).
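
For reference, TD(0) estimates the value of each state by simulating transitions under the fixed policy and nudging the current estimate towards the one-step bootstrapped return. The following is a minimal MATLAB sketch of such a loop, not the toolbox implementation; the step-size schedule alpha = 1/sqrt(n), the single simulated trajectory, and the (SxSxA) / (SxA) argument shapes are illustrative assumptions.

% Minimal TD(0) policy-evaluation sketch (illustrative, not the toolbox code).
% Assumes P is a (SxSxA) probability array, R is an (SxA) reward array,
% and policy and discount are defined as in the Arguments section below.
S = size(P, 1);
V = zeros(S, 1);                                  % value estimate for each state
s = 1;                                            % arbitrary start state
for n = 1:10000
    a  = policy(s);                               % action prescribed by the policy
    sp = find(cumsum(P(s, :, a)) >= rand, 1);     % sample the next state
    alpha = 1 / sqrt(n);                          % assumed decreasing step size
    % TD(0) update: move V(s) towards the bootstrapped target R(s,a) + discount*V(sp)
    V(s) = V(s) + alpha * (R(s, a) + discount * V(sp) - V(s));
    s = sp;                                       % continue along the trajectory
end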

Arguments

P can be a 3-dimensional array (SxSxA) or a cell array (1xA), where each cell contains a sparse matrix (SxS).
R can be a 3-dimensional array (SxSxA) or a cell array (1xA), where each cell contains a sparse matrix (SxS) or a 2D array (SxA), possibly sparse.
discount is a real number in [0; 1].
policy is a (Sx1) vector; each element is an integer corresponding to an action.
N is an integer giving the number of iterations to perform; it must be greater than the default value.
By default, N is set to 10000.
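
For example, to run more iterations than the default, N can be passed as the fifth argument (the value 50000 below is only an illustration, and policy stands for any valid (Sx1) policy vector):

>> Vpolicy = mdp_eval_policy_TD_0(P, R, 0.9, policy, 50000)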

Evaluation

Vpolicy is a (Sx1) vector.

Example

>> % To be able to reproduce the following example, it is necessary to initialize the pseudorandom number generator
>> rand('seed',0)

>> P(:,:,1) = [ 0.5 0.5;   0.8 0.2 ];
>> P(:,:,2) = [ 0 1;   0.1 0.9 ];
>> R = [ 5 10;   -1 2 ];

>> Vpolicy = mdp_eval_policy_TD_0(P, R, 0.9, [1; 2])
Vpolicy =
   29.0357
   24.2148

In the above example, P can be a cell array containing sparse matrices:
>> P{1} = sparse([ 0.5 0.5;  0.8 0.2 ]);
>> P{2} = sparse([ 0 1;  0.1 0.9 ]);
The function call is unchanged.
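With the sparse cell-array form of P, the call is literally the same as before:

>> Vpolicy = mdp_eval_policy_TD_0(P, R, 0.9, [1; 2])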


Page created on August 31, 2009.