MDP Toolbox for MATLAB
mdp_policy_iteration
Solves a discounted MDP with the policy iteration algorithm.
Syntax
[V, policy, iter, cpu_time] = mdp_policy_iteration (P, R, discount)
[V, policy, iter, cpu_time] = mdp_policy_iteration (P, R, discount, policy0)
[V, policy, iter, cpu_time] = mdp_policy_iteration (P, R, discount, policy0, max_iter)
[V, policy, iter, cpu_time] = mdp_policy_iteration (P, R, discount, policy0, max_iter, eval_type)
Description
mdp_policy_iteration applies the policy iteration algorithm to
solve a discounted MDP. The algorithm improves the policy
iteratively, using the evaluation of the current policy at each step.
Iteration stops when two successive policies are identical or when
the specified number of iterations (max_iter) has been performed.
This function supports verbose and silent modes. In verbose mode, the function
displays the number of actions that differ between policies n-1 and n
after each iteration.
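The loop described above can be sketched in a few lines of plain MATLAB. This is an illustration only, not the toolbox source: the variable names (Ppol, Rpol, Q, n_different, is_done) are made up for the sketch, R is assumed to be given in its (SxA) form, the starting policy is the greedy one on immediate reward, and each evaluation is done with a direct linear solve.

```matlab
% Example data from this page: 2 states, 2 actions, R given as an S-by-A matrix.
P = zeros(2, 2, 2);
P(:,:,1) = [0.5 0.5; 0.8 0.2];
P(:,:,2) = [0   1;   0.1 0.9];
R = [5 10; -1 2];
discount = 0.9;
max_iter = 1000;

[S, A] = size(R);
[~, policy] = max(R, [], 2);   % start from the greedy policy on immediate reward

iter = 0;
is_done = false;
while ~is_done
    iter = iter + 1;

    % Policy evaluation: solve (I - discount * Ppol) * V = Rpol.
    Ppol = zeros(S, S);
    Rpol = zeros(S, 1);
    for s = 1:S
        Ppol(s, :) = P(s, :, policy(s));
        Rpol(s)    = R(s, policy(s));
    end
    V = (eye(S) - discount * Ppol) \ Rpol;

    % Policy improvement: pick the action with the best one-step lookahead.
    Q = zeros(S, A);
    for a = 1:A
        Q(:, a) = R(:, a) + discount * P(:, :, a) * V;
    end
    [~, policy_next] = max(Q, [], 2);

    % Number of states whose action changed (what verbose mode reports).
    n_different = sum(policy_next ~= policy);
    fprintf('%10d %12d\n', iter, n_different);

    % Stop when the policy is stable or the iteration budget is spent.
    is_done = (n_different == 0) || (iter == max_iter);
    policy = policy_next;
end
```

On this data the sketch stops after two iterations with the same V and policy as those printed in the Example section of this page.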
Arguments
P can be a 3-dimensional array (SxSxA) or a cell array (1xA), each cell containing a sparse matrix (SxS).
R can be a 3-dimensional array (SxSxA), a cell array (1xA), each cell containing a sparse matrix (SxS), or a 2D array (SxA), possibly sparse.
discount is a real number in the open interval ]0; 1[ (0 < discount < 1).
policy0 is a (Sx1) vector.
By default, policy0 is the policy which maximizes the expected immediate reward.
max_iter is an integer greater than 0.
By default, max_iter is set to 1000.
eval_type selects the policy evaluation method: 0 uses mdp_eval_policy_matrix; any other value uses mdp_eval_policy_iterative.
By default, eval_type is set to 0.
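The two evaluation styles that eval_type chooses between can be illustrated as follows. This is a hedged sketch of the two standard approaches (a direct linear solve versus repeated application of the fixed-policy Bellman update), not the code of mdp_eval_policy_matrix or mdp_eval_policy_iterative. The tolerance epsilon, the cap of 10000 sweeps, and all variable names are assumptions made for the example.

```matlab
% Example data from this page, with a fixed policy to evaluate.
P = zeros(2, 2, 2);
P(:,:,1) = [0.5 0.5; 0.8 0.2];
P(:,:,2) = [0   1;   0.1 0.9];
R = [5 10; -1 2];
discount = 0.9;
policy = [2; 1];

% Transition matrix and reward vector induced by the policy.
S = size(R, 1);
Ppol = zeros(S, S);
Rpol = zeros(S, 1);
for s = 1:S
    Ppol(s, :) = P(s, :, policy(s));
    Rpol(s)    = R(s, policy(s));
end

% Matrix evaluation: one linear solve of (I - discount * Ppol) * V = Rpol.
V_matrix = (eye(S) - discount * Ppol) \ Rpol;

% Iterative evaluation: repeat V <- Rpol + discount * Ppol * V until the
% update is smaller than a tolerance (epsilon and the cap are assumptions).
epsilon = 1e-8;
V_iter = zeros(S, 1);
for k = 1:10000
    V_new = Rpol + discount * Ppol * V_iter;
    delta = max(abs(V_new - V_iter));
    V_iter = V_new;
    if delta < epsilon
        break;
    end
end
```

Both vectors agree up to the tolerance. The linear solve is exact but grows costly as S increases, while the iterative form only needs matrix-vector products, which tends to suit large sparse problems better.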
Evaluations
V is a (Sx1) vector.
policy is a (Sx1) vector. Each element is an integer
corresponding to an action which maximizes the value function.
Example
The table of iteration numbers printed right after the function call is the verbose mode display.
>> P(:,:,1) = [ 0.5 0.5;   0.8 0.2 ];
>> P(:,:,2) = [ 0 1;   0.1 0.9 ];
>> R = [ 5 10;   -1 2 ];
>> [V, policy, iter, cpu_time] = mdp_policy_iteration(P, R, 0.9)
  Iteration Number_of_different_actions
        1            1
        2            0
V =
   42.4419
   36.0465
policy =
   2
   1
iter =
   2
cpu_time =
   0.0200
In the above example, P can instead be defined as a cell array containing sparse matrices (clear P first if it already exists as a 3-dimensional array):
>> P{1} = sparse([ 0.5 0.5;  0.8 0.2 ]);
>> P{2} = sparse([ 0 1;  0.1 0.9 ]);
The function call is unchanged.
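When P already exists as a 3-dimensional array, the cell form can also be built with a short loop instead of retyping each matrix. A minimal sketch follows; the name Pcell is an assumption chosen so that the original array is not overwritten.

```matlab
% Dense (SxSxA) transition array from the example on this page.
P = zeros(2, 2, 2);
P(:,:,1) = [0.5 0.5; 0.8 0.2];
P(:,:,2) = [0   1;   0.1 0.9];

% Build a (1xA) cell array holding one sparse (SxS) matrix per action.
A = size(P, 3);
Pcell = cell(1, A);
for a = 1:A
    Pcell{a} = sparse(P(:, :, a));
end
```

Pcell can then be passed wherever the Arguments section allows P to be a cell array.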