MDP Toolbox for MATLAB |
mdp_finite_horizon
Solves a finite-horizon MDP with the backwards induction algorithm.
Syntax
[V, policy, cpu_time] = mdp_finite_horizon (P, R, discount, N)
[V, policy, cpu_time] = mdp_finite_horizon (P, R, discount, N, h)
Description
mdp_finite_horizon applies the backwards induction algorithm to a
finite-horizon MDP. The optimality equations allow the value function to be evaluated recursively, starting from the terminal stage.
The function supports verbose and silent modes. In verbose mode, the function
displays the current stage and the corresponding optimal policy.
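As an illustration, the backwards induction recursion can be sketched in MATLAB as follows. This is a simplified sketch, not the toolbox implementation; it assumes the 2D reward form (R of size SxA) and that P, R, discount, N and h are already defined.

```matlab
% Backwards induction sketch for a finite-horizon MDP.
% P is SxSxA, R is SxA, h is the terminal reward (Sx1).
S = size(P, 1);
A = size(P, 3);
V = zeros(S, N+1);
policy = zeros(S, N);
V(:, N+1) = h;                 % terminal stage
for n = N:-1:1                 % walk backwards from stage N to 1
    Q = zeros(S, A);
    for a = 1:A
        % immediate reward of action a plus discounted expected future value
        Q(:, a) = R(:, a) + discount * P(:, :, a) * V(:, n+1);
    end
    % for each state, the best action and its value at stage n
    [V(:, n), policy(:, n)] = max(Q, [], 2);
end
```

Running this sketch on the example below reproduces the V and policy matrices shown there.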
Arguments
P can be a 3-dimensional array (SxSxA) or a cell array (1xA), where each cell contains a sparse matrix (SxS).
R can be a 3-dimensional array (SxSxA), a cell array (1xA) where each cell contains a sparse matrix (SxS), or a 2D array (SxA), possibly sparse.
discount is a real number in the interval ]0; 1].
N is an integer greater than 0.
h is a (Sx1) vector.
By default, h = [0; 0; ... 0].
Evaluations
V is a (Sx(N+1)) matrix.
Each column n is the optimal value function at stage n, for n = 1, ..., N.
V(:,N+1) is the terminal reward.
policy is a (SxN) matrix. Each element is an integer corresponding to an
action and each column n is the optimal policy at stage n.
Example
The lines beginning with stage: are the verbose mode display.
>> P(:,:,1) = [ 0.5 0.5;   0.8 0.2 ];
>> P(:,:,2) = [ 0 1;   0.1 0.9 ];
>> R = [ 5 10;   -1 2 ];
>> [V, policy, cpu_time] = mdp_finite_horizon(P, R, 0.9, 3)
stage:3 policy transpose : 2 2
stage:2 policy transpose : 2 1
stage:1 policy transpose : 2 1
V =
   15.9040   11.8000   10.0000         0
    8.6768    6.5600    2.0000         0
policy =
   2 2 2
   1 1 2
cpu_time =
   0.0400
In the above example, P can be a cell array containing sparse matrices:
>> P{1} = sparse([ 0.5 0.5;  0.8 0.2 ]);
>> P{2} = sparse([ 0 1;  0.1 0.9 ]);
The function call is unchanged.