MDP Toolbox for MATLAB

mdp_finite_horizon

Solves a finite-horizon MDP with the backward induction algorithm.

Syntax

[V, policy, cpu_time] = mdp_finite_horizon (P, R, discount, N)
[V, policy, cpu_time] = mdp_finite_horizon (P, R, discount, N, h)

Description

mdp_finite_horizon applies the backward induction algorithm to a finite-horizon MDP. The optimality equations allow the value function to be evaluated recursively, starting from the terminal stage.
The function supports verbose and silent modes. In verbose mode, it displays the current stage and the corresponding optimal policy.
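The recursion can be sketched as follows. This is a minimal illustration of backward induction, not the toolbox source; it assumes P is given as a (SxSxA) array and R as a (SxA) matrix, with discount, N and h as described below.

```matlab
% Backward induction sketch (assumes P is SxSxA, R is SxA)
[S, A] = size(R);
V = zeros(S, N+1);
policy = zeros(S, N);
V(:, N+1) = h;                       % terminal reward
for n = N:-1:1
    Q = zeros(S, A);                 % Q(s,a) at stage n
    for a = 1:A
        Q(:, a) = R(:, a) + discount * P(:, :, a) * V(:, n+1);
    end
    [V(:, n), policy(:, n)] = max(Q, [], 2);
end
```

Each backward step computes the stage-n action values from the stage-(n+1) value function and takes the componentwise maximum over actions.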

Arguments

P can be a 3-dimensional array (SxSxA) or a cell array (1xA) in which each cell contains a sparse (SxS) matrix. R can be a 3-dimensional array (SxSxA), a cell array (1xA) in which each cell contains a sparse (SxS) matrix, or a 2D array (SxA), possibly sparse. discount is a real in ]0; 1]. N is an integer strictly greater than 0. h is a (Sx1) vector of terminal values.
By default, h = [0; 0; ... 0].

Evaluations

V is a (Sx(N+1)) matrix. Each column n, for n = 1, ..., N, is the optimal value function at stage n.
V(:,N+1) is the terminal reward.
policy is a (SxN) matrix. Each element is an integer corresponding to an action and each column n is the optimal policy at stage n.

Example
The verbose-mode display is shown in grey.

>> P(:,:,1) = [ 0.5 0.5;   0.8 0.2 ];
>> P(:,:,2) = [ 0 1;   0.1 0.9 ];
>> R = [ 5 10;   -1 2 ];

>> [V, policy, cpu_time] = mdp_finite_horizon(P, R, 0.9, 3)
stage:3 policy transpose : 2 2
stage:2 policy transpose : 2 1
stage:1 policy transpose : 2 1
V =
   15.9040   11.8000   10.0000         0
    8.6768    6.5600    2.0000         0
policy =
     2     2     2
     1     1     2
cpu_time =
    0.0400

In the above example, P can be a cell array containing sparse matrices:
>> P{1} = sparse([ 0.5 0.5;  0.8 0.2 ]);
>> P{2} = sparse([ 0 1;  0.1 0.9 ]);
The function call is unchanged.
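Likewise, R can be given as a (1xA) cell array of sparse (SxS) matrices holding transition rewards. A transition-based form equivalent to the (SxA) matrix above (illustrative: each row is constant, since the reward here depends only on the departure state and the action) would be:

```matlab
% Illustrative: reward matrices per action, constant over arrival states
>> R{1} = sparse([ 5 5;  -1 -1 ]);
>> R{2} = sparse([ 10 10;  2 2 ]);
```

Again, the function call is unchanged.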



MDPtoolbox/documentation/mdp_finite_horizon.html
Page created on July 31, 2001. Last update on August 31, 2009.