MDP Toolbox for MATLAB

mdp_value_iteration_bound_iter

Computes a bound on the number of iterations for the value iteration algorithm.

Syntax

[max_iter, cpu_time] = mdp_value_iteration_bound_iter(P, R, discount)
[max_iter, cpu_time] = mdp_value_iteration_bound_iter(P, R, discount, epsilon)
[max_iter, cpu_time] = mdp_value_iteration_bound_iter(P, R, discount, epsilon, V0)

Description

mdp_value_iteration_bound_iter computes an upper bound on the number of iterations that the value iteration algorithm needs to find an epsilon-optimal policy when the span of successive value functions is used as the stopping criterion.
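
As a rough illustration of where such a bound can come from, the sketch below works it out in plain MATLAB for a dense SxSxA transition array, an SxA reward array, and a zero starting value function. It assumes the span-based contraction argument for value iteration given in Puterman's Markov Decision Processes (1994), with a conservative contraction factor k. The function name bound_iter_sketch and everything inside it are hypothetical and are not the toolbox's own code.

```matlab
% Illustrative sketch only: bound_iter_sketch is a hypothetical helper,
% not part of the MDP Toolbox. Assumes dense P (SxSxA), R (SxA), V0 = 0.
function max_iter = bound_iter_sketch(P, R, discount, epsilon)
    S = size(P, 1);

    % Conservative contraction factor:
    %   k = 1 - sum over j of the smallest P(j | s, a) over all s and a
    h = zeros(1, S);
    for j = 1:S
        col = P(:, j, :);      % probabilities of reaching state j
        h(j) = min(col(:));
    end
    k = 1 - sum(h);

    % One Bellman backup from V0 = 0 gives V1(s) = max over a of R(s, a)
    V1 = max(R, [], 2);
    span = max(V1) - min(V1);  % span of V1 - V0

    % Smallest n with (discount * k)^n * span below the span threshold
    threshold = epsilon * (1 - discount) / discount;
    max_iter = ceil(log(threshold / span) / log(discount * k));
end
```

Saved as bound_iter_sketch.m and called as bound_iter_sketch(P, R, 0.9, 0.01) with the P and R from the Example section, this gives k = 0.8, a span of 8, and a bound of 28. The sketch does not guard against a zero span or k = 0, where the logarithms are undefined.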

Arguments

P : transition probability array.

P can be a 3-dimensional array (SxSxA) or a cell array (1xA) in which each cell contains a sparse matrix (SxS).

R : reward array.

R can be a 3-dimensional array (SxSxA), a cell array (1xA) in which each cell contains a sparse matrix (SxS), or a 2D array (SxA), possibly sparse.

discount : discount factor.

discount is a real number in the open interval ]0; 1[, that is, 0 < discount < 1.

epsilon (optional) : precision of the epsilon-optimal policy sought.

epsilon is a real number in ]0; 1], that is, 0 < epsilon <= 1.
By default, epsilon is set to 0.01.

V0 (optional) : starting value function.

V0 is a (Sx1) vector.
By default, V0 is a vector of zeros.

Evaluations

max_iter : maximum number of iterations to be done.

max_iter is an integer greater than 0.

cpu_time : CPU time used to run the program.

Example

>> P(:,:,1) = [ 0.5 0.5; 0.8 0.2 ];
>> P(:,:,2) = [ 0 1; 0.1 0.9 ];
>> R = [ 5 10; -1 2 ];

>> max_iter = mdp_value_iteration_bound_iter(P, R, 0.9)
max_iter =
28

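In the above example, P can instead be a cell array containing sparse matrices. Clear the 3-dimensional array first, because MATLAB does not allow brace assignment into an existing numeric array:
>> clear P
>> P{1} = sparse([ 0.5 0.5; 0.8 0.2 ]);
>> P{2} = sparse([ 0 1; 0.1 0.9 ]);
The function call is unchanged.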
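
The optional arguments can also be passed explicitly, following the third form under Syntax. The sketch below assumes the MDP Toolbox is on the MATLAB path and reuses the P and R of this example. The tighter epsilon and the explicit zero V0 are illustrative choices only, and no output is shown.

```matlab
% Assumes the MDP Toolbox is on the MATLAB path.
P = zeros(2, 2, 2);
P(:,:,1) = [ 0.5 0.5; 0.8 0.2 ];
P(:,:,2) = [ 0 1; 0.1 0.9 ];
R = [ 5 10; -1 2 ];

epsilon = 0.001;     % illustrative: tighter than the default 0.01
V0 = zeros(2, 1);    % illustrative: same as the default starting values

% Request both outputs; cpu_time reports the CPU time used by the call.
[max_iter, cpu_time] = mdp_value_iteration_bound_iter(P, R, 0.9, epsilon, V0);
```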

Page created on July 31, 2001. Last update on August 31, 2009.