mdp_LP description

MDP Toolbox for MATLAB

mdp_LP

Solves a discounted MDP using linear programming.

Syntax

[V, policy, cpu_time] = mdp_LP(P, R, discount)

Description

mdp_LP solves a discounted MDP by linear programming.
The algorithm calls the linprog function of the MATLAB Optimization Toolbox, so that toolbox must be available.
Verbose mode produces no additional output.
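
For reference, a standard linear-programming formulation of a discounted MDP is sketched below. The symbols are notation introduced here, not taken from the toolbox: \gamma stands for the discount argument and r(s,a) for the expected immediate reward of action a in state s. This page does not state which exact formulation mdp_LP hands to linprog, so treat this as one common choice rather than a description of the implementation.

```latex
\begin{aligned}
\min_{V \in \mathbb{R}^{S}} \quad & \sum_{s=1}^{S} V(s) \\
\text{subject to} \quad & V(s) \;\ge\; r(s,a) + \gamma \sum_{s'=1}^{S} P(s,s',a)\, V(s')
  \qquad \text{for all } s \in \{1,\dots,S\},\ a \in \{1,\dots,A\}.
\end{aligned}
```

Written in the "A x <= b" form that linprog expects, each action a contributes the block of constraints (\gamma P_a - I) V \le -r_a, where P_a is the SxS transition matrix of action a and r_a is the corresponding column of expected rewards.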

Arguments

P : transition probability array.

P can be a 3-dimensional array (SxSxA) or a cell array (1xA) in which each cell contains a sparse (SxS) matrix, where S is the number of states and A the number of actions.

R : reward array.

R can be a 3-dimensional array (SxSxA), a cell array (1xA) in which each cell contains a sparse (SxS) matrix, or a 2D array (SxA), possibly sparse.

discount : discount factor.

discount is a real number in the open interval ]0; 1[.
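
The two array shapes accepted for R can be related as follows. This is a minimal sketch, assuming that an SxSxA reward array is read as R(s,s',a) and is combined with P as an expectation over the next state. The variable names R3 and PR and the reward values are illustrative and are not part of the toolbox interface.

```matlab
% Illustrative only: collapse an SxSxA reward array into SxA expected rewards,
% PR(s,a) = sum over s' of P(s,s',a) * R3(s,s',a).
P  = cat(3, [0.5 0.5; 0.8 0.2], [0 1; 0.1 0.9]);   % SxSxA transition array
R3 = cat(3, [5 5; -1 -1], [10 10; 2 2]);           % SxSxA rewards (made-up values)

S = size(P, 1);
A = size(P, 3);
PR = zeros(S, A);
for a = 1:A
    % Element-wise product, then sum over the next-state dimension
    PR(:, a) = sum(P(:, :, a) .* R3(:, :, a), 2);
end
disp(PR)   % SxA array of expected immediate rewards
```

With these values, PR has the same SxA shape as the R used in the Example section.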

Evaluations

V : optimal value function.

V is a (Sx1) vector.

policy : optimal policy.

policy is a (Sx1) vector. Each element is an integer identifying the action that maximizes the value function in the corresponding state.

cpu_time : CPU time used to run the program.
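
The relationship between V and policy can be illustrated by a one-step lookahead. This is a minimal sketch, assuming R is given in its SxA form. It reuses the data and the V reported in the Example section, and the names Q and Vmax are introduced here for illustration only.

```matlab
% Recover a greedy policy from a value function by one-step lookahead.
P = cat(3, [0.5 0.5; 0.8 0.2], [0 1; 0.1 0.9]);   % SxSxA transition array
R = [5 10; -1 2];                                 % SxA reward array
discount = 0.9;
V = [42.4419; 36.0465];                           % values from the Example section

S = size(P, 1);
A = size(P, 3);
Q = zeros(S, A);                                  % Q(s,a): value of action a in state s
for a = 1:A
    Q(:, a) = R(:, a) + discount * P(:, :, a) * V;
end
[Vmax, policy] = max(Q, [], 2);                   % best action per state
disp(policy')                                     % prints 2 1
```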

Example

>> P(:,:,1) = [ 0.5 0.5; 0.8 0.2 ];
>> P(:,:,2) = [ 0 1; 0.1 0.9 ];
>> R = [ 5 10; -1 2 ];

>> [V, policy, cpu_time] = mdp_LP(P, R, 0.9)
Optimization terminated successfully.
V =
   42.4419
   36.0465
policy =
   2
   1
cpu_time =
   0.3600

In the above example, P can instead be a cell array containing sparse matrices (if P already holds the 3-dimensional array, run clear P first):
>> P{1} = sparse([ 0.5 0.5; 0.8 0.2 ]);
>> P{2} = sparse([ 0 1; 0.1 0.9 ]);
The call to mdp_LP is unchanged.
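
To see how a result like the one above can arise from a linear program, the example can also be solved with a direct call to linprog. This is a sketch under stated assumptions, not the toolbox source: it minimizes the sum of V subject to V >= R(:,a) + discount*P(:,:,a)*V for every action, and the names f, M and b are introduced here. It requires the Optimization Toolbox.

```matlab
% Solve the example MDP as a linear program with linprog (illustrative sketch).
P = cat(3, [0.5 0.5; 0.8 0.2], [0 1; 0.1 0.9]);   % SxSxA transition array
R = [5 10; -1 2];                                 % SxA reward array
discount = 0.9;

S = size(P, 1);
A = size(P, 3);
f = ones(S, 1);                 % objective: minimize sum of V(s)
M = zeros(S * A, S);            % stacked inequality constraints M*V <= b
b = zeros(S * A, 1);
for a = 1:A
    rows = (a - 1) * S + (1:S);
    M(rows, :) = discount * P(:, :, a) - eye(S);
    b(rows)    = -R(:, a);
end
V = linprog(f, M, b);           % V is unbounded in sign by default
disp(V)                         % approximately 42.4419 and 36.0465
```

The stacked matrix M holds one SxS block per action, so its size grows as (S*A)xS. For large problems a sparse M would be the more natural choice.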

File : MDPtoolbox/documentation/mdp_LP.html
Page created on July 31, 2001. Last update on August 31, 2009.