MDP Toolbox for MATLAB

mdp_eval_policy_optimality

Determines sets of 'near optimal' actions for all states.

Syntax

[multiple, optimal_actions] = mdp_eval_policy_optimality(P, R, discount, Vpolicy)

Description

For some states, evaluating the value function may give very close results for different actions. It is therefore of interest to identify the states for which several actions have a value function very close to the optimal one (i.e. differing by less than 0.01). We call this the search for near-optimal actions in each state.
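
As a rough illustration, and assuming R is supplied in its (SxA) form, the test amounts to comparing the one-step lookahead value of each action with the best one. The following is only a minimal sketch; the toolbox's actual implementation may differ in detail:

S = size(P, 1);  A = size(P, 3);
Q = zeros(S, A);
for a = 1:A
    % one-step lookahead value of action a in every state
    Q(:, a) = R(:, a) + discount * P(:, :, a) * Vpolicy;
end
Vbest = max(Q, [], 2);                                    % best attainable value in each state
optimal_actions = (repmat(Vbest, 1, A) - Q) < 0.01;       % actions within 0.01 of the best
multiple = any(sum(optimal_actions, 2) > 1);              % true if some state has several such actions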

Arguments

P can be a 3-dimensional array (SxSxA) or a cell array (1xA) in which each cell contains a sparse matrix (SxS). R can be a 3-dimensional array (SxSxA) or a cell array (1xA) in which each cell contains a sparse matrix (SxS) or a 2D array (SxA), possibly sparse. discount is a real number in the open interval ]0; 1[. Vpolicy is an (Sx1) vector.

Evaluation

multiple is equal to true when at least one state has several epsilon-optimal actions, and false otherwise. optimal_actions is an (SxA) boolean matrix whose element optimal_actions(s, a) is true if action a is 'nearly' optimal in state s and false otherwise.
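
The optimal_actions matrix can be used directly to list the states that admit more than one near-optimal action, for example (tied_states is only an illustrative variable name):

>> tied_states = find(sum(optimal_actions, 2) > 1);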

Example

>> P(:,:,1) = [ 0.5 0.5;   0.8 0.2 ];
>> P(:,:,2) = [ 0 1;   0.1 0.9 ];
>> R = [ 5 10;   -1 2 ];
>> Vpolicy = [ 42.4419;   36.0465 ];
>> [multiple, optimal_actions] = mdp_eval_policy_optimality(P, R, 0.9, Vpolicy)
multiple =
   0
optimal_actions =
   0   1
   1   0
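
These results can be checked by hand from the one-step lookahead values of state 1 (the Q1_* names below are only scratch variables; approximate values in the comments):

>> Q1_a1 = R(1,1) + 0.9 * P(1,:,1) * Vpolicy;   % about 40.32
>> Q1_a2 = R(1,2) + 0.9 * P(1,:,2) * Vpolicy;   % about 42.44, i.e. Vpolicy(1)

Only action 2 is within 0.01 of the best value, which matches the first row of optimal_actions.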

In the above example, P can be a cell array containing sparse matrices:
>> P{1} = sparse([ 0.5 0.5;  0.8 0.2 ]);
>> P{2} = sparse([ 0 1;  0.1 0.9 ]);
The function call is unchanged.

