MDP Toolbox for MATLAB
mdp_computePpolicyPRpolicy
Computes the transition matrix and the reward matrix for a given policy.
Syntax
[Ppolicy, PRpolicy] = mdp_computePpolicyPRpolicy(P, R, policy)
Description
mdp_computePpolicyPRpolicy computes the state transition matrix Ppolicy and the reward vector PRpolicy induced by a given policy, from a transition probability array P and a reward array R. For each state s, the action applied is policy(s).
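In effect, for each state s the function selects the slice of P and R corresponding to the action policy(s), and reduces the reward slice to an expected value. A minimal sketch of this computation, assuming P and R are full (SxSxA) arrays (the toolbox also accepts the cell array and (SxA) forms described under Arguments):

% Sketch only: the actual function handles more input formats than shown here.
S = length(policy);
Ppolicy  = zeros(S, S);
PRpolicy = zeros(S, 1);
for s = 1:S
    a = policy(s);                            % action prescribed in state s
    Ppolicy(s, :) = P(s, :, a);               % transition row under action a
    PRpolicy(s)   = P(s, :, a) * R(s, :, a)'; % expected immediate reward
end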
Arguments
P can be a 3-dimensional array (SxSxA) or a cell array (1xA), where each cell contains a sparse matrix (SxS).
R can be a 3-dimensional array (SxSxA), a cell array (1xA) where each cell contains a sparse matrix (SxS), or a 2D array (SxA), possibly sparse.
policy is a (Sx1) vector of integers representing actions: policy(s) is the action applied in state s.
Evaluation
Ppolicy is a (SxS) matrix.
PRpolicy is a (Sx1) vector.
Example
>> P(:, :, 1) = [0.6116 0.3884;  0 1.0000];
>> P(:, :, 2) = [0.6674 0.3326;  0 1.0000];
>> R(:, :, 1) = [-0.2433 0.7073;  0 0.1871];
>> R(:, :, 2) = [-0.0069 0.6433;  0 0.2898];
>> policy = [2; 2];
>> [Ppolicy, PRpolicy] = mdp_computePpolicyPRpolicy(P, R, policy)
Ppolicy =
    0.6674    0.3326
         0    1.0000
PRpolicy =
   0.2094
   0.2898
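As a quick check (not part of the toolbox output), the first entry of PRpolicy is the expected reward of action 2 in state 1:
>> P(1, :, 2) * R(1, :, 2)'   % 0.6674*(-0.0069) + 0.3326*0.6433
ans =
    0.2094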
In the above example, P can be a cell array containing sparse matrices:
>> P{1} = sparse([0.6116 0.3884;  0 1.0000]);
>> P{2} = sparse([0.6674 0.3326;  0 1.0000]);
The function call is unchanged.
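R can likewise be supplied in the (SxA) form described under Arguments, where R(s, a) is the expected reward of action a in state s; PRpolicy(s) is then simply the reward selected by the policy. A sketch for the example above (the numeric values below are the expected rewards derived from the (SxSxA) R, rounded to four digits):
>> R = [0.1259 0.2094;  0.1871 0.2898];
>> [Ppolicy, PRpolicy] = mdp_computePpolicyPRpolicy(P, R, policy);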