MDP Toolbox for MATLAB

mdp_bellman_operator

Applies the Bellman operator to a value function Vprev and returns a new value function and a Vprev-improving policy.

Syntax

[V, policy] = mdp_bellman_operator(P, PR, discount, Vprev)

Description

mdp_bellman_operator applies the Bellman operator to the value function Vprev: for each state s,
V(s) = max over actions a of [ PR(s,a) + discount * sum over s' of P(s,s',a)*Vprev(s') ].
It returns the new value function V and a Vprev-improving policy, i.e. the action achieving the maximum in each state.
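
In MATLAB terms, the operator builds an (SxA) matrix of action values and takes a row-wise maximum. A minimal sketch of this computation, assuming P is given as a 3-dimensional array (the intermediate name Q is illustrative, not the toolbox's internal code):

Q = zeros(size(PR));                          % S x A matrix of action values
for a = 1:size(PR, 2)
    Q(:, a) = PR(:, a) + discount * P(:, :, a) * Vprev;
end
[V, policy] = max(Q, [], 2);                  % row-wise maximum and its argmax

If P is a cell array of sparse matrices, the update line becomes Q(:, a) = PR(:, a) + discount * P{a} * Vprev.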

Arguments

P : transition probabilities. P can be a 3-dimensional array (SxSxA) or a cell array (1xA), each cell containing a sparse (SxS) matrix.
PR : rewards. PR can be a 2-dimensional array (SxA), possibly sparse.
discount : discount factor, a real number in ]0; 1] (that is, 0 < discount <= 1).
Vprev : the value function to improve, a (Sx1) vector.

Evaluations

V : the improved value function, a (Sx1) vector.
policy : a (Sx1) vector of integers; element s is the index of the action chosen in state s.
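
The returned policy can be used to slice out the transition matrix and reward vector induced by that policy, e.g. as the starting point for policy evaluation. A minimal sketch, assuming P is a 3-dimensional array (the names Ppol and Rpol are illustrative):

S = size(PR, 1);
Ppol = zeros(S, S);                   % transition matrix under the policy
Rpol = zeros(S, 1);                   % reward vector under the policy
for s = 1:S
    Ppol(s, :) = P(s, :, policy(s));  % row of the chosen action's matrix
    Rpol(s)    = PR(s, policy(s));    % reward of the chosen action in state s
end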

Example

>> P(:,:,1) = [ 0.5 0.5; 0.8 0.2 ];
>> P(:,:,2) = [ 0 1; 0.1 0.9 ];
>> R = [ 5 10; -1 2 ];

>> [V, policy] = mdp_bellman_operator(P, R, 0.9, [0;0])
V =
   10
   2
policy =
   2
   2

In the above example, P can be a cell array containing sparse matrices:
>> P{1} = sparse([ 0.5 0.5; 0.8 0.2 ]);
>> P{2} = sparse([ 0 1; 0.1 0.9 ]);
The function call is unchanged.
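
In the call above, Vprev is the zero vector, so the discounted term vanishes and V is simply the row-wise maximum of R; action 2 is chosen in both states because 10 > 5 and 2 > -1. Applying the operator repeatedly is value iteration: the sequence of value functions converges to the optimal one when discount < 1. A minimal sketch of such a loop, assuming P and R as above (the tolerance and iteration cap are illustrative):

V = zeros(2, 1);
for i = 1:1000
    [Vnew, policy] = mdp_bellman_operator(P, R, 0.9, V);
    done = max(abs(Vnew - V)) < 1e-8;   % stop when the update is negligible
    V = Vnew;
    if done, break; end
end
% V now approximates the optimal value function; policy is greedy w.r.t. it.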

