MDP Toolbox for MATLAB
mdp_eval_policy_iterative
Evaluates a policy using iterations of the Bellman operator.
Syntax
Vpolicy = mdp_eval_policy_iterative(P, R, discount, policy)
Vpolicy = mdp_eval_policy_iterative(P, R, discount, policy, V0)
Vpolicy = mdp_eval_policy_iterative(P, R, discount, policy, V0, epsilon)
Vpolicy = mdp_eval_policy_iterative(P, R, discount, policy, V0, epsilon, max_iter)
Description
mdp_eval_policy_iterative evaluates the value function associated with a policy by applying the Bellman operator iteratively.
Arguments
P : transition probabilities. Either a 3-dimensional array (SxSxA), with S the number of states and A the number of actions, or a cell array (1xA) of sparse (SxS) matrices, as in the second example below.
R : rewards, an (SxA) matrix.
discount : discount factor, a real in ]0; 1[.
policy : policy to evaluate, an (Sx1) vector of action indices.
V0 : (optional) starting value function for the iteration, an (Sx1) vector.
epsilon : (optional) precision of the stopping criterion; the iteration stops when the variation of the value function falls below this threshold.
max_iter : (optional) maximum number of iterations.
Evaluation
Vpolicy : value function of the policy, an (Sx1) vector.
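To make the evaluation concrete, here is a minimal sketch of the iteration, assuming P is given as an (SxSxA) array, R as an (SxA) matrix and policy as an (Sx1) vector of action indices; it only illustrates the idea and is not the toolbox's actual implementation.

% Illustrative sketch of iterative policy evaluation (not the toolbox code).
function V = eval_policy_sketch(P, R, discount, policy, epsilon, max_iter)
    S = size(P, 1);
    Ppolicy = zeros(S, S);     % transition matrix induced by the policy
    Rpolicy = zeros(S, 1);     % reward vector induced by the policy
    for s = 1:S
        Ppolicy(s, :) = P(s, :, policy(s));
        Rpolicy(s)    = R(s, policy(s));
    end
    V = zeros(S, 1);           % plays the role of V0
    for iter = 1:max_iter
        Vprev = V;
        V = Rpolicy + discount * Ppolicy * Vprev;   % Bellman operator of the policy
        if max(abs(V - Vprev)) < epsilon            % value variation small enough: stop
            break
        end
    end
end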
Example
The iteration trace (Iteration / V_variation) shown below is the display produced in verbose mode.
>> P(:,:,1) = [ 0.5 0.5; 0.8 0.2 ];
>> P(:,:,2) = [ 0 1; 0.1 0.9 ];
>> R = [ 5 10; -1 2 ];
>> policy = [2; 1];
>> Vpolicy = mdp_eval_policy_iterative(P, R, 0.8, policy)
Iteration V_variation
1 10
2 6.24
3 4.992
4 3.2727
5 2.6182
6 1.7993
7 1.4394
8 1.0306
9 0.82446
10 0.61003
11 0.48802
12 0.37013
13 0.2961
14 0.22857
15 0.18286
16 0.14288
17 0.1143
18 0.090049
19 0.072039
20 0.05706
21 0.045648
22 0.036285
23 0.029028
24 0.023126
25 0.018501
26 0.014762
27 0.011809
28 0.0094313
29 0.0075451
30 0.0060295
31 0.0048236
32 0.0038562
33 0.0030849
34 0.0024668
35 0.0019735
36 0.0015783
37 0.0012627
38 0.0010099
39 0.00080795
40 0.00064629
41 0.00051703
42 0.00041359
43 0.00033087
44 0.00026469
45 0.00021175
46 0.00016939
47 0.00013552
48 0.00010841
49 8.6728e-05
MDP Toolbox: iterations stopped, epsilon-optimal value function
Vpolicy =
23.1704
16.4631
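As a sanity check (not part of the toolbox output above), the exact value of this policy is the fixed point of the Bellman operator and can be obtained by solving the linear system (I - discount*Ppolicy)*V = Rpolicy; the iterative result above is within epsilon of it.
>> Ppolicy = [ P(1,:,policy(1)); P(2,:,policy(2)) ];   % rows selected by the policy
>> Rpolicy = [ R(1,policy(1)); R(2,policy(2)) ];
>> Vexact = (eye(2) - 0.8*Ppolicy) \ Rpolicy
Vexact =
23.1707
16.4634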
In the above example, P can equally well be given as a cell array of sparse matrices:
>> P{1} = sparse([ 0.5 0.5; 0.8 0.2 ]);
>> P{2} = sparse([ 0 1; 0.1 0.9 ]);
The function call is unchanged.
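For instance, with R, discount and policy as defined above, the evaluation with the sparse representation is invoked exactly as before:
>> Vpolicy = mdp_eval_policy_iterative(P, R, 0.8, policy)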