How to understand Watkins's Q(λ) learning algorithm in the Sutton & Barto RL book?

In the Sutton & Barto RL book, Watkins's Q(λ) learning algorithm is shown in Figure 7.14. In line 10, "For all s, a:", the "s, a" ranges over all (s, a) pairs, whereas the (s, a) in lines 8 and 9 is the current (s, a). Is that right?

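In lines 12 and 13, when a' != a*, line 13 is executed, so every e(s, a) is set to 0. What is the point of the eligibility trace if all traces are reset to 0 whenever a' != a*? Once a' != a* occurs, Q is no longer updated, because e(s, a) = 0 for every pair and e(s, a) stays at 0.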

So, is this an error?

+4
2

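Eligibility traces assign credit or blame only to the eligible state-action pairs.

In Watkins's Q(λ), credit/blame goes to the state-action pairs you would actually have visited had you followed the policy derived from Q deterministically (always taking the greedy action).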

So the answer lies in line 5:

Choose a' from s' using policy derived from Q (e.g. epsilon-greedy)

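Because a' is chosen epsilon-greedily, there is a small chance (with probability epsilon) that it is an exploratory random step rather than the greedy one. In that case, the state-action pairs visited before the exploratory step deserve no credit/blame for the rewards that follow, so the whole eligibility trace is reset to zero. From the next time step on, a new trace starts to build up.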
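To make the reset concrete, here is a minimal sketch of one inner-loop step, assuming a tabular Q and e stored as NumPy arrays indexed by [state, action] and accumulating traces. The function name `watkins_q_lambda_step` and the default values of `alpha`, `gamma` and `lam` are placeholders of mine, not something from the book. The line references in the comments follow the numbering used in the question.

```python
import numpy as np


def watkins_q_lambda_step(Q, e, s, a, r, s_next, a_next,
                          alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one step of Watkins's Q(lambda) in place; return the greedy action a*.

    Q and e are float arrays of shape (n_states, n_actions).
    a_next is the action already chosen in s_next (e.g. epsilon-greedily).
    """
    # a* is the greedy action in s'; if a' ties for the max, treat a' as greedy.
    a_star = int(np.argmax(Q[s_next]))
    if Q[s_next, a_next] == Q[s_next, a_star]:
        a_star = a_next

    # Lines 8 and 9: these touch only the current (s, a).
    delta = r + gamma * Q[s_next, a_star] - Q[s, a]
    e[s, a] += 1.0

    # Line 10 onward: "for all s, a" becomes a whole-array update.
    Q += alpha * delta * e

    if a_next == a_star:
        e *= gamma * lam   # greedy step: traces decay
    else:
        e[:] = 0.0         # exploratory step: every trace is cut

    return a_star
```

Note that the Q update uses the traces before they are cut, so the current step still learns. On the following step, `e[s, a] += 1.0` runs for the new current pair, so the traces do not stay at zero.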


+5

e(s, a) is set to 0 when a' != a*, but e(s', a') is set to 1 again on the next step (line 9).

step by step

0

Source: https://habr.com/ru/post/1662247/

