How to understand the Watkins's Q(λ) learning algorithm in Sutton & Barto's RL book?

Question


In the Sutton & Barto RL book (link), Watkins's Q(λ) learning algorithm is shown in Figure 7.14. Line 10 reads "For all s, a:". I take "s, a" there to range over all (s, a) pairs, whereas the (s, a) on lines 8 and 9 is the current pair. Is that right?

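In lines 12 and 13, when a' != a*, line 13 is executed and every e(s, a) is set to 0. What is the point of the eligibility trace if all traces are reset to 0, given that a' != a* can happen quite often? Even if a' != a* is rare, once it happens the trace seems to lose its meaning entirely: Q would no longer be updated, because every e(s, a) = 0, and e(s, a) would then stay at 0 on every later update.

Is this an error in the algorithm?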

+4

reinforcement-learning q-learning

user186199 29 . '16 9:47

2 Answers

tom1139 · Answer 1 · 2016-11-30T22:53:26+0000

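The idea of eligibility traces is to give credit or blame only to the eligible state-action pairs.

In Watkins's Q(λ) you want to give credit/blame only to the state-action pairs you would actually have visited had you followed the policy derived from Q deterministically (always choosing the best action).

So the answer to your question is in line 5: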

Choose a' from s' using policy derived from Q (e.g. epsilon-greedy)

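Because a' is chosen epsilon-greedily, there is a small chance (probability at most epsilon) that it is an exploratory random action instead of the greedy one. In that case the whole eligibility trace is set to zero, because it makes no sense to give credit/blame to the state-action pairs visited before: they deserve no credit for rewards that follow an exploratory step. On the following time step a new eligibility trace starts to build up...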
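To make the trace reset concrete, here is a minimal sketch of one backup of tabular Watkins's Q(λ), written from the description above rather than copied from the book. The function name `watkins_q_lambda_step`, the list-of-lists layout for `Q` and `e`, the accumulating trace, and the default values of `alpha`, `gamma`, and `lam` are all assumptions made for illustration.

```python
def watkins_q_lambda_step(Q, e, s, a, r, s_next, a_next,
                          alpha=0.1, gamma=0.9, lam=0.8):
    """One backup of tabular Watkins's Q(lambda) (illustrative sketch).

    Q and e are lists of lists indexed as [state][action] and are
    modified in place. Returns True if a_next was a greedy action.
    """
    # Greedy action in the next state; if a_next ties for the max,
    # it is treated as the greedy action.
    best = max(Q[s_next])
    a_star = a_next if Q[s_next][a_next] == best else Q[s_next].index(best)

    # TD error uses the greedy value, as in ordinary Q-learning.
    delta = r + gamma * Q[s_next][a_star] - Q[s][a]

    # Bump the trace of the current pair only (accumulating trace assumed).
    e[s][a] += 1.0

    greedy = (a_next == a_star)
    # "For all s, a": update every entry, then decay or cut every trace.
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += alpha * delta * e[i][j]
            e[i][j] = gamma * lam * e[i][j] if greedy else 0.0
    return greedy


# Hypothetical two-state, two-action example.
Q = [[0.0, 0.0], [1.0, 0.0]]
e = [[0.0, 0.0], [0.0, 0.0]]
# a_next=1 is not greedy in state 1 (Q[1][0] is larger), so traces are cut.
was_greedy = watkins_q_lambda_step(Q, e, s=0, a=0, r=1.0, s_next=1, a_next=1)
```

Note that the Q update for the current step still happens before the traces are cut, so the exploratory choice only stops credit from flowing to earlier pairs on later steps.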

user186199 · Answer 2 · 2016-11-30T15:36:16+0000

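After writing the process out step by step, I understand it now. All e(s, a) are set to 0 when a' != a*, but e(s', a') is set to 1 again on the next step (line 9 here), so the updates continue.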
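The step-by-step reasoning can be checked with a small sketch that tracks only the traces. The helper `trace_history`, the state and action labels, and the values of `gamma` and `lam` are hypothetical, and an accumulating trace is assumed.

```python
def trace_history(visits, next_greedy, gamma=0.9, lam=0.8):
    """Record the traces used in each step's Q update (illustrative sketch).

    visits      -- sequence of (state, action) pairs taken, in order
    next_greedy -- for each step, whether the next chosen action was greedy
    """
    e = {}
    history = []
    for pair, greedy in zip(visits, next_greedy):
        # Bump the trace of the current pair before the update.
        e[pair] = e.get(pair, 0.0) + 1.0
        history.append(dict(e))  # snapshot of traces used in this update
        if greedy:
            # Greedy next action: decay all traces.
            e = {k: gamma * lam * v for k, v in e.items()}
        else:
            # Exploratory next action: all traces are reset to 0.
            e = {}
    return history


visits = [("s0", "a0"), ("s1", "a1"), ("s2", "a2")]
next_greedy = [True, False, True]  # the action chosen after step 2 is exploratory
history = trace_history(visits, next_greedy)
# history[2] contains only ("s2", "a2") with trace 1.0: the trace was cut
# after step 2, yet the very next update still has a nonzero trace.
```

So the cut removes credit for the pairs visited before the exploratory action, while the pair that starts the new trace is updated as usual.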
