In the Sutton & Barto RL book (link), Watkins's Q(λ) learning algorithm is shown in Figure 7.14:
Line 10 says "For all s, a:". Does "s, a" there range over all (s, a) pairs, whereas the (s, a) on lines 8 and 9 is only the current (s, a)? Is that right?
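On lines 12 and 13: when a' != a*, line 13 is executed, so e(s, a) is set to 0 for all (s, a). Once a' != a* has happened, it seems that Q would no longer be updated, because e(s, a) = 0 everywhere and e(s, a) would stay at 0.
Is that right?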
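To make the question concrete, here is a minimal Python sketch of how I read the inner loop of the figure. It is not copied from the book: the function name `watkins_q_lambda_step`, the list-of-lists tables `Q` and `e`, the parameter values, and the tiny two-state example are all my own assumptions, and the line numbers in the comments are my reading of the figure.

```python
def watkins_q_lambda_step(Q, e, s, a, r, s_next, a_next,
                          alpha=0.1, gamma=0.9, lam=0.8):
    """One inner-loop step. Q and e are lists of lists indexed [state][action]."""
    # Greedy action in s_next; if a_next ties for the max, treat it as greedy.
    best = max(Q[s_next])
    a_star = a_next if Q[s_next][a_next] == best else Q[s_next].index(best)

    # Lines 8-9 (my reading): only the current pair (s, a) is involved.
    delta = r + gamma * Q[s_next][a_star] - Q[s][a]
    e[s][a] += 1.0

    # Lines 10-13 (my reading): sweep over every (state, action) pair.
    for si in range(len(Q)):
        for ai in range(len(Q[si])):
            Q[si][ai] += alpha * delta * e[si][ai]
            if a_next == a_star:
                e[si][ai] *= gamma * lam   # line 12: decay the trace
            else:
                e[si][ai] = 0.0            # line 13: cut the trace
    return a_star


# Hypothetical two-state, two-action example.
Q = [[0.0, 0.0], [0.0, 1.0]]
e = [[0.0, 0.0], [0.0, 0.0]]

# Step 1: the next action (0) is not greedy in state 1, so line 13 resets e.
watkins_q_lambda_step(Q, e, s=0, a=0, r=1.0, s_next=1, a_next=0)
traces_after_reset = [row[:] for row in e]

# Step 2: the step that follows the reset.
watkins_q_lambda_step(Q, e, s=1, a=0, r=0.0, s_next=0, a_next=0)
```

In this sketch, the increment on line 9 runs before the sweep, so the pair visited on the following step gets trace 1 again even though `traces_after_reset` is all zeros. Whether that matches the intent of the figure is exactly what I would like confirmed.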