Understanding linear gradient-descent Sarsa (based on Sutton & Barto)

I am trying to implement linear gradient-descent Sarsa based on the Sutton and Barto book; see the algorithm in the figure below.


However, there is something in the algorithm that I don't understand:

  • Are the dimensions of w and z independent of how many different actions can be taken? In the book they seem to have a dimension equal to the number of features, which, I would say, is independent of the number of actions.
  • Is there a separate w and z for each action? I do not see anything in the book saying that this should be so.
  • If I am right in the two bullets above, then I don't see how the index set F_a can depend on the action, and therefore I don't see how the action-value q_a can depend on the action (see the lines marked in yellow in the algorithm below). But the action-value should depend on the action. So I'm missing something...

I hope someone can help clarify this for me :)

[Figure: Sarsa algorithm]

1 answer

w - . , , Q(s,a), , . , , , , , . , ( w). w, , , . , , . Q , , , , . . !

, ( ). . , , , 12 (z - , , w ). , , 10.1.
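The relationship between w, z, F_a and Q(s,a) can be illustrated in code. What follows is a minimal sketch, not the book's implementation: it assumes binary features and one common construction in which the state features are stacked once per action, so that a single w and a single z have one component per (state feature, action) pair. The names `active_features`, `q_value`, `sarsa_lambda_step`, the sizes, and the step-size values are all hypothetical and chosen only for illustration. Under this assumption, F_a depends on the action through an index offset, and that is how Q_a becomes action-dependent even though there is only one weight vector.

```python
# Assumed sizes, for illustration only.
N_STATE_FEATURES = 4
N_ACTIONS = 3
N_FEATURES = N_STATE_FEATURES * N_ACTIONS  # one slot per (state feature, action)


def active_features(state_features, action):
    """Return F_a: indices of the binary features active for (state, action).

    Assumption: the state features are copied into a separate block per action.
    """
    offset = action * N_STATE_FEATURES
    return [offset + i for i in state_features]


def q_value(w, state_features, action):
    """Q_a is the sum of the weights of the features in F_a."""
    return sum(w[i] for i in active_features(state_features, action))


def sarsa_lambda_step(w, z, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """One Sarsa(lambda) update with replacing traces; mutates w and z in place.

    Pass s_next=None when the next state is terminal.
    """
    for i in active_features(s, a):
        z[i] = 1.0                      # replacing traces
    delta = r - q_value(w, s, a)        # TD error, first part
    if s_next is not None:
        delta += gamma * q_value(w, s_next, a_next)
    for i in range(len(w)):
        w[i] += alpha * delta * z[i]    # gradient-descent weight update
        z[i] *= gamma * lam             # decay all traces
    return delta


# Usage: a single w and z shared by all actions.
w = [0.0] * N_FEATURES
z = [0.0] * N_FEATURES
s = [0, 2]                              # indices of active state features
delta = sarsa_lambda_step(w, z, s, a=1, r=1.0, s_next=[1], a_next=0)
q_per_action = [q_value(w, s, a) for a in range(N_ACTIONS)]
```

After this single update only the weights in the block belonging to action 1 have changed, so `q_per_action` differs across actions for the same state. Other feature constructions (for example, tile codings defined directly over state-action pairs) would change `active_features` but leave the rest of the sketch unchanged.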


Source: https://habr.com/ru/post/1661473/

