I am currently reading Sutton's book Reinforcement Learning: An Introduction. After reading Chapter 6.1, I wanted to implement the TD(0) RL algorithm for this example:

To do this, I tried to implement the pseudocode presented here:

Having done this, I wondered how to perform the step A <- action given by π for S: can I simply choose the optimal action A for my current state S? Since the value function V(s) depends only on the state and not on the action, I don't see how to do this.
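For reference, here is a minimal sketch of what I have so far. It assumes the 5-state random walk from Example 6.2 as the environment and an equiprobable random policy standing in for π (all names and the environment setup are my own illustration, not from the book's pseudocode):

```python
import random

ALPHA, GAMMA = 0.1, 1.0
NUM_EPISODES = 1000

def policy(state):
    # Stand-in for "action given by pi for S": here a fixed
    # equiprobable random policy (move left or right).
    return random.choice([-1, +1])

def step(state, action):
    # 5-state random walk: states 1..5 are non-terminal, 0 and 6 are
    # terminal; reward is +1 only when terminating on the right.
    next_state = state + action
    reward = 1.0 if next_state == 6 else 0.0
    done = next_state in (0, 6)
    return next_state, reward, done

V = {s: 0.0 for s in range(7)}  # value estimates; terminals stay 0

for _ in range(NUM_EPISODES):
    s = 3  # each episode starts in the centre state
    done = False
    while not done:
        a = policy(s)                 # A <- action given by pi for S
        s_next, r, done = step(s, a)  # take action A, observe R, S'
        # TD(0) update: V(S) <- V(S) + alpha * [R + gamma*V(S') - V(S)]
        V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])
        s = s_next
```

With enough episodes, V(1)..V(5) should approach the true values 1/6..5/6 for the random policy. My uncertainty is exactly the `policy` function above: sampling at random works, but I don't see how to derive an action from V itself.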
I found this question (which is where I got the images from) that deals with the same exercise, but there the action is simply selected at random rather than by a policy π.
So my question is: do I have to use the action-value function Q(s, a) instead?