I'm working on a reinforcement learning program and I'm using this article as a reference. I'm building a neural network with Python and Keras (Theano backend), and the pseudocode I'm following is:
1. Do a feedforward pass for the current state s to get predicted Q-values for all actions.
2. Do a feedforward pass for the next state s' and calculate the maximum over the network outputs, max_a' Q(s', a').
3. Set the Q-value target for the taken action to r + γ·max_a' Q(s', a') (using the max calculated in step 2).
4. For all other actions, set the Q-value target to the value originally returned in step 1, making the error 0 for those outputs.
5. Update the weights using backpropagation.
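To make the target-construction part of those steps concrete, here is a minimal NumPy sketch (not the asker's code; the Q-value arrays are toy stand-ins for what `model.predict` would return for a 3-action network):

```python
import numpy as np

def build_targets(q_s, q_s_next, action, reward, gamma):
    """Steps 1-4: copy the current predictions, then overwrite only the
    taken action's target with r + gamma * max_a' Q(s', a')."""
    targets = q_s.copy()                        # error is 0 for other actions
    targets[action] = reward + gamma * np.max(q_s_next)
    return targets

q_s = np.array([0.2, 0.6892, 0.1])       # predicted Q(s, a) for each action
q_s_next = np.array([0.8375, 0.5, 0.3])  # predicted Q(s', a')
targets = build_targets(q_s, q_s_next, action=1, reward=1.0, gamma=1.0)
# Training on (s, targets) with an MSE loss then backpropagates error
# only through the chosen action's output.
```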
The loss function here is

L = 1/2 · [r + γ·max_a' Q(s', a') − Q(s, a)]²

where my reward r is +1, max_a' Q(s', a') = 0.8375, and Q(s, a) = 0.6892.
My L will be 1/2 · (1 + 0.8375 − 0.6892)² = 0.659296445 (with γ = 1).
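As a quick check, the arithmetic above can be reproduced directly (γ = 1 is implied by the numbers used):

```python
r, gamma = 1.0, 1.0        # gamma = 1 matches the arithmetic above
max_q_next = 0.8375        # max_a' Q(s', a')
q_sa = 0.6892              # Q(s, a)
L = 0.5 * (r + gamma * max_q_next - q_sa) ** 2   # ≈ 0.659296445
```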
Now, how do I update the neural network's weights using the above value of the loss function, given that my model structure is
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(150, input_dim=150))
model.add(Dense(10))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mse', optimizer='adam')
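For intuition about what the weight update does under the hood, here is a minimal sketch of one gradient-descent step on L = 1/2·(t − q)² for a tiny linear Q-approximator (a toy stand-in, not the Keras model above; state, weights, and learning rate are made up for illustration):

```python
import numpy as np

# Hypothetical linear Q-function: q = w @ s.
# With L = 1/2 * (t - q)^2, the gradient is dL/dw = -(t - q) * s,
# and SGD does w <- w - lr * dL/dw.
s = np.array([1.0, 0.5])        # toy state features
w = np.array([0.4, 0.5784])     # toy weights; q = 0.4 + 0.2892 = 0.6892
t = 1.0 + 0.8375                # target r + gamma * max_a' Q(s', a'), gamma = 1
q = w @ s
grad = -(t - q) * s             # dL/dw; TD error (t - q) = 1.1483 drives it
w_new = w - 0.1 * grad          # one SGD step with learning rate 0.1
```

In Keras you never compute L or the gradient by hand: calling `model.fit(state, target)` or `model.train_on_batch(state, target)` runs the forward pass, evaluates the compiled `mse` loss against the target, and applies the optimizer's update in one call.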