Q-learning using neural networks

I am trying to implement the Deep Q-learning algorithm for playing Pong. I have already implemented Q-learning with a table as the Q-function. It works very well and learns to beat a naive AI within 10 minutes. But I cannot get it to work with a neural network as the Q-function approximator.
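For context, by the tabular version I mean the standard Q-learning update; roughly this (the constants and class layout here are just illustrative, not my exact code):

    // Standard tabular Q-learning update; alpha/gamma values are example choices.
    public class TabularQ {
        private final double alpha = 0.1;   // learning rate (example value)
        private final double gamma = 0.95;  // discount factor (example value)
        private final double[][] q;

        public TabularQ(int numStates, int numActions) {
            q = new double[numStates][numActions];
        }

        // Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
        public void update(int s, int a, double reward, int sNext, boolean endOfEpisode) {
            double maxNext = 0.0;
            if (!endOfEpisode) {
                maxNext = Double.NEGATIVE_INFINITY;
                for (double v : q[sNext]) maxNext = Math.max(maxNext, v);
            }
            q[s][a] += alpha * (reward + gamma * maxNext - q[s][a]);
        }
    }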

I want to check whether I am on the right track, so here is a summary of what I am doing:

  • I store the current state, the action taken, and the reward as the current experience in the replay memory.
  • I use a multilayer perceptron as the Q-function, with 1 hidden layer of 512 units. For the input → hidden layer I use the sigmoid activation function; for the hidden → output layer I use the linear activation function (see the Encog sketch further below).
  • The state is represented by the positions of both players and the ball, as well as the velocity of the ball. Positions are remapped to a much smaller state space.
  • I use the epsilon-greedy approach to explore the state space, where epsilon gradually decays to 0 (action selection sketched below).
  • During training, a random batch of 32 stored experiences is sampled from the replay memory. I then compute the target Q-value for each experience's state and action Q(s, a):

    forall Experience e in batch:
        if e == endOfEpisode:
            target = e.getReward
        else:
            target = e.getReward + discountFactor * qMaxPostState
    end
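In Java, that target computation might look like the following sketch (the Experience class and field names are illustrative, not Encog API):

    import java.util.List;
    import org.encog.ml.data.basic.BasicMLData;
    import org.encog.neural.networks.BasicNetwork;

    /** One stored transition; field names are illustrative. */
    class Experience {
        double[] state;
        int action;
        double reward;
        double[] postState;
        boolean endOfEpisode;
    }

    class TargetComputer {
        /** Bellman target for each sampled experience. */
        static double[] computeTargets(BasicNetwork net, List<Experience> batch,
                                       double discountFactor) {
            double[] targets = new double[batch.size()];
            for (int i = 0; i < batch.size(); i++) {
                Experience e = batch.get(i);
                if (e.endOfEpisode) {
                    targets[i] = e.reward;  // terminal transition: reward only
                } else {
                    // max over the network's Q-value outputs for the post state
                    double[] q = net.compute(new BasicMLData(e.postState)).getData();
                    double qMax = Double.NEGATIVE_INFINITY;
                    for (double v : q) qMax = Math.max(qMax, v);
                    targets[i] = e.reward + discountFactor * qMax;
                }
            }
            return targets;
        }
    }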

Now that I have a set of 32 target Q-values, I train the neural network on them using batch gradient descent. I do just 1 training step. How many should I do?
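For concreteness, here is how that network and a single batch-gradient step can be set up in Encog; the state encoding, learning rate, and momentum are placeholder values I picked for the sketch, not tuned ones:

    import org.encog.engine.network.activation.ActivationLinear;
    import org.encog.engine.network.activation.ActivationSigmoid;
    import org.encog.ml.data.MLDataSet;
    import org.encog.ml.data.basic.BasicMLDataSet;
    import org.encog.neural.networks.BasicNetwork;
    import org.encog.neural.networks.layers.BasicLayer;
    import org.encog.neural.networks.training.propagation.back.Backpropagation;

    int stateSize = 6;   // e.g. paddle positions, ball position and velocity (assumed encoding)
    int numActions = 3;  // e.g. up, down, stay (assumed)

    // input -> 512 sigmoid hidden units -> one linear output per action
    BasicNetwork network = new BasicNetwork();
    network.addLayer(new BasicLayer(null, true, stateSize));
    network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 512));
    network.addLayer(new BasicLayer(new ActivationLinear(), false, numActions));
    network.getStructure().finalizeStructure();
    network.reset();

    // inputs: 32 states; targets: the network's current outputs for those states,
    // with the taken action's entry overwritten by the Bellman target
    double[][] inputs  = new double[32][stateSize];
    double[][] targets = new double[32][numActions];
    MLDataSet batch = new BasicMLDataSet(inputs, targets);

    Backpropagation train = new Backpropagation(network, batch, 0.001, 0.0); // lr, momentum (placeholders)
    train.iteration(); // exactly one batch gradient descent step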

I use Java and Encog as the MLP framework.
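And the epsilon-greedy action selection boils down to something like this (the decay schedule is just an example):

    import java.util.Random;
    import org.encog.ml.data.basic.BasicMLData;
    import org.encog.neural.networks.BasicNetwork;

    class ActionSelector {
        private final Random rng = new Random();

        /** With probability epsilon pick a random action, otherwise the greedy one. */
        int selectAction(BasicNetwork network, double[] state, double epsilon, int numActions) {
            if (rng.nextDouble() < epsilon) {
                return rng.nextInt(numActions);  // explore
            }
            double[] q = network.compute(new BasicMLData(state)).getData();  // exploit
            int best = 0;
            for (int a = 1; a < q.length; a++) {
                if (q[a] > q[best]) best = a;
            }
            return best;
        }
    }

    // after every step: epsilon = Math.max(0.0, epsilon - decayPerStep);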


You use a Q-network with 1 hidden layer of 512 units.

A classic difficulty in RL is that the reward signal is sparse: you can go 99 steps with a reward of 0 and only then receive a reward of 1, so most transitions in a batch carry very little learning signal.


Source: https://habr.com/ru/post/1655773/

