I am trying to implement the Deep Q-learning algorithm for playing Pong. I have already implemented Q-learning using a table as the Q-function. It works very well and learns to beat a naive AI within 10 minutes. But I cannot get it to work using a neural network as an approximator of the Q-function.
I want to know if I am on the right track, so here is a summary of what I am doing:
- I store the current state, action and reward as an experience in the replay memory (see the sketch after this list).
- I use a multilayer perceptron as the Q-function, with one hidden layer of 512 units. For the input → hidden layer I use the sigmoid activation function; for the hidden → output layer I use the linear activation function (a possible network setup is sketched below).
- The state is represented by the positions of both players and the ball, as well as the velocity of the ball. Positions are mapped down to a much smaller state space.
- I use the epsilon-greedy approach to explore the state space, where epsilon gradually decays to 0 (see the action-selection sketch below).
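
Conceptually, the replay memory looks roughly like this (a simplified sketch; the class and field names are illustrative, not my exact code). I also keep the post-state and an end-of-episode flag with each experience so the targets below can be computed:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** One transition stored in the replay memory. */
class Experience {
    final double[] state;      // current state features
    final int action;          // index of the action taken
    final double reward;       // reward received for that action
    final double[] nextState;  // post-state observed after the action
    final boolean endOfEpisode;

    Experience(double[] state, int action, double reward,
               double[] nextState, boolean endOfEpisode) {
        this.state = state;
        this.action = action;
        this.reward = reward;
        this.nextState = nextState;
        this.endOfEpisode = endOfEpisode;
    }
}

/** Fixed-size replay memory with uniform random sampling. */
class ReplayMemory {
    private final List<Experience> memory = new ArrayList<>();
    private final int capacity;
    private final Random rng = new Random();

    ReplayMemory(int capacity) { this.capacity = capacity; }

    void add(Experience e) {
        if (memory.size() >= capacity) {
            memory.remove(0);  // drop the oldest experience when full
        }
        memory.add(e);
    }

    List<Experience> sample(int batchSize) {
        List<Experience> copy = new ArrayList<>(memory);
        Collections.shuffle(copy, rng);
        return copy.subList(0, Math.min(batchSize, copy.size()));
    }
}
```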
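
The network itself is built with Encog roughly as follows (a sketch, not my exact code; the input and output sizes shown here are placeholders that depend on the state encoding and the number of actions):

```java
import org.encog.engine.network.activation.ActivationLinear;
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;

/** Build the Q-network: state features in, one Q-value per action out. */
static BasicNetwork buildQNetwork(int stateSize, int numActions) {
    BasicNetwork network = new BasicNetwork();
    network.addLayer(new BasicLayer(null, true, stateSize));                     // input layer (+ bias)
    network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 512));        // hidden layer, sigmoid
    network.addLayer(new BasicLayer(new ActivationLinear(), false, numActions)); // linear output layer
    network.getStructure().finalizeStructure();
    network.reset();  // initialize with random weights
    return network;
}
```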
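
Action selection is the usual epsilon-greedy rule; schematically it looks like this (predictQ is an illustrative helper that runs a forward pass through the Q-network, not a real method of mine):

```java
import java.util.Random;

// Epsilon-greedy: explore with probability epsilon, otherwise pick the
// action with the highest predicted Q-value.
static int selectAction(double[] state, double epsilon, int numActions, Random rng) {
    if (rng.nextDouble() < epsilon) {
        return rng.nextInt(numActions);   // exploratory random action
    }
    double[] qValues = predictQ(state);   // one Q-value per action
    int best = 0;
    for (int a = 1; a < qValues.length; a++) {
        if (qValues[a] > qValues[best]) {
            best = a;
        }
    }
    return best;                          // greedy action
}
```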
During training, a random batch of 32 experiences is selected from the replay memory. Then I calculate the target Q-value for each stored state and action, Q(s, a):
    int i = 0;
    for (Experience e : batch) {
        if (e.isEndOfEpisode()) {
            targets[i] = e.getReward();                                  // terminal: just the reward
        } else {
            targets[i] = e.getReward() + discountFactor * qMaxPostState; // bootstrap on max Q of the post-state
        }
        i++;
    }
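
To make that concrete: qMaxPostState is the largest network output for the post-state, and since the network has one output per action, the training target vector keeps the network's current predictions for all actions except the one that was actually taken. A rough sketch (predictQ is the same illustrative forward-pass helper as above; nothing here is meant as exact code):

```java
import java.util.Arrays;

// Build network inputs and target vectors for one sampled batch.
double[][] inputs  = new double[batch.size()][];
double[][] targets = new double[batch.size()][];

for (int i = 0; i < batch.size(); i++) {
    Experience e = batch.get(i);

    double[] qCurrent = predictQ(e.state);           // current predictions for the stored state
    double target;
    if (e.endOfEpisode) {
        target = e.reward;                           // terminal: no bootstrapping
    } else {
        double qMaxPostState = Arrays.stream(predictQ(e.nextState)).max().getAsDouble();
        target = e.reward + discountFactor * qMaxPostState;
    }

    qCurrent[e.action] = target;                     // only the taken action's output changes
    inputs[i]  = e.state;
    targets[i] = qCurrent;
}
```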
Now I have a set of 32 target Q-values, and I train the neural network on them using batch gradient descent. I just do 1 training step. How many should I do?
I am using Java and the Encog library for the neural network.
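
Concretely, one training step on the batch looks roughly like this in Encog (a sketch; the learning rate and momentum shown are placeholders, not my tuned values):

```java
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.training.propagation.back.Backpropagation;

// Wrap the 32 (input, target) pairs in an Encog data set and run one
// backpropagation pass over it.
MLDataSet batchSet = new BasicMLDataSet(inputs, targets);
Backpropagation train = new Backpropagation(network, batchSet, 0.001, 0.0); // learning rate, momentum
train.iteration();   // a single gradient-descent step over the batch
```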