I am trying to implement the Deep Q-learning algorithm for playing Pong. I have already implemented Q-learning using a table as the Q-function. It works very well and learns to beat a naive AI within 10 minutes. But I cannot get it to work using a neural network as an approximator of the Q-function.
I want to know if I am on the right track, so here is a summary of what I am doing:
- I store the current state, action and reward as an experience in the replay memory (see the sketch after this list).
- I use a multilayer perceptron as the Q-function, with one hidden layer of 512 units. For the input → hidden layer I use the sigmoid activation function; for the hidden → output layer I use the linear activation function (a possible network setup is sketched below).
- The state is represented by the positions of both players and the ball, as well as the velocity of the ball. Positions are mapped down to a much smaller state space.
- I use the epsilon-greedy approach to explore the state space, where epsilon gradually decays to 0 (see the action-selection sketch below).
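
Conceptually, the replay memory looks roughly like this (a simplified sketch; the class and field names are illustrative, not my exact code). I also keep the post-state and an end-of-episode flag with each experience so the targets below can be computed:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** One transition stored in the replay memory. */
class Experience {
    final double[] state;      // current state features
    final int action;          // index of the action taken
    final double reward;       // reward received for that action
    final double[] nextState;  // post-state observed after the action
    final boolean endOfEpisode;

    Experience(double[] state, int action, double reward,
               double[] nextState, boolean endOfEpisode) {
        this.state = state;
        this.action = action;
        this.reward = reward;
        this.nextState = nextState;
        this.endOfEpisode = endOfEpisode;
    }
}

/** Fixed-size replay memory with uniform random sampling. */
class ReplayMemory {
    private final List<Experience> memory = new ArrayList<>();
    private final int capacity;
    private final Random rng = new Random();

    ReplayMemory(int capacity) { this.capacity = capacity; }

    void add(Experience e) {
        if (memory.size() >= capacity) {
            memory.remove(0);  // drop the oldest experience when full
        }
        memory.add(e);
    }

    List<Experience> sample(int batchSize) {
        List<Experience> copy = new ArrayList<>(memory);
        Collections.shuffle(copy, rng);
        return copy.subList(0, Math.min(batchSize, copy.size()));
    }
}
```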
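
The network itself is built with Encog roughly as follows (a sketch, not my exact code; the input and output sizes shown here are placeholders that depend on the state encoding and the number of actions):

```java
import org.encog.engine.network.activation.ActivationLinear;
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;

/** Build the Q-network: state features in, one Q-value per action out. */
static BasicNetwork buildQNetwork(int stateSize, int numActions) {
    BasicNetwork network = new BasicNetwork();
    network.addLayer(new BasicLayer(null, true, stateSize));                     // input layer (+ bias)
    network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 512));        // hidden layer, sigmoid
    network.addLayer(new BasicLayer(new ActivationLinear(), false, numActions)); // linear output layer
    network.getStructure().finalizeStructure();
    network.reset();  // initialize with random weights
    return network;
}
```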
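
Action selection is the usual epsilon-greedy rule; schematically it looks like this (predictQ is an illustrative helper that runs a forward pass through the Q-network, not a real method of mine):

```java
import java.util.Random;

// Epsilon-greedy: explore with probability epsilon, otherwise pick the
// action with the highest predicted Q-value.
static int selectAction(double[] state, double epsilon, int numActions, Random rng) {
    if (rng.nextDouble() < epsilon) {
        return rng.nextInt(numActions);   // exploratory random action
    }
    double[] qValues = predictQ(state);   // one Q-value per action
    int best = 0;
    for (int a = 1; a < qValues.length; a++) {
        if (qValues[a] > qValues[best]) {
            best = a;
        }
    }
    return best;                          // greedy action
}
```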
During training, a random batch of 32 experiences is selected from the replay memory. Then I calculate the target Q-value for each stored state and action, Q(s, a):
    int i = 0;
    for (Experience e : batch) {
        if (e.isEndOfEpisode()) {
            targets[i] = e.getReward();                                  // terminal: just the reward
        } else {
            targets[i] = e.getReward() + discountFactor * qMaxPostState; // bootstrap on max Q of the post-state
        }
        i++;
    }
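
To make that concrete: qMaxPostState is the largest network output for the post-state, and since the network has one output per action, the training target vector keeps the network's current predictions for all actions except the one that was actually taken. A rough sketch (predictQ is the same illustrative forward-pass helper as above; nothing here is meant as exact code):

```java
import java.util.Arrays;

// Build network inputs and target vectors for one sampled batch.
double[][] inputs  = new double[batch.size()][];
double[][] targets = new double[batch.size()][];

for (int i = 0; i < batch.size(); i++) {
    Experience e = batch.get(i);

    double[] qCurrent = predictQ(e.state);           // current predictions for the stored state
    double target;
    if (e.endOfEpisode) {
        target = e.reward;                           // terminal: no bootstrapping
    } else {
        double qMaxPostState = Arrays.stream(predictQ(e.nextState)).max().getAsDouble();
        target = e.reward + discountFactor * qMaxPostState;
    }

    qCurrent[e.action] = target;                     // only the taken action's output changes
    inputs[i]  = e.state;
    targets[i] = qCurrent;
}
```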
Now I have a set of 32 target Q-values, and I train the neural network on them using batch gradient descent. I just do 1 training step. How many should I do?
I am using Java and the Encog library for the neural network.
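
Concretely, one training step on the batch looks roughly like this in Encog (a sketch; the learning rate and momentum shown are placeholders, not my tuned values):

```java
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.training.propagation.back.Backpropagation;

// Wrap the 32 (input, target) pairs in an Encog data set and run one
// backpropagation pass over it.
MLDataSet batchSet = new BasicMLDataSet(inputs, targets);
Backpropagation train = new Backpropagation(network, batchSet, 0.001, 0.0); // learning rate, momentum
train.iteration();   // a single gradient-descent step over the batch
```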