How do I train a neural network to play Pong?

I am trying to understand this document explaining how to train a neural network to play pong. https://cloud.github.com/downloads/inf0-warri0r/neural_pong/README.pdf

I recently started studying neural networks, and I understand the concept of backpropagation. In this article, backpropagation is used to train the network.

There are five input neurons in this neural network.

  • x coordinate of the ball (bx)
  • y coordinate of the ball (by)
  • ball velocity in the x direction (bvx)
  • ball velocity in the y direction (bvy)
  • paddle position (py)

There are ten neurons in the hidden layer and one neuron in the output layer, which outputs the position of the paddle (py).
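
For reference, here is a minimal sketch of the forward pass of such a 5-10-1 network. The sigmoid activation, the weight range, and the names `init_network` and `forward` are assumptions made for illustration; the README may use something different.

```python
import math
import random

N_INPUTS, N_HIDDEN = 5, 10  # layer sizes taken from the description above


def init_network(seed=0):
    """Create small random weights; the last entry of each row is a bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS + 1)]
              for _ in range(N_HIDDEN)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN + 1)]
    return hidden, output


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(network, inputs):
    """Return (hidden activations, output) for inputs [bx, by, bvx, bvy, py]."""
    hidden_w, output_w = network
    # zip stops at the shorter sequence, so the bias (row[-1]) is added separately.
    h = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1])
         for row in hidden_w]
    y = sigmoid(sum(w * a for w, a in zip(output_w, h)) + output_w[-1])
    return h, y
```

With a sigmoid output the result lies between 0 and 1, so the inputs and the paddle position would need to be scaled to the playing field.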

From this point on, I have some doubts I would like to clear up.

Since backpropagation is a supervised learning method, it needs a desired output. At each iteration we subtract the current output from the desired output to get the output error, and use that error to compute the gradient for gradient descent.
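
Written out for a single output, the update looks roughly as follows. This assumes a squared-error loss and sigmoid activations, which the README does not necessarily use. Here t is the desired output, y the network output, h_j the hidden activations, x_i the inputs, w_j the hidden-to-output weights, v_ji the input-to-hidden weights, and η the learning rate.

```latex
\begin{aligned}
E &= \tfrac{1}{2}\,(t - y)^2 \\
\delta_{\text{out}} &= (y - t)\, y\,(1 - y) \\
\delta_j &= \delta_{\text{out}}\, w_j\, h_j\,(1 - h_j) \\
w_j &\leftarrow w_j - \eta\, \delta_{\text{out}}\, h_j \\
v_{ji} &\leftarrow v_{ji} - \eta\, \delta_j\, x_i
\end{aligned}
```

The hidden deltas are computed with the old values of w_j, before the output weights are updated.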

  • I do not understand what the desired output would be in this case. Could it be the distance between the position where the ball hits the wall and the position of the paddle, which we want to keep at zero?

  • I know that the control paddle will be hardcoded to move in sync with the ball, but how do we move the other paddle while we train it? What values should be given for the input py?

  • At what point in the game should the five inputs bx, by, bvx, bvy and py be provided? Should we supply these inputs and run one training iteration of the network only when the ball hits the wall?

1 answer

First of all, I would like to dissuade you from using this article as an educational tool. The code is poorly documented, and the article itself is not very informative.

  • In the code repository, the author appears to use the output as the distance from the paddle to the place where it should be. He then trains the net on the actual distance from the paddle to the ball whenever the paddle misses.

  • The original article moves the opposite paddle by simply training two nets against each other. This has some disadvantages, but in this case it should not be a problem. The value for py is the current y coordinate of the paddle.

  • In the code, the network is given the current game state on every frame and is allowed to choose the target distance to move. The net is then trained whenever it misses the ball.
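
The scheme described in the points above can be sketched roughly as follows. This is not the repository's code: `LinearModel` is a one-unit stand-in for the 5-10-1 network to keep the example short, and `PADDLE_HALF_HEIGHT`, `max_speed`, and the 0 to 1 field coordinates are hypothetical values chosen for illustration.

```python
class LinearModel:
    """Stand-in for the network: one linear unit trained by the delta rule."""

    def __init__(self, n_inputs=5, lr=0.1):
        self.w = [0.0] * (n_inputs + 1)  # last entry is the bias
        self.lr = lr

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.w, x)) + self.w[-1]

    def train(self, x, target):
        err = target - self.predict(x)
        for i, xi in enumerate(x):
            self.w[i] += self.lr * err * xi
        self.w[-1] += self.lr * err
        return err


PADDLE_HALF_HEIGHT = 0.1  # hypothetical, in normalised field units


def step_frame(model, state, paddle_y, max_speed=0.05):
    """Every frame: ask the model how far to move, then move the paddle."""
    move = model.predict(state + [paddle_y])
    move = max(-max_speed, min(max_speed, move))  # limit paddle speed
    return min(1.0, max(0.0, paddle_y + move))    # stay inside the field


def on_ball_at_wall(model, state, paddle_y):
    """When the ball reaches the paddle's wall: train only on a miss."""
    ball_y = state[1]  # state is [bx, by, bvx, bvy]
    missed = abs(ball_y - paddle_y) > PADDLE_HALF_HEIGHT
    if missed:
        # Target is the actual distance from the paddle to the ball.
        model.train(state + [paddle_y], ball_y - paddle_y)
    return missed
```

Note that `train` is only ever called from `on_ball_at_wall`, so the model sees one training sample per missed ball.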

This design has some disadvantages. For example, you get only one training data point per ball, and since the ball is always at the edge of the playing field when that data point is collected, the network learns little about how to move while the ball is actually bouncing around the field.

I would recommend keeping track of all the values that are fed to the network as it plays. You can later train the net using the initial conditions of the game and the actual location where the ball hit. That way the network can be trained even when it successfully blocks the ball, and it receives data from all points in the game.
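
A minimal sketch of that bookkeeping, under some assumptions: the field is a unit square, the ball bounces off the top and bottom walls, and the paddle defends the wall at x = 1. `RallyRecorder` and `simulate_rally` are hypothetical names, and the toy physics only exists to produce data for the example.

```python
class RallyRecorder:
    """Stores every input vector seen during one rally."""

    def __init__(self):
        self.frames = []

    def record(self, bx, by, bvx, bvy, py):
        self.frames.append([bx, by, bvx, bvy, py])

    def finish(self, hit_y):
        """Label every recorded frame with where the ball actually hit."""
        samples = [(inputs, hit_y) for inputs in self.frames]
        self.frames = []
        return samples


def simulate_rally(bx, by, bvx, bvy, py, recorder, wall_x=1.0):
    """Advance a toy ball until it reaches wall_x; return the hit position."""
    if bvx <= 0:
        raise ValueError("ball must be moving towards the wall")
    while bx < wall_x:
        recorder.record(bx, by, bvx, bvy, py)
        bx += bvx
        by += bvy
        if by < 0.0 or by > 1.0:  # bounce off the top or bottom wall
            by = -by if by < 0.0 else 2.0 - by
            bvy = -bvy
    return by


recorder = RallyRecorder()
hit_y = simulate_rally(0.0, 0.5, 0.25, 0.3, 0.5, recorder)
samples = recorder.finish(hit_y)
# Each (inputs, target) pair can now be passed to the network's training step.
```

Every frame of the rally becomes a training sample, whether or not the paddle blocked the ball.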


Source: https://habr.com/ru/post/1233080/

