First of all, I would like to dissuade you from using this article as an educational tool. The code is poorly documented, and the article itself is not very informative.
In the code repository, the author appears to use the network's output as the distance from the paddle to the position it should move to. He then trains the net on the actual distance from the paddle to the ball whenever the paddle misses the ball.
The original article controls the opposing paddle by simply training the two nets against each other. This has some drawbacks, but it should not be a problem in this case. The value py is the paddle's current y-coordinate.
In the code, the network receives the current game state in every frame and chooses the target distance to move. The author then trains the network whenever the paddle misses the ball.
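Here is a minimal sketch of that training scheme as I understand it from the description, not the repository's actual code. The net is queried every frame, but it gets a training update only when the paddle misses. The LinearNet class, the state features and the paddle_half miss test are hypothetical stand-ins chosen for illustration.

```python
class LinearNet:
    """Hypothetical stand-in for the article's net: a linear model trained with SGD."""

    def __init__(self, n_inputs, lr=0.05):
        self.w = [0.0] * n_inputs
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def train_step(self, x, target):
        # One SGD step on the squared error 0.5 * (prediction - target) ** 2.
        err = self.predict(x) - target
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def step_original(net, state, paddle_y, ball_y, at_edge, paddle_half):
    """One frame of the original scheme; returns (move, trained)."""
    # The output is read as the distance from the paddle to where it should be.
    move = net.predict(state)
    # Assumed miss test: the ball reached the paddle's edge outside the paddle.
    missed = bool(at_edge and abs(ball_y - paddle_y) > paddle_half)
    if missed:
        # The only training sample this ball ever produces: the state at the
        # edge, labelled with the actual paddle-to-ball distance.
        net.train_step(state, ball_y - paddle_y)
    return move, missed
```

Every frame where the ball is still in flight, and every frame where the paddle blocks it, produces no training signal at all.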
This design has some drawbacks. For example, you get only one training sample per ball, and since the ball is always at the edge of the playing field when that sample is collected, the network learns little about how to move while the ball is actually bouncing around the field.
I would recommend recording all the inputs fed to the network while it plays. Later, you can train the net on those recorded game states, labelled with the actual location where the ball arrived. This way, the network gets training data even when it successfully blocks the ball, and it receives data from every point in the game.
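A minimal sketch of that recording approach follows. It assumes, as above, that the net's output is the distance from the paddle to where it should be, so each recorded frame is labelled with the distance from the paddle position at that frame to where the ball actually arrived. RallyRecorder, train_on_rally and the train_step callback are hypothetical names; plug in whatever training routine your net uses.

```python
class RallyRecorder:
    """Buffers every frame's network input until the ball reaches the
    paddle's edge, then labels all of them with the real outcome."""

    def __init__(self):
        self.frames = []  # (state, paddle_y) pairs seen during this rally

    def record(self, state, paddle_y):
        self.frames.append((state, paddle_y))

    def finish(self, arrival_y):
        """Call when the ball reaches the paddle's edge, whether hit or missed."""
        # Label = distance from the paddle, as it was in that frame,
        # to where the ball actually ended up.
        samples = [(state, arrival_y - paddle_y) for state, paddle_y in self.frames]
        self.frames = []
        return samples


def train_on_rally(train_step, samples, epochs=1):
    """Feed every labelled frame to a training callback."""
    for _ in range(epochs):
        for state, target in samples:
            train_step(state, target)
```

Because finish() runs on every arrival rather than only on misses, a rally of N frames yields N samples instead of at most one, and most of them come from mid-flight states where the ball is still bouncing.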