I'm playing around with machine learning, especially Q-learning, where you have states and actions and hand out rewards depending on how well the network performs.
For a start, I set myself a simple goal: train the network so that it emits valid moves for tic-tac-toe (against a random opponent) as its actions. My problem is that the network doesn't learn at all, or even gets worse over time.
The first thing I did was grab Torch and a deep Q-learning module for it: https://github.com/blakeMilner/DeepQLearning .
Then I wrote a simple tic-tac-toe game where a random player plays against the neural network, and hooked it up to the code from this example: https://github.com/blakeMilner/DeepQLearning/blob/master/test.lua . The network's output consists of 9 nodes, one for placing a mark in the corresponding cell.
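For reference, this is roughly how I turn the 9 output nodes into a move (a simplified plain-Lua sketch; the names are mine, not from the DeepQLearning API):

```lua
local EMPTY, PLAYER1, PLAYER2 = 0, 1, 2

-- The board is a flat table of 9 cells; the net emits one Q-value per cell
-- and the move is simply the cell with the highest Q-value.
local function pickMove(qvalues)
  local best, bestCell = -math.huge, 1
  for cell = 1, 9 do
    if qvalues[cell] > best then
      best, bestCell = qvalues[cell], cell
    end
  end
  return bestCell  -- index 1..9 of the cell the agent wants to mark
end
```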
A move is valid if the network picks an empty cell (one that holds neither an X nor an O). Accordingly, I give a positive reward when the network picks an empty cell and a negative reward when it picks an occupied one.
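The reward function amounts to something like this (again a simplified sketch; read the ±1 values as placeholders):

```lua
local EMPTY = 0  -- as above

-- Positive reward for choosing a free cell, negative for an occupied one.
local function moveReward(board, cell)
  if board[cell] == EMPTY then
    return 1.0   -- valid move: the cell is free
  else
    return -1.0  -- invalid move: the cell already holds an X or O
  end
end
```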
The problem is that it never learns. I've tried lots of variants:
- representing the tic-tac-toe board as 9 inputs (0 = empty cell, 1 = player 1, 2 = player 2) or as 27 one-hot inputs (e.g. an empty cell becomes [empty = 1, player1 = 0, player2 = 0]; see the encoding sketch after this list)
- varying the number of hidden nodes between 10 and 60
- running up to 60k training iterations
- 0,001 0,1
- ,
: (
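In case the encoding is the culprit, this is what I mean by the 27-input variant from the list above (a simplified plain-Lua sketch):

```lua
-- Each cell expands to three indicator inputs [empty, player1, player2],
-- exactly one of which is 1. A board of 9 cells becomes 27 numbers.
local function encodeBoard27(board)  -- board: 9 cells with values 0/1/2
  local inputs = {}
  for i = 1, 9 do
    inputs[#inputs + 1] = (board[i] == 0) and 1 or 0  -- empty
    inputs[#inputs + 1] = (board[i] == 1) and 1 or 0  -- player 1
    inputs[#inputs + 1] = (board[i] == 2) and 1 or 0  -- player 2
  end
  return inputs  -- 27 values fed to the net
end
```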
Thanks,
-Matthias