I'm playing around with machine learning, especially Q-learning, where you have states and actions and hand out rewards depending on how well the network performs.
For a start, I set myself a simple goal: train the network so that it emits valid moves for tic-tac-toe (against a random opponent) as its actions. My problem is that the network doesn't learn at all, or even gets worse over time.
The first thing I did was grab Torch and a deep Q-learning module for it: https://github.com/blakeMilner/DeepQLearning .
Then I wrote a simple tic-tac-toe game where a random player plays against the neural network, and hooked it up to the code from this example: https://github.com/blakeMilner/DeepQLearning/blob/master/test.lua . The network's output consists of 9 nodes, one for placing a mark in the corresponding cell.
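For reference, this is roughly how I turn the 9 output nodes into a move (a simplified plain-Lua sketch; the names are mine, not from the DeepQLearning API):

```lua
local EMPTY, PLAYER1, PLAYER2 = 0, 1, 2

-- The board is a flat table of 9 cells; the net emits one Q-value per cell
-- and the move is simply the cell with the highest Q-value.
local function pickMove(qvalues)
  local best, bestCell = -math.huge, 1
  for cell = 1, 9 do
    if qvalues[cell] > best then
      best, bestCell = qvalues[cell], cell
    end
  end
  return bestCell  -- index 1..9 of the cell the agent wants to mark
end
```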
A move is valid if the network picks an empty cell (one that holds neither an X nor an O). Accordingly, I give a positive reward when the network picks an empty cell and a negative reward when it picks an occupied one.
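The reward function amounts to something like this (again a simplified sketch; read the ±1 values as placeholders):

```lua
local EMPTY = 0  -- as above

-- Positive reward for choosing a free cell, negative for an occupied one.
local function moveReward(board, cell)
  if board[cell] == EMPTY then
    return 1.0   -- valid move: the cell is free
  else
    return -1.0  -- invalid move: the cell already holds an X or O
  end
end
```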
The problem is that it never learns. I've tried lots of variants:
- representing the tic-tac-toe board as 9 inputs (0 = empty cell, 1 = player 1, 2 = player 2) or as 27 one-hot inputs (e.g. an empty cell becomes [empty = 1, player1 = 0, player2 = 0]; see the encoding sketch after this list)
- varying the number of hidden nodes between 10 and 60
- running up to 60k training iterations
- 0,001 0,1
- ,
: (
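In case the encoding is the culprit, this is what I mean by the 27-input variant from the list above (a simplified plain-Lua sketch):

```lua
-- Each cell expands to three indicator inputs [empty, player1, player2],
-- exactly one of which is 1. A board of 9 cells becomes 27 numbers.
local function encodeBoard27(board)  -- board: 9 cells with values 0/1/2
  local inputs = {}
  for i = 1, 9 do
    inputs[#inputs + 1] = (board[i] == 0) and 1 or 0  -- empty
    inputs[#inputs + 1] = (board[i] == 1) and 1 or 0  -- player 1
    inputs[#inputs + 1] = (board[i] == 2) and 1 or 0  -- player 2
  end
  return inputs  -- 27 values fed to the net
end
```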
Thanks,
-Matthias