To simplify the task, let's say that once a feature (a row of the embedding matrix) has already been updated n times, the next time I see that feature I want its learning rate to be 1/n.
I came up with this code:
import numpy as np
import theano
import theano.tensor as T

def test_adagrad():
    embedding = theano.shared(value=np.random.randn(20, 10), borrow=True)
    times = theano.shared(value=np.ones((20, 1)))   # how many times each row has been updated
    lr = T.dscalar()
    index_a = T.lvector()
    hist = times[index_a]                           # counters for the rows in this batch
    cost = T.sum(theano.sparse_grad(embedding[index_a]))
    gradients = T.grad(cost, embedding)
    updates = [(embedding, embedding + lr * (1.0 / hist) * gradients)]
Theano raises no error, but training sometimes produces NaN. Does anyone know how to fix this, please?
Thanks for the help.
PS: I suspect that these operations on the sparse part of the graph are what causes the problem, so I tried replacing * with theano.sparse.mul. That gave the results I mentioned above.
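To make the intent clearer, here is a minimal, self-contained sketch of the per-row 1/n scheme I am trying to get to, written the way I imagine it should look (the inc_subtensor bookkeeping, the toy cost, and the variable names are only illustrative, not my actual training code):

import numpy as np
import theano
import theano.tensor as T

# Shared state: the embedding matrix and a per-row update counter.
embedding = theano.shared(np.random.randn(20, 10), borrow=True)
times = theano.shared(np.ones(20))        # times[i] = how often row i has been updated so far

lr = T.dscalar('lr')
index_a = T.lvector('index_a')

rows = embedding[index_a]                 # only the rows seen in this batch
cost = T.sum(rows)                        # toy stand-in for the real cost
grad_rows = T.grad(cost, rows)            # gradient w.r.t. the selected rows only

# Per-row learning rate lr * (1 / n), broadcast across the embedding dimension.
scale = (lr / times[index_a]).dimshuffle(0, 'x')

updates = [
    # Write back only the touched rows of the embedding matrix.
    (embedding, T.inc_subtensor(rows, -scale * grad_rows)),
    # Bump the counters of the touched rows (duplicates in index_a accumulate).
    (times, T.inc_subtensor(times[index_a], T.ones_like(times[index_a]))),
]

train = theano.function([index_a, lr], cost, updates=updates)
print(train(np.array([0, 3, 7], dtype='int64'), 0.1))

The idea is that the counters stay aligned with exactly the rows touched in each batch, which is the part I am not sure I handled correctly in my code above.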