How to implement AdaGrad in Python Theano

To simplify the task, let's say that when a parameter (or feature) has already been updated n times, then the next time I see that feature I want to set its learning rate to 1/n.

I came up with this code:

    import numpy as np
    import theano
    import theano.tensor as T

    def test_adagrad():
        embedding = theano.shared(value=np.random.randn(20, 10), borrow=True)
        times = theano.shared(value=np.ones((20, 1)))
        lr = T.dscalar()
        index_a = T.lvector()
        hist = times[index_a]
        cost = T.sum(theano.sparse_grad(embedding[index_a]))
        gradients = T.grad(cost, embedding)
        updates = [(embedding, embedding + lr * (1.0 / hist) * gradients)]
        ### Here there should also be some code to update `times`, omitted ###
        train = theano.function(inputs=[index_a, lr], outputs=cost, updates=updates)
        for i in range(10):
            print train([1, 2, 3], 0.05)
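
For illustration, here is a minimal, untested sketch (my own, not part of the original code) of how the per-row counter update and the 1/n-scaled step could be written with T.inc_subtensor so that only the rows in index_a are touched; variable names follow the code above, but the gradient is taken densely and indexed instead of going through theano.sparse_grad:

    import numpy as np
    import theano
    import theano.tensor as T

    embedding = theano.shared(np.random.randn(20, 10), borrow=True)
    times = theano.shared(np.ones((20, 1)))          # per-row update counters
    lr = T.dscalar('lr')
    index_a = T.lvector('index_a')

    hist = times[index_a]                            # counters of the touched rows only
    cost = T.sum(embedding[index_a])
    grad_rows = T.grad(cost, embedding)[index_a]     # gradient restricted to the touched rows

    updates = [
        # scale each touched row's step by 1 / (number of times it was updated so far)
        (embedding, T.inc_subtensor(embedding[index_a], -lr * (1.0 / hist) * grad_rows)),
        # increment the counters of the touched rows
        (times, T.inc_subtensor(times[index_a], 1.0)),
    ]
    train = theano.function(inputs=[index_a, lr], outputs=cost, updates=updates)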

Theano throws no error, but the training result sometimes gives NaN. Does anyone know how to fix this, please?

Thanks for the help.

PS: I suspect that these operations on sparse data are causing the problem, so I tried replacing * with theano.sparse.mul. That gave the same results I mentioned above.

+2
3 answers

Perhaps you can use the following example for implementing adadelta and adapt it to derive your own AdaGrad. Please update here if you succeed :-)
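
Since the referenced example is not reproduced here, a rough sketch (my own, not the linked code) of what adadelta updates look like in Theano; it could be simplified into AdaGrad by replacing the running averages with a plain sum of squared gradients:

    import numpy as np
    import theano
    import theano.tensor as T

    def adadelta_updates(params, grads, rho=0.95, epsilon=1e-6):
        # params: list of Theano shared variables; grads: their gradient expressions
        updates = []
        for p, g in zip(params, grads):
            value = p.get_value(borrow=True)
            accu_g = theano.shared(np.zeros_like(value))   # running average of squared gradients
            accu_dx = theano.shared(np.zeros_like(value))  # running average of squared updates
            accu_g_new = rho * accu_g + (1.0 - rho) * g ** 2
            dx = -T.sqrt(accu_dx + epsilon) / T.sqrt(accu_g_new + epsilon) * g
            updates.append((accu_g, accu_g_new))
            updates.append((accu_dx, rho * accu_dx + (1.0 - rho) * dx ** 2))
            updates.append((p, p + dx))
        return updates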

+8

I was looking for the same thing and eventually implemented it myself, in the style of the resource zuuz already pointed to. So maybe this helps anyone looking for help here.

    import numpy as np
    import theano
    import theano.tensor as T

    def adagrad(lr, tparams, grads, inp, cost):
        # stores the current grads
        gshared = [theano.shared(np.zeros_like(p.get_value(), dtype=theano.config.floatX),
                                 name='%s_grad' % k)
                   for k, p in tparams.iteritems()]
        grads_updates = zip(gshared, grads)

        # stores the sum of all grads squared
        hist_gshared = [theano.shared(np.zeros_like(p.get_value(), dtype=theano.config.floatX),
                                      name='%s_grad' % k)
                        for k, p in tparams.iteritems()]
        rgrads_updates = [(rg, rg + T.sqr(g)) for rg, g in zip(hist_gshared, grads)]

        # calculate cost and store grads
        f_grad_shared = theano.function(inp, cost,
                                        updates=grads_updates + rgrads_updates,
                                        on_unused_input='ignore')

        # apply actual update with the initial learning rate lr
        n = 1e-6
        updates = [(p, p - (lr / (T.sqrt(rg) + n)) * g)
                   for p, g, rg in zip(tparams.values(), gshared, hist_gshared)]
        f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore')

        return f_grad_shared, f_update
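
For context, a hypothetical usage sketch of the helper above (Python 2 / Theano era, with a toy least-squares cost; tparams is assumed to be an OrderedDict of shared parameters):

    from collections import OrderedDict

    import numpy as np
    import theano
    import theano.tensor as T

    # toy model: a single weight matrix and a squared-error cost
    tparams = OrderedDict()
    tparams['W'] = theano.shared(np.zeros((3, 1), dtype=theano.config.floatX), name='W')

    x = T.matrix('x')
    y = T.matrix('y')
    cost = T.sum((T.dot(x, tparams['W']) - y) ** 2)
    grads = T.grad(cost, tparams.values())

    lr = T.scalar('lr')
    f_grad_shared, f_update = adagrad(lr, tparams, grads, [x, y], cost)

    x_batch = np.random.randn(8, 3).astype(theano.config.floatX)
    y_batch = np.random.randn(8, 1).astype(theano.config.floatX)
    for step in range(100):
        c = f_grad_shared(x_batch, y_batch)  # compute cost and stash the gradients
        f_update(0.1)                        # apply the AdaGrad step with base learning rate 0.1
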
+1

I find this implementation from Lasagne very concise and readable. You can use it pretty much as it is:

    for param, grad in zip(params, grads):
        value = param.get_value(borrow=True)
        accu = theano.shared(np.zeros(value.shape, dtype=value.dtype),
                             broadcastable=param.broadcastable)
        accu_new = accu + grad ** 2
        updates[accu] = accu_new
        updates[param] = param - (learning_rate * grad / T.sqrt(accu_new + epsilon))
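
A self-contained wrapper (my own sketch, not Lasagne's API) showing the variables the snippet assumes: params are shared variables, grads come from T.grad, and updates is an OrderedDict handed to theano.function:

    from collections import OrderedDict

    import numpy as np
    import theano
    import theano.tensor as T

    def adagrad_updates(cost, params, learning_rate=1.0, epsilon=1e-6):
        grads = T.grad(cost, params)
        updates = OrderedDict()
        for param, grad in zip(params, grads):
            value = param.get_value(borrow=True)
            # per-parameter accumulator of squared gradients
            accu = theano.shared(np.zeros(value.shape, dtype=value.dtype),
                                 broadcastable=param.broadcastable)
            accu_new = accu + grad ** 2
            updates[accu] = accu_new
            updates[param] = param - (learning_rate * grad / T.sqrt(accu_new + epsilon))
        return updates

    # usage (hypothetical names): train_fn = theano.function([x, y], cost,
    #                                                        updates=adagrad_updates(cost, params))
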
+1

Source: https://habr.com/ru/post/1239100/

