To highlight the problem, I followed this tutorial.
Theano has three ways to compute a sigmoid on a tensor: sigmoid, ultra_fast_sigmoid, and hard_sigmoid. Using the last two seems to break my gradient descent algorithm.
The ordinary sigmoid converges as it should, but the other two behave strangely and inconsistently. ultra_fast_sigmoid throws an error outright when the gradient is computed, MethodNotDefined: ('grad', UltraFastScalarSigmoid), while hard_sigmoid compiles fine but does not converge to the solution.
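For reference, here is a minimal sketch that reproduces the gradient error in isolation (assuming a stock Theano install; the variable z is just for illustration):

import theano
import theano.tensor as T
import theano.tensor.nnet as nnet

z = T.dvector('z')

# The forward pass compiles for all three variants:
for f in (nnet.sigmoid, nnet.ultra_fast_sigmoid, nnet.hard_sigmoid):
    theano.function([z], f(z))

# But taking the gradient through ultra_fast_sigmoid raises
# MethodNotDefined: ('grad', ..., 'UltraFastScalarSigmoid'):
T.grad(nnet.ultra_fast_sigmoid(z).sum(), z)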
Does anyone know the source of this behaviour? The documentation doesn't point out that this should happen, and it seems counterintuitive.
Code:
import theano
import theano.tensor as T
import theano.tensor.nnet as nnet
import numpy as np

x = T.dvector()
y = T.dscalar()

def layer(x, w):
    # append a constant 1 to the input so w can carry a bias term
    b = np.array([1], dtype=theano.config.floatX)
    new_x = T.concatenate([x, b])
    m = T.dot(w.T, new_x)
    h = nnet.sigmoid(m)  # the activation swapped between the three variants
    return h
To keep this post short, here are the lines I changed relative to the tutorial (they are already reflected in the code above):
from theano.tensor.nnet import binary_crossentropy as cross_entropy
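The only other difference between the runs below is which sigmoid the layer uses, i.e. swapping the activation line in layer() between (a sketch, since I've elided the rest of the script):

h = nnet.sigmoid(m)             # converges
h = nnet.ultra_fast_sigmoid(m)  # errors out on T.grad
h = nnet.hard_sigmoid(m)        # compiles, but does not converge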
sigmoid
Cost: 1.62724279493
Cost: 0.545966632545
Cost: 0.156764560912
Cost: 0.0534911098234
Cost: 0.0280394147992
Cost: 0.0184933786794
Cost: 0.0136444190935
Cost: 0.0107482836159
0.993652087577
0.00848194143055
0.990829396285
0.00878482655791
ultra_fast_sigmoid
File "test.py", line 30, in <module> (theta1, grad_desc(fc, theta1)), File "test.py", line 19, in grad_desc return theta - (alpha * T.grad(cost, wrt=theta)) File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 545, in grad grad_dict, wrt, cost_name) File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 1283, in _populate_grad_dict rval = [access_grad_cache(elem) for elem in wrt] File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 1241, in access_grad_cache term = access_term_cache(node)[idx] File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 951, in access_term_cache output_grads = [access_grad_cache(var) for var in node.outputs] File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 1241, in access_grad_cache term = access_term_cache(node)[idx] File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 951, in access_term_cache output_grads = [access_grad_cache(var) for var in node.outputs] File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 1241, in access_grad_cache term = access_term_cache(node)[idx] File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 951, in access_term_cache output_grads = [access_grad_cache(var) for var in node.outputs] File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 1241, in access_grad_cache term = access_term_cache(node)[idx] File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 951, in access_term_cache output_grads = [access_grad_cache(var) for var in node.outputs] File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 1241, in access_grad_cache term = access_term_cache(node)[idx] File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 951, in access_term_cache output_grads = [access_grad_cache(var) for var in node.outputs] File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 1241, in access_grad_cache term = access_term_cache(node)[idx] File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 1089, in access_term_cache input_grads = node.op.grad(inputs, new_output_grads) File "/usr/local/lib/python2.7/dist-packages/theano/tensor/elemwise.py", line 662, in grad rval = self._bgrad(inputs, ograds) File "/usr/local/lib/python2.7/dist-packages/theano/tensor/elemwise.py", line 737, in _bgrad scalar_igrads = self.scalar_op.grad(scalar_inputs, scalar_ograds) File "/usr/local/lib/python2.7/dist-packages/theano/scalar/basic.py", line 878, in grad self.__class__.__name__) theano.gof.utils.MethodNotDefined: ('grad', <class 'theano.tensor.nnet.sigm.UltraFastScalarSigmoid'>, 'UltraFastScalarSigmoid')
hard_sigmoid
Cost: 1.19810193303
Cost: 0.684360309062
Cost: 0.692614056124
Cost: 0.697902474354
Cost: 0.701540531661
Cost: 0.703807604483
Cost: 0.70470238116
Cost: 0.704385738831
0.4901260624
0.486248177053
0.489490785078
0.493368670425
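In case it helps with diagnosis: the gradient that hard_sigmoid reports can be inspected directly. A minimal sketch (same imports as above; the printed values are what I'd expect from hard_sigmoid being a clipped linear approximation of the sigmoid, not verified output):

import theano
import theano.tensor as T
import theano.tensor.nnet as nnet

z = T.dvector('z')
g = T.grad(nnet.hard_sigmoid(z).sum(), z)
f = theano.function([z], g)

# hard_sigmoid approximates the sigmoid with a clipped straight line,
# so its gradient should be a constant slope near zero input and
# exactly 0 once the input saturates:
print(f([-5.0, 0.0, 5.0]))  # expect something like [0., 0.2, 0.]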