How to implement a multidimensional linear stochastic gradient descent algorithm in TensorFlow?

I started with a simple implementation of single-variable linear regression with gradient descent, but I don't know how to extend it to a multidimensional stochastic gradient descent algorithm.

Single variable linear regression

import tensorflow as tf
import numpy as np

# Create random data
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.5

# Find values for W that compute y_data = W * x_data
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
y = W * x_data

# Minimize the mean squared errors
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

# Before starting, initialize the variables
init = tf.global_variables_initializer()

# Launch the graph
sess = tf.Session()
sess.run(init)

# Fit the line
for step in range(2001):
    sess.run(train)
    if step % 200 == 0:
        print(step, sess.run(W))
1 answer

You have two parts in your question:

  • How to move this problem to a higher-dimensional space.
  • How to move from batch gradient descent to stochastic gradient descent.

To get to the higher-dimensional setting, you can define your linear problem as y = <x, w> . Then you just need to change the shape of your variable W so that it matches the dimension of x , and replace the multiplication W*x_data with the scalar product tf.matmul(x_data, W) , and your code should run fine.
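The shape change can be sketched with NumPy alone (illustrative names, not the TensorFlow graph itself): in the scalar case y is an elementwise product, while in the multidimensional case each prediction is a dot product, i.e. one matrix multiplication.

```python
import numpy as np

d, N = 10, 1000                       # feature dimension, number of samples
w_true = 0.5 * np.ones((d, 1))        # true parameters, shape (d, 1)
x_data = np.random.random((N, d))     # each row is one sample

# Scalar case: y = W * x_data (elementwise, W is a single number).
# Multidimensional case: y = <x, w>, which for the whole data set
# is a single matrix product, like tf.matmul(x_data, W):
y = x_data @ w_true                   # shape (N, 1)
```

The variable W in the TensorFlow code must therefore have shape [d, 1] instead of [1].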

To change the learning method to stochastic gradient descent, you need to abstract the input of the cost function using tf.placeholder .
Once you have defined X and y_ to hold the input at each step, you can build the same cost function. Then you run your training step, feeding it the correct mini-batch of your data.
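The mini-batch bookkeeping itself is plain array slicing; here is a NumPy-only sketch of the indexing used in the answer's training loop (the placeholder feeding is shown only as a comment):

```python
import numpy as np

N, d, mini_batch_size = 1000, 10, 100
x_data = np.random.random((N, d)).astype(np.float32)
y_data = x_data.dot(0.5 * np.ones(d)).reshape((-1, 1))

# Number of mini-batches, rounding up in case N is not a multiple
n_batch = N // mini_batch_size + (N % mini_batch_size != 0)

for step in range(n_batch):
    i = (step % n_batch) * mini_batch_size
    xb = x_data[i:i + mini_batch_size]   # fed as feed_dict={X: xb, ...}
    yb = y_data[i:i + mini_batch_size]   # fed as feed_dict={..., y_: yb}
```

Cycling step % n_batch means the loop can run for more steps than there are batches, revisiting the data set repeatedly.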

Here is an example of how you can implement this behavior; it should show that W quickly converges to the true w .

import tensorflow as tf
import numpy as np

# Define dimensions
d = 10    # Size of the parameter space
N = 1000  # Number of data samples

# Create random data
w = .5 * np.ones(d)
x_data = np.random.random((N, d)).astype(np.float32)
y_data = x_data.dot(w).reshape((-1, 1))

# Define placeholders to feed mini-batches
X = tf.placeholder(tf.float32, shape=[None, d], name='X')
y_ = tf.placeholder(tf.float32, shape=[None, 1], name='y')

# Find values for W that compute y_data = <x, W>
W = tf.Variable(tf.random_uniform([d, 1], -1.0, 1.0))
y = tf.matmul(X, W, name='y_pred')

# Minimize the mean squared errors
loss = tf.reduce_mean(tf.square(y_ - y))
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

# Before starting, initialize the variables
init = tf.global_variables_initializer()

# Launch the graph
sess = tf.Session()
sess.run(init)

# Fit the line
mini_batch_size = 100
n_batch = N // mini_batch_size + (N % mini_batch_size != 0)
for step in range(2001):
    i_batch = (step % n_batch) * mini_batch_size
    batch = (x_data[i_batch:i_batch + mini_batch_size],
             y_data[i_batch:i_batch + mini_batch_size])
    sess.run(train, feed_dict={X: batch[0], y_: batch[1]})
    if step % 200 == 0:
        print(step, sess.run(W))

Two side notes:

  • The implementation above is called mini-batch gradient descent, since at each step the gradient is computed on a subset of mini_batch_size samples. This is a variant of stochastic gradient descent that is usually used to stabilize the gradient estimate at each step. Plain stochastic gradient descent is obtained by setting mini_batch_size = 1 .

  • The data set can be shuffled at each epoch to bring the implementation closer to the theoretical setting. Some recent work also considers only a single pass through the data set, as it prevents overfitting. For a more detailed mathematical description, see Bottou12 . This can easily be adapted to your problem setting and the statistical property you are looking for.
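The per-epoch shuffling mentioned above can be sketched with a single permutation of the row indices, applied to both arrays so that samples and targets stay aligned (NumPy only, names illustrative):

```python
import numpy as np

N, d = 1000, 10
x_data = np.random.random((N, d)).astype(np.float32)
y_data = x_data.dot(0.5 * np.ones(d)).reshape((-1, 1))

# One permutation per epoch keeps x/y rows paired while breaking
# any ordering correlation between consecutive mini-batches
perm = np.random.permutation(N)
x_shuffled = x_data[perm]
y_shuffled = y_data[perm]
```

Regenerating perm at the top of each epoch, then slicing mini-batches from the shuffled arrays, gives the sampling-without-replacement scheme commonly used in practice.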


Source: https://habr.com/ru/post/1245144/
