I'll try to put some intuition behind it, along with the TF approach.
General intuition:
The regression presented here is a supervised learning problem. As defined by Russell & Norvig in Artificial Intelligence: A Modern Approach, the task is:
given a training set of m input-output pairs (x1, y1), (x2, y2), ..., (xm, ym), where each output was generated by an unknown function y = f(x), discover a function h that approximates the true function f.
For that, the hypothesis function h somehow combines each x with the parameters to be learned, so that its output is as close as possible to the corresponding y, and this for the whole dataset. The hope is that the resulting function will be close to f.
But how do we know those parameters? In order to find them, the model must be able to evaluate itself. Here comes the cost function (also called loss, energy, merit...): a metric function that compares the output of h with the corresponding y, penalizing big deviations.
Now it should be clear what the "learning process" actually is here: altering the parameters to achieve a lower value for the cost function.
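To make these pieces concrete (hypothesis, cost, and how parameter choice affects the cost), here is a minimal NumPy sketch of my own; the names `h` and `cost` are illustrative, not taken from the posted code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend the unknown true function is f(x) = 2x + 1,
# observed only through (x, y) pairs.
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0

def h(x, w, b):
    # Hypothesis: our parameterized guess at f
    return w * x + b

def cost(w, b):
    # Mean squared error: compares h's output with y, punishing big deviations
    return np.mean((h(x, w, b) - y) ** 2)

# "Learning" means moving the parameters toward lower cost:
print(cost(0.0, 0.0))  # a bad guess has high cost
print(cost(2.0, 1.0))  # the true parameters have (here) zero cost
```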
Linear Regression:
The example you posted performs a parametric linear regression, optimized with gradient descent, using the mean squared error as cost function. Which means:
Parametric: the set of parameters is fixed. They are held in the exact same memory placeholders throughout the learning process.
Linear: the output of h is merely a linear (actually, affine) combination of the input x and your parameters. So if x and w are real-valued vectors of the same dimensionality, and b is a real number, it holds that h(x, w, b) = w.transposed()*x + b. Page 107 of the Deep Learning Book brings more quality insights and intuitions on that.
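As a quick sanity check of that formula, here is the affine combination spelled out in NumPy (the concrete values are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # input vector
w = np.array([0.5, -1.0, 2.0])  # parameter vector, same dimensionality as x
b = 0.25                        # real-valued bias (what makes it affine, not purely linear)

# h(x, w, b) = w^T x + b; for 1-D arrays this is just a dot product plus a scalar
h = w.T @ x + b
print(h)  # 0.5*1 + (-1)*2 + 2*3 + 0.25 = 4.75
```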
Cost function: now this is the interesting part. The mean squared error is a convex function. This means it has a single, global optimum, and furthermore, that optimum can be found directly with the set of normal equations (also explained in the DLB). In the case of your example, the stochastic (and/or minibatch) gradient descent method is used instead: this is the preferred method when optimizing non-convex cost functions (which is the case in more advanced models like neural networks) or when your dataset has huge dimensionality (also explained in the DLB).
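For reference, solving the convex MSE problem directly via the normal equations w* = (X^T X)^{-1} X^T y looks like this in NumPy (a sketch on synthetic, noiseless data of my own making; `np.linalg.solve` is preferred over explicitly inverting the matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 200, 3                       # m samples, d features
X = rng.normal(size=(m, d))
true_w = np.array([1.5, -2.0, 0.5])
true_b = 0.7
y = X @ true_w + true_b             # outputs generated by the "unknown" f

# Append a column of ones so the bias b is absorbed into the weight vector
Xb = np.hstack([X, np.ones((m, 1))])

# Normal equations: (X^T X) w* = X^T y
w_star = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(w_star)  # recovers [1.5, -2.0, 0.5, 0.7] up to numerical precision
```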
Gradient descent: tf deals with this for you, so it is enough to say that GD minimizes the cost function by following its derivative "downwards", in small steps, until it reaches a stationary point. If you totally need to know, the exact technique applied by TF is called automatic differentiation, a kind of compromise between the numeric and symbolic approaches. For convex functions like yours this point will be the global optimum, and (if your learning rate is not too big) it will always converge to it, so it doesn't matter which values you initialize your Variables with. Random initialization is necessary in more complex architectures like neural networks. There is some extra code regarding the management of the minibatches, but I won't get into that because it is not the main focus of your question.
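To illustrate the convexity point, here is a hand-rolled gradient descent loop in NumPy (my own sketch, not TF's automatic differentiation): the MSE derivatives are written out explicitly, and convergence does not depend on the deliberately bad initial values.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x - 0.5                    # true parameters: w=3.0, b=-0.5

w, b = 10.0, -7.0                    # deliberately bad initialization
lr = 0.1                             # learning rate: too big and GD diverges

for _ in range(2000):
    err = (w * x + b) - y            # residuals of the hypothesis
    w -= lr * 2 * np.mean(err * x)   # d(MSE)/dw
    b -= lr * 2 * np.mean(err)       # d(MSE)/db

print(w, b)  # converges near (3.0, -0.5) regardless of the starting point
```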
TensorFlow Approach:
Deep Learning frameworks are nowadays largely about building computational graphs (you may want to take a look at a presentation on DL frameworks I gave some weeks ago). To construct and run a TensorFlow graph, a declarative style is followed, which means that the graph has to be first completely defined and compiled before it is deployed and executed. It is very recommendable to read this short wiki article if you haven't done so yet. In this context, the setup is split in two parts:
First, you define your computational Graph, where you put your dataset and parameters in memory placeholders, define the hypothesis and cost functions built upon them, and tell tf which optimization technique to apply.
Then you run the computation in a Session, where the library is able to (re)load the data into the placeholders and perform the optimization.
Code:
The sample code follows this approach:
Define the training data x and labels y, and prepare a placeholder in the Graph for each of them (fed in via the feed_dict part).
Define the 'W' and 'b' variables for the parameters. They have to be Variables because they will be updated during the Session.
Define pred (our hypothesis) and cost , as explained earlier.
From this, the rest of the code should be clearer. Regarding the optimizer, as I said, tf already knows how to deal with that, but you may want to look into gradient descent for more details (again, the DLB is a pretty good reference for that).
Cheers, Andres
CODE EXAMPLES: GRADIENT DESCENT VS. NORMAL EQUATIONS
These small snippets generate simple multi-dimensional datasets and test both approaches. Notice that the normal equations approach requires no looping and brings better results. For small dimensionality (DIMENSIONS < 30k) it is probably the preferred approach:
from __future__ import absolute_import, division, print_function

import numpy as np
import tensorflow as tf

# GLOBALS