I wrote a blog post explaining in detail how a neural network works from scratch. To illustrate it, I wrote a neural network in Python using NumPy, and I also wrote a version using TensorFlow. I uploaded the code to GitHub to illustrate this question, but it is not a cleaned-up version.
The goal of the network is to predict the price of a car from three of its characteristics (km, fuel type, age). It is a toy example that I created from scratch.
I extracted the data from leboncoin.fr; my data set consists of about 9,000 cars (BMW 1 Series only). I normalized the data so that the price lies in [0, 1], the fuel type is binary encoded, and the age and number of kilometers are standardized using the mean and standard deviation.
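Concretely, the preprocessing looks roughly like this (a minimal sketch; the function name and the diesel encoding convention are assumptions, not the exact code from the repo):

import numpy as np

def preprocess(km, age, fuel, price):
    # price rescaled to [0, 1] with min-max scaling
    price_n = (price - price.min()) / (price.max() - price.min())
    # fuel type binary encoded (assumed convention: diesel -> 1, otherwise -1)
    fuel_n = np.where(fuel == "diesel", 1.0, -1.0)
    # km and age standardized with the mean and standard deviation (z-score)
    km_n = (km - km.mean()) / km.std()
    age_n = (age - age.mean()) / age.std()
    return km_n, age_n, fuel_n, price_n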
The architecture of the neural network is very simple, and I use only three attributes of the car, yet the results of my non-TensorFlow network are pretty good. Evaluating it on the test set gives:
Iteration: 2000, Loss 0.001066
RMSE: 0.0567967802161
MAE: 0.00757498877216
R2: 0.198448957215
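These are the usual definitions of the metrics; a minimal NumPy sketch (y_true and y_pred stand for the test targets and the network's predictions):

import numpy as np

def regression_metrics(y_true, y_pred):
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))               # root mean squared error
    mae = np.mean(np.abs(err))                      # mean absolute error
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return rmse, mae, r2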
I use the entire data set at every gradient descent step. My problem appears in the TensorFlow version: if I use only 20 inputs, gradient descent converges and the loss decreases accordingly:
I tensorflow/core/kernels/logging_ops.cc:79] loss[0.6057564]
I tensorflow/core/kernels/logging_ops.cc:79] loss[0.45724705]
I tensorflow/core/kernels/logging_ops.cc:79] loss[0.35986084]
I tensorflow/core/kernels/logging_ops.cc:79] loss[0.29016402]
I tensorflow/core/kernels/logging_ops.cc:79] loss[0.23823617]
I tensorflow/core/kernels/logging_ops.cc:79] loss[0.1986042]
I tensorflow/core/kernels/logging_ops.cc:79] loss[0.16779649]
I tensorflow/core/kernels/logging_ops.cc:79] loss[0.14347225]
I tensorflow/core/kernels/logging_ops.cc:79] loss[0.12400422]
I tensorflow/core/kernels/logging_ops.cc:79] loss[0.10823684]
But if I use the entire data set, i.e. 9k examples, the loss behaves erratically:
I tensorflow/core/kernels/logging_ops.cc:79] loss[226.40295]
I tensorflow/core/kernels/logging_ops.cc:79] loss[6130.1694]
I tensorflow/core/kernels/logging_ops.cc:79] loss[8629.668]
I tensorflow/core/kernels/logging_ops.cc:79] loss[9219.1445]
I tensorflow/core/kernels/logging_ops.cc:79] loss[9217.1855]
I tensorflow/core/kernels/logging_ops.cc:79] loss[9211.8428]
I tensorflow/core/kernels/logging_ops.cc:79] loss[9209.2715]
I tensorflow/core/kernels/logging_ops.cc:79] loss[9212.22]
I tensorflow/core/kernels/logging_ops.cc:79] loss[9204.3613]
I tensorflow/core/kernels/logging_ops.cc:79] loss[9182.3125]
I tensorflow/core/kernels/logging_ops.cc:79] loss[9171.9746]
I tensorflow/core/kernels/logging_ops.cc:79] loss[9200.2207]
I do not understand why.
The TensorFlow code:
import csv
import numpy as np
import tensorflow as tf

# Load the normalized data set: three feature columns (km, fuel, age)
# followed by the price column.
# Note: "rb" is Python 2 style; csv.reader needs text mode ("r") on Python 3.
reader = csv.reader(open("normalized_car_features.csv", "rb"), delimiter=",")
x = list(reader)
features = np.array(x[1:]).astype("float")  # skip the header row
np.random.shuffle(features)

data_x = features[:, :3]
data_y = features[:, 3:]

# 80/20 train/test split
m = float(features.shape[0])
threshold = int(m * 0.8)
x_data, x_test = data_x[:threshold, :], data_x[threshold:, :]
y_data, y_test = data_y[:threshold, :], data_y[threshold:, :]
x = tf.placeholder("float")
y = tf.placeholder("float")

# Hand-picked initial weights for a 3 -> 3 -> 2 -> 1 architecture
w1 = tf.Variable(np.matrix([
    [0.01, 0.05, 0.07],
    [0.2, 0.041, 0.11],
    [0.04, 0.56, 0.13]
]), dtype=tf.float32)
w2 = tf.Variable(np.matrix([
    [0.04, 0.78],
    [0.4, 0.45],
    [0.65, 0.23]
]), dtype=tf.float32)
w3 = tf.Variable(np.matrix([
    [0.04],
    [0.41]
]), dtype=tf.float32)

# Biases, one row vector per layer
b1 = tf.Variable(np.matrix([0.1, 0.1, 0.1]), dtype=tf.float32)
b2 = tf.Variable(np.matrix([0.1, 0.1]), dtype=tf.float32)
b3 = tf.Variable(np.matrix([0.1]), dtype=tf.float32)
# Forward pass: tanh activation on every layer, including the output layer
layer_1 = tf.nn.tanh(tf.add(tf.matmul(x, w1), b1))
layer_2 = tf.nn.tanh(tf.add(tf.matmul(layer_1, w2), b2))
layer_3 = tf.nn.tanh(tf.add(tf.matmul(layer_2, w3), b3))

# Sum of squared errors over the whole batch, printed at every step
loss = tf.reduce_sum(tf.square(layer_3 - y))
loss = tf.Print(loss, [loss], "loss")

# Learning rate scaled by 1/m, the inverse of the data set size
train_op = tf.train.GradientDescentOptimizer(1/m * 0.01).minimize(loss)
init = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(init)
    # Batch gradient descent: every step feeds the full training set
    for i in range(10000):
        session.run(train_op, feed_dict={x: x_data, y: y_data})
After training, the network's predictions all collapse to the same value: [-1, -1, ..., -1, -1].
UPDATE: as shown above, the learning rate is already scaled by the data set size: tf.train.GradientDescentOptimizer(1/m * 0.01).
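For what it's worth, one scale difference between the two runs stands out: tf.reduce_sum adds the squared errors over all examples, so on 9k rows the loss (and its gradient) is hundreds of times larger than on 20 rows at the same learning rate. A minimal sketch of the two variants (tf.reduce_mean is an alternative I have not verified on this data):

import tensorflow as tf

# layer_3 and y are the output and the target placeholder from the code above.

# Sum of squared errors: magnitude grows linearly with the batch size,
# so the same learning rate takes much larger steps on 9k examples.
loss_sum = tf.reduce_sum(tf.square(layer_3 - y))

# Mean squared error: magnitude is independent of the batch size
# (an alternative to try, not something I have verified on this data set).
loss_mean = tf.reduce_mean(tf.square(layer_3 - y))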