hedeya1980 commented Oct 16, 2016

Why was the number of steps not reduced in the answer to problem 2? The problem asks to restrict the training data to just a few batches, but num_steps was kept at 3001. Could you please clarify?

aakashef commented Nov 18, 2016

The problem asks to restrict the training data, which is achieved using offset = batch_size * np.random.choice(np.arange(5)).
The number of steps does not need to be reduced to show the overfitting.
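To see why the step count can stay at 3001, here is a minimal pure-Python sketch of the offset logic. It assumes each minibatch is sliced as dataset[offset:offset + batch_size], as in the notebook. Every offset is one of only five batch-aligned starting points, so all 3001 steps draw from the same first 5 * 128 = 640 examples. Running many steps just means many passes over that small subset, which is what produces the overfitting. The helper name restricted_offset is hypothetical.

```python
import random

batch_size = 128
num_batches_allowed = 5  # mirrors np.arange(5) in the comment above


def restricted_offset(rng):
    # Same idea as batch_size * np.random.choice(np.arange(5)):
    # pick one of only 5 batch-aligned starting offsets.
    return batch_size * rng.randrange(num_batches_allowed)


rng = random.Random(0)
# One offset per training step, as in a 3001-step run.
offsets = {restricted_offset(rng) for _ in range(3001)}
print("distinct offsets:", sorted(offsets))
print("examples reachable:", max(offsets) + batch_size)
```

However large num_steps gets, the set of reachable examples never grows beyond those 640 rows.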

gronat commented Nov 18, 2016

I am curious: why did you initialize from a truncated normal with stddev = sqrt(2 / <input_size>)? Why not just use a truncated normal with, say, stddev = 0.1 for all layers?
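One likely motivation, and this is an assumption about the author's intent, is that sqrt(2 / fan_in) is the He et al. (2015) initialization for ReLU layers. Take a layer whose inputs are ReLU outputs of zero-mean, symmetric pre-activations. Then Var(z_out) = fan_in * Var(w) * Var(z_in) / 2, so the pre-activation std is multiplied by weight_std * sqrt(fan_in / 2) at each layer. With the He choice this factor is exactly 1. A fixed stddev of 0.1 gives a factor that depends on layer width. The sketch ignores biases (zero here) and truncation: the 2-std cutoff of a truncated normal shrinks the actual std by roughly 12%. The 1024/512 widths are chosen for illustration, and the function names are hypothetical.

```python
import math


def he_stddev(fan_in):
    # He et al. (2015) initialization std for layers followed by ReLU.
    return math.sqrt(2.0 / fan_in)


def relu_layer_gain(weight_std, fan_in):
    # Factor by which the pre-activation std changes across one layer whose
    # inputs are ReLU outputs: std_out = weight_std * sqrt(fan_in / 2) * std_in.
    # Assumes zero-mean independent weights/inputs, zero biases, no truncation.
    return weight_std * math.sqrt(fan_in / 2.0)


for fan_in in (1024, 512):
    he = he_stddev(fan_in)
    print(f"fan_in={fan_in}: he_std={he:.4f}, "
          f"gain with he_std={relu_layer_gain(he, fan_in):.2f}, "
          f"gain with 0.1={relu_layer_gain(0.1, fan_in):.2f}")
```

With stddev = 0.1 the gain is about 2.26 for fan_in = 1024 and 1.60 for fan_in = 512. Activations therefore grow roughly 3.6x across those two layers. The He scaling keeps the gain at 1.0 regardless of width.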

zhuanquan commented Feb 20, 2017

Help! When I run the following code, my loss diverges. Can someone please explain why?

batch_size = 128

#regularisation parameter
beta = 0.001

#2 hidden layers, neural network
hidden_nodes1 = 1024
hidden_nodes2 = 512

keep_prob = 0.5 #probability of drop out
initial_learning_rate = 0.5

graph = tf.Graph()
with graph.as_default():

    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    hidden_weights1 = tf.Variable(
        tf.truncated_normal([image_size * image_size, hidden_nodes1]))
    hidden_biases1 = tf.Variable(tf.zeros([hidden_nodes1]))
    hidden_layer1 = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights1)
                               + hidden_biases1)
    hidden_layer_drop1 = tf.nn.dropout(hidden_layer1, keep_prob)  # Dropout added

    hidden_weights2 = tf.Variable(
        tf.truncated_normal([hidden_nodes1, hidden_nodes2]))
    hidden_biases2 = tf.Variable(tf.zeros([hidden_nodes2]))
    hidden_layer2 = tf.nn.relu(tf.matmul(hidden_layer_drop1, hidden_weights2)
                               + hidden_biases2)
    hidden_layer_drop2 = tf.nn.dropout(hidden_layer2, keep_prob)  # Dropout added

    weights = tf.Variable(tf.truncated_normal([hidden_nodes2, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    logits = tf.matmul(hidden_layer_drop2, weights) + biases
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels,
                                                                  logits=logits))
    loss = loss + beta * tf.nn.l2_loss(weights)

    # Optimizer. Learning rate decreases with number of cycles.
    global_step = tf.Variable(0)  # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step,
                                               100000, 0.95, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,
                                                                          global_step=global_step)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)

    valid_relu1 = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights1) + hidden_biases1)
    valid_relu2 = tf.nn.relu(tf.matmul(valid_relu1, hidden_weights2) + hidden_biases2)
    valid_prediction = tf.nn.softmax(tf.matmul(valid_relu2, weights) + biases)

    test_relu1 = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights1) + hidden_biases1)
    test_relu2 = tf.nn.relu(tf.matmul(test_relu1, hidden_weights2) + hidden_biases2)
    test_prediction = tf.nn.softmax(tf.matmul(test_relu2, weights) + biases)
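It helps to check what the learning-rate schedule in this snippet actually does. The pure-Python function below re-states the formula documented for tf.train.exponential_decay: lr * decay_rate ** (global_step / decay_steps), with integer division in the exponent when staircase=True. It is a sketch for inspection, not TensorFlow's implementation.

```python
def exponential_decay(initial_lr, global_step, decay_steps, decay_rate, staircase=False):
    # Documented formula: initial_lr * decay_rate ** (global_step / decay_steps).
    # With staircase=True the exponent uses integer division, so the rate
    # only drops once every decay_steps steps.
    if staircase:
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return initial_lr * decay_rate ** exponent


# Values from the snippet: initial rate 0.5, decay_steps=100000, decay_rate=0.95.
for step in (0, 1000, 3000, 100000):
    print(step, exponential_decay(0.5, step, 100000, 0.95, staircase=True))
```

With decay_steps=100000 and staircase=True, the rate stays at exactly 0.5 for any run shorter than 100000 steps, including the 3001 steps mentioned at the top of this thread. So in practice this schedule does not lower the learning rate at all.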

cipher982 commented Feb 22, 2017

@zhuanquan I would assume the loss is being computed incorrectly somewhere. If I drop your initial learning rate by an order of magnitude or more, it begins to minimize, but it does not work for me either when I start at 0.5.
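Why can lowering the learning rate by an order of magnitude turn divergence into convergence? A toy model shows the general effect, though it is not the actual network. It runs plain gradient descent on f(w) = a * w**2 / 2. Each step multiplies w by (1 - lr * a), so the iteration diverges whenever lr > 2 / a. A steeper loss surface (larger a) lowers that limit. One plausible source of steepness in the snippet is the default stddev=1.0 initialization, which produces large logits. The curvature value below is hypothetical.

```python
def gd_quadratic(curvature, lr, w0=1.0, steps=50):
    # Plain gradient descent on f(w) = curvature * w**2 / 2.
    w = w0
    for _ in range(steps):
        grad = curvature * w  # derivative of f at w
        w -= lr * grad
    return w


curvature = 10.0  # hypothetical curvature chosen for illustration
print("stability limit for lr:", 2 / curvature)
for lr in (0.5, 0.05):
    print(f"lr={lr}: w after 50 steps = {gd_quadratic(curvature, lr):.3g}")
```

At lr = 0.5 each step multiplies w by -4, so it blows up. At lr = 0.05 each step halves it, so it converges toward 0.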

sahibzada-irfanullah commented Apr 29, 2017

@zhuanquan I would suggest initializing your weight variables with a standard deviation between 0.1 and 0.2, i.e., weights = tf.Variable(tf.truncated_normal([size], stddev=stdvalue))
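The stddev argument matters because tf.truncated_normal defaults to stddev=1.0, and the snippet above does not pass one. Its documentation says values more than 2 standard deviations from the mean are dropped and re-picked. The pure-Python sketch below mimics that documented behavior so you can see the range of initial weights for a given stddev. It is not TensorFlow's implementation, and the function name is hypothetical.

```python
import random


def truncated_normal(shape, stddev=1.0, mean=0.0, rng=None):
    # Mimics the documented tf.truncated_normal behaviour: draw from
    # N(mean, stddev) and re-draw any value more than 2 * stddev from the mean.
    rng = rng or random.Random(0)
    rows, cols = shape

    def draw():
        while True:
            x = rng.gauss(mean, stddev)
            if abs(x - mean) <= 2 * stddev:
                return x

    return [[draw() for _ in range(cols)] for _ in range(rows)]


w = truncated_normal((3, 4), stddev=0.1, rng=random.Random(1))
print(max(abs(x) for row in w for x in row))  # never exceeds 0.2
```

With stddev=0.1 every initial weight lies in [-0.2, 0.2]. With the default 1.0, weights can reach ±2, which makes the pre-activations and logits much larger.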

ashleylid commented May 23, 2017

Why do you use np.random.choice(np.arange(5)) instead of just np.random.choice(5)? Just looking at the docs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html and wondering what I am missing. Are they the same or slightly different? Or is this way just easier to understand?
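The linked NumPy docs say that when a is an int, the sample is generated "as if it were np.arange(a)". A quick check under that reading: seed the global generator identically before each run and compare the draws. This is a sketch assuming the legacy np.random functions used in the notebook. The int form differs only in returning the index directly, so any preference between the two spellings is about readability.

```python
import numpy as np

# Same seed, two spellings: an int population vs an explicit np.arange.
np.random.seed(42)
from_int = [np.random.choice(5) for _ in range(10)]

np.random.seed(42)
from_arange = [np.random.choice(np.arange(5)) for _ in range(10)]

print(from_int)
print(from_int == from_arange)
```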

yosemitebandit/3_regularization.ipynb