Last active June 8, 2017 01:49
udacity neural network course -- assignment 3.4, 3-layer NN with regularization and dropout

Why was the number of steps not reduced in the answer to problem 2? The problem asks to restrict the training data to just a few batches, but num_steps was kept at 3001. Could you please clarify?

The problem asks for restricting the data, which is achieved using offset = batch_size * np.random.choice(np.arange(5)).
The number of steps does not need to be reduced to show the overfitting.
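
A small sketch of why the step count can stay at 3001: with the offset drawn from only five values, every step reuses one of the same five minibatches, so the model sees at most 5 * batch_size distinct examples no matter how long it trains. Only the `offset` expression comes from the gist; `num_examples` and the `seen_offsets` bookkeeping are assumptions added for illustration.

```python
import numpy as np

batch_size = 128
num_steps = 3001        # left unchanged, as in the gist
num_examples = 200000   # assumed training-set size, for illustration only

np.random.seed(0)       # fixed seed so the run is repeatable
seen_offsets = set()
for step in range(num_steps):
    # Same expression as in the gist: only five possible starting offsets.
    offset = batch_size * np.random.choice(np.arange(5))
    seen_offsets.add(int(offset))

# Each offset selects one contiguous minibatch of batch_size examples.
distinct_examples = len(seen_offsets) * batch_size
print(sorted(seen_offsets))
print(distinct_examples, "distinct examples out of", num_examples)
```

Training for thousands of steps on so few examples is what lets the network memorize them, which is the overfitting the problem wants to demonstrate.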

gronat commented Nov 18, 2016

I am curious, why did you initialize from a truncated normal with stddev = sqrt(2 / <input_size>)? Why not just use a truncated normal with, say, stddev = 0.1 for all layers?
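
One way to look at this question is to push random data through a stack of ReLU layers and compare the two choices. The sqrt(2 / input_size) scaling is commonly known as He initialization; it is chosen so that the activation scale stays roughly constant from layer to layer. The sketch below is an illustration under assumptions, not the gist's code: the layer widths are hypothetical, a plain normal stands in for the truncated normal, and `final_activation_std` is a helper invented here.

```python
import numpy as np

def final_activation_std(stddev_for, widths, n_samples=256, seed=0):
    """Std of the last layer's ReLU activations for a given init rule."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, widths[0]))  # stand-in input data
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        # Plain normal used here; the gist uses a truncated normal.
        w = rng.normal(0.0, stddev_for(fan_in), size=(fan_in, fan_out))
        x = np.maximum(x @ w, 0.0)  # ReLU, biases omitted (zero at init)
    return float(x.std())

widths = [784, 1024, 1024, 1024, 1024]  # hypothetical layer widths

he = final_activation_std(lambda n: np.sqrt(2.0 / n), widths)
fixed = final_activation_std(lambda n: 0.1, widths)
print("sqrt(2 / input_size):", he)
print("stddev = 0.1:        ", fixed)
```

With these widths the fixed stddev = 0.1 makes activations grow at every layer, while the sqrt(2 / input_size) rule keeps them near their starting scale. A fixed 0.1 matches sqrt(2 / n) only for n = 200, so it tends to work for narrow, shallow networks and drift for wide or deep ones.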

Help! When I run the following code, my loss function diverges. Can someone please explain why?

batch_size = 128

#regularisation parameter
beta = 0.001

#2 hidden layers, neural network
hidden_nodes1 = 1024
hidden_nodes2 = 512

keep_prob = 0.5 #probability of keeping each unit during dropout
initial_learning_rate = 0.5

graph = tf.Graph()
with graph.as_default():

    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # First hidden layer.
    hidden_weights1 = tf.Variable(
        tf.truncated_normal([image_size * image_size, hidden_nodes1]))
    hidden_biases1 = tf.Variable(tf.zeros([hidden_nodes1]))
    hidden_layer1 = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights1)
                               + hidden_biases1)
    hidden_layer_drop1 = tf.nn.dropout(hidden_layer1, keep_prob) #Dropout added

    # Second hidden layer.
    hidden_weights2 = tf.Variable(
        tf.truncated_normal([hidden_nodes1, hidden_nodes2]))
    hidden_biases2 = tf.Variable(tf.zeros([hidden_nodes2]))
    hidden_layer2 = tf.nn.relu(tf.matmul(hidden_layer_drop1, hidden_weights2)
                               + hidden_biases2)
    hidden_layer_drop2 = tf.nn.dropout(hidden_layer2, keep_prob) #Dropout added

    # Output layer.
    weights = tf.Variable(tf.truncated_normal([hidden_nodes2, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    logits = tf.matmul(hidden_layer_drop2, weights) + biases
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels,
                                                                  logits=logits))
    loss = loss + beta * tf.nn.l2_loss(weights)

    # Optimizer. Learning rate decreases with number of cycles.
    global_step = tf.Variable(0) # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step,
                                               100000, 0.95, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,
                                                                          global_step=global_step)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)

    valid_relu1 = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights1) + hidden_biases1)
    valid_relu2 = tf.nn.relu(tf.matmul(valid_relu1, hidden_weights2) + hidden_biases2)
    valid_prediction = tf.nn.softmax(tf.matmul(valid_relu2, weights) + biases)

    test_relu1 = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights1) + hidden_biases1)
    test_relu2 = tf.nn.relu(tf.matmul(test_relu1, hidden_weights2) + hidden_biases2)
    test_prediction = tf.nn.softmax(tf.matmul(test_relu2, weights) + biases)

@zhuanquan I would assume the loss is being computed incorrectly somewhere. If I drop your initial learning rate by an order of magnitude or more, it begins to minimize, but it does not work for me either when I start at 0.5.
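
A related detail worth checking is the decay schedule in the posted code. `tf.train.exponential_decay` computes learning_rate * decay_rate ^ (global_step / decay_steps), with integer division when staircase=True. The pure-Python sketch below re-implements that formula (the function here is a stand-in written for this note, not the TensorFlow call) with the posted arguments. It assumes the run lasts about 3001 steps, the figure mentioned earlier in the thread; the posted snippet does not show its own step count.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate, staircase=True):
    """Stand-in for the decay formula: rate * decay_rate ** (step / decay_steps)."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_rate * decay_rate ** exponent

# Arguments from the posted code: 0.5, decay_steps=100000, decay_rate=0.95.
for step in (0, 1500, 3000, 100000):
    print(step, exponential_decay(0.5, step, 100000, 0.95))
```

Under that assumption the staircase exponent stays at 0 for the whole run, so the effective learning rate remains 0.5 throughout and the schedule gives no relief from the large starting value.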

@zhuanquan I would suggest initializing your weight variables with a standard deviation between 0.1 and 0.2, i.e. weights = tf.Variable(tf.truncated_normal(shape, stddev=stdvalue))
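
To see why the standard deviation matters here: `tf.truncated_normal` defaults to stddev=1.0, and the posted code never overrides it. The NumPy sketch below pushes one random batch through the same 784 -> 1024 -> 512 -> 10 shape and compares the initial logits and cross-entropy for stddev 1.0 and 0.1. It is a rough illustration under assumptions: the inputs are uniform in [-0.5, 0.5], the labels are random, dropout is left out, and both helper functions are written for this note rather than taken from the gist.

```python
import numpy as np

def truncated_normal(rng, shape, stddev):
    """Normal samples, redrawn until all lie within 2 standard deviations."""
    values = rng.normal(0.0, stddev, size=shape)
    outside = np.abs(values) > 2.0 * stddev
    while outside.any():
        values[outside] = rng.normal(0.0, stddev, size=int(outside.sum()))
        outside = np.abs(values) > 2.0 * stddev
    return values

def initial_logits_and_loss(stddev, seed=0):
    """Std of the logits and mean cross-entropy before any training step."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-0.5, 0.5, size=(128, 784))  # assumed input range
    labels = rng.integers(0, 10, size=128)       # random stand-in labels
    for fan_in, fan_out in [(784, 1024), (1024, 512)]:
        x = np.maximum(x @ truncated_normal(rng, (fan_in, fan_out), stddev), 0.0)
    logits = x @ truncated_normal(rng, (512, 10), stddev)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(128), labels].mean()
    return float(logits.std()), float(loss)

logit_std_default, loss_default = initial_logits_and_loss(1.0)
logit_std_small, loss_small = initial_logits_and_loss(0.1)
print("stddev=1.0:", logit_std_default, loss_default)
print("stddev=0.1:", logit_std_small, loss_small)
```

With stddev 1.0 the logits start out orders of magnitude larger, so the softmax saturates and the first gradient steps at a learning rate of 0.5 are very large. That may help explain the divergence, though it does not rule out other causes.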

Why do you use np.random.choice(np.arange(5)) instead of just np.random.choice(5)? I was just looking at the docs and wondering what I am missing. Are they the same or slightly different? Or is this way easier to understand?
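
For reference, the NumPy documentation for `np.random.choice` says that when the first argument is an int, the sample is generated as if it were np.arange of that int. A quick check with a fixed seed (the seed and sample size are arbitrary choices for this sketch):

```python
import numpy as np

np.random.seed(42)
from_int = np.random.choice(5, size=1000)

np.random.seed(42)  # reset so both calls consume the same random stream
from_array = np.random.choice(np.arange(5), size=1000)

same = np.array_equal(from_int, from_array)
print(same)
```

So the two spellings should behave the same; the np.arange form only makes the set of candidate values explicit.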