MNIST is the "Hello World" of deep learning. Here, I try to understand the basics of TensorFlow by following the tutorial for recognizing handwritten digits. The MNIST dataset is provided by the TensorFlow library itself. It contains grayscale images of size 28x28 pixels, each showing a handwritten digit, and the task is to recognize which digit it is.
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
# Each image is stored as a flat vector of 784 values; reshape to 28x28 to display it
sample = mnist.train.images[2].reshape(28, 28)
plt.imshow(sample, cmap='Greys')
So this is how each image looks in the dataset.
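Because we loaded the data with one_hot=True, each label is a 10-element vector with a single 1 marking the digit. For instance, the label of the sample above can be inspected directly:

print(mnist.train.labels[2])
# A one-hot vector such as [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.],
# where the position of the 1 is the digit shown in the image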
To build our model, we first need to define three training parameters:
learning_rate = 0.001   # step size for the optimizer
training_epochs = 15    # full passes over the training set
batch_size = 100        # images per gradient update
The next set of parameters defines the shape of the neural network itself; these can be adjusted based on the data and personal preference.
n_hidden_1 = 256   # units in the first hidden layer
n_hidden_2 = 256   # units in the second hidden layer
n_input = 784      # input size: 28x28 pixels flattened
n_classes = 10     # output classes: digits 0-9
samples = mnist.train.num_examples   # number of training images
As in any supervised model, we define placeholders for the inputs (the flattened pixel values the model trains on) and for the labels the predictions are compared against.
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
The model consists of 2 hidden layers. Our input goes to the first hidden layer, where the data gets some weight attached to it (at first, randomly). The data is then passed through an activation function (here, ReLU) and on to the next hidden layer. From the second hidden layer, the data reaches the output layer, where we determine the overall loss. We then apply an optimization function, which adjusts the weights across the network to lower the loss on the subsequent run. Following the tutorial, we use the Adam optimizer.
def multilayer_perceptron(x, weights, biases):
    # First hidden layer with ReLU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Second hidden layer with ReLU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer
The weights and biases are randomly initialized and stored as TensorFlow variables. These are the values the optimizer adjusts during training.
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
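As a quick sanity check on these shapes, we can count the trainable parameters the network will hold (just arithmetic over the sizes defined earlier):

n_params = (n_input * n_hidden_1 + n_hidden_1        # first hidden layer
            + n_hidden_1 * n_hidden_2 + n_hidden_2   # second hidden layer
            + n_hidden_2 * n_classes + n_classes)    # output layer
print(n_params)  # 269322 weights and biases in total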
pred = multilayer_perceptron(x, weights, biases)
We calculate the cost of each run by measuring the mean cross-entropy, i.e. how far the predicted distribution is from the actual label for each image. For optimization, we use the built-in Adam optimizer.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
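As an aside, minimize() is shorthand for two steps: computing the gradients of the cost with respect to every variable, then applying them. If you ever need to inspect or clip gradients, the equivalent two-step form looks like this (illustrative only; train_op is a name introduced here, and the single minimize() call above is all the tutorial needs):

adam = tf.train.AdamOptimizer(learning_rate=learning_rate)
grads_and_vars = adam.compute_gradients(cost)     # list of (gradient, variable) pairs
train_op = adam.apply_gradients(grads_and_vars)   # same training op as minimize(cost)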
init = tf.global_variables_initializer()
# Launch the session
sess = tf.InteractiveSession()
# Initialize all the variables
sess.run(init)
# Training epochs:
# the maximum number of full passes over the training set.
# Training may stop earlier if a cost/loss limit is set
for epoch in range(training_epochs):
    # Start with cost = 0.0
    avg_cost = 0.0
    # Total number of batches, as an integer
    total_batch = int(samples / batch_size)
    # Loop over all batches
    for i in range(total_batch):
        # Grab the next batch of training data and labels
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run the optimizer and fetch the loss value.
        # sess.run returns a tuple, but we only need 'c', the cost,
        # so we use an underscore as a "throwaway" for the optimizer result
        _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
        # Accumulate the average loss
        avg_cost += c / total_batch
    print("Epoch: {} cost={:.4f}".format(epoch + 1, avg_cost))
print("Model has completed {} Epochs of Training".format(training_epochs))
We evaluate the performance of our model by calculating the fraction of test images that were recognized correctly.
# Compare the predicted digit (index of the largest logit) with the true label
correct_predictions = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
# Cast the booleans to floats so their mean gives the accuracy
correct_predictions = tf.cast(correct_predictions, "float")
accuracy = tf.reduce_mean(correct_predictions)
print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
The accuracy can often be improved further by increasing the number of hidden layers or the number of training epochs.
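For example, a third hidden layer only needs one more size parameter, one more weight/bias pair, and one extra line in the forward pass (a sketch; n_hidden_3 is a name introduced here, not from the tutorial):

n_hidden_3 = 256  # same width as the other layers, so the 'out' shapes still line up
weights['h3'] = tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3]))
biases['b3'] = tf.Variable(tf.random_normal([n_hidden_3]))

# Inside multilayer_perceptron, insert a third ReLU layer before the output:
#   layer_3 = tf.nn.relu(tf.add(tf.matmul(layer_2, weights['h3']), biases['b3']))
#   out_layer = tf.matmul(layer_3, weights['out']) + biases['out']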