Lab 3

Introduction

In this lab, we will use a neural network to classify images. This lab is based on chapter 5.2 of “Deep Learning with PyTorch: Essential Excerpts”, and you may want to refer to it for more details if anything is unclear. You may also want to refer to this tutorial for the basics of defining Neural Networks in PyTorch.

Report

You are required to document your work in a report that you should write while you work on the lab. Include all requested images, plus any other graphs you deem interesting, and describe what you observe. The lab text will prompt you for specific information at times, but you are expected to fill in other text to produce a coherent document. At the end of the lab, send an email with the names and carnés of the students in the group, as well as a zip file containing the lab report as a pdf and all code you wrote, to the two professors (markus.eger.ucr@gmail.com, marcela.alfarocordoba@ucr.ac.cr) with the subject “[PF-3115]Lab 3, carné 1, carné 2” before the start of class on 26/5. Do not include the data set in this zip file or email.

A more complex example

After familiarizing yourself with the basics of Neural Networks in PyTorch, it is time to move to a more interesting problem: classifying handwritten digits. A classic data set for this task is the MNIST data set. To get started, download the MNIST data set (you will need all four files: the training data and labels, and the test data and labels). The website also contains a description of the file format used (idx). For the purposes of this lab, you can download code to read data from these files, as well as to display the images on the screen or save them as pngs, here.

For this task, you will need a larger neural network (30-50 neurons in the hidden layer are a good starting point). Your input comes in the form of 28x28 pixel images, which means you will have 784 inputs. The hidden layer or layers will have to “compress” this down to 10 different outputs (“one-hot-encoding”). Try different numbers of hidden neurons, different numbers of layers, and different activation functions, to see which combination performs best. Generally, you will want to use Sigmoid or ReLU activation functions in the hidden layers and SoftMax in the output layer, but experiment with different options in the hidden layers only. For training, use the CrossEntropyLoss function provided by PyTorch. Note that this function expects the output from your neural network and an index for the desired class label (it also applies a log-softmax internally, so pass it the raw outputs). As our classes are digits from 0 to 9, we can use our class labels directly as these desired indices (i.e. the output of the first neuron in the output layer represents the probability that the digit is 0, the output of the second neuron represents the probability that the digit is 1, etc.).
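A minimal sketch of such a network and one training step, assuming a single hidden layer of 40 neurons and a batch of random stand-in data (the layer sizes, learning rate, and batch are illustrative, not prescribed by the lab):

```python
import torch
import torch.nn as nn

# Hypothetical architecture: 784 inputs -> 40 hidden neurons -> 10 outputs.
model = nn.Sequential(
    nn.Linear(784, 40),   # 28x28 pixels flattened to 784 inputs
    nn.ReLU(),            # hidden activation; try Sigmoid here as well
    nn.Linear(40, 10),    # one output per digit class 0-9
)

# CrossEntropyLoss takes raw outputs (logits) and integer class labels;
# it applies log-softmax internally, so no explicit SoftMax layer is
# needed before it during training.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One training step on fake data standing in for a batch of MNIST images.
x = torch.randn(64, 784)               # 64 flattened "images"
labels = torch.randint(0, 10, (64,))   # digit labels used directly as indices

logits = model(x)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In a real run you would loop this step over mini-batches of the actual training set for several epochs.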

To measure the actual performance after training, feed the test set to the neural network as input and calculate the performance metrics discussed in class (i.e. the loss is not the final/actual performance measure). This will tell you how well your network generalizes, rather than just how much it memorizes. Use a confusion matrix to determine which digits are often mistaken for one another, and calculate the per-class precision, recall and F1-scores, as well as their averages. Note that the recall for class i is the element (i,i) of the confusion matrix divided by the sum of the ith column (actual samples of class i), and the precision is the element (i,i) of the confusion matrix divided by the sum of the ith row (samples predicted to be of class i).
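A sketch of these metric computations, following the convention stated above (rows index the predicted class, columns index the actual class; function names are our own):

```python
def confusion_matrix(predicted, actual, n_classes=10):
    """Build a matrix where cm[p][a] counts samples of actual class a
    that were predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for p, a in zip(predicted, actual):
        cm[p][a] += 1
    return cm

def per_class_metrics(cm):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(cm)
    metrics = []
    for i in range(n):
        col_sum = sum(cm[j][i] for j in range(n))  # actual samples of class i
        row_sum = sum(cm[i])                       # samples predicted as class i
        recall = cm[i][i] / col_sum if col_sum else 0.0
        precision = cm[i][i] / row_sum if row_sum else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((precision, recall, f1))
    return metrics
```

For the report, average the per-class values over all ten digits (a macro average).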

Once you have a performance you are happy with (you should be able to achieve at least 85% precision and recall on the test set, averaged over all digits), look at your neural network. For example, each neuron in the first layer has 784 inputs, one for each pixel. Likewise, it has 784 weights associated with it (+ a bias). Each of these weights describes how “important” each pixel is to that particular neuron. You can look at these weights and interpret them as an image (make sure to rescale it so that the values are between 0 and 255 using the appropriate parameter of show_image). This will result in an image that describes which part of the input each neuron is “responsible for”. Can you find any neurons with “obvious” responsibilities? What could that mean? Put any interesting pictures obtained from your weights into your report.
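A sketch of extracting and rescaling one neuron's weights, assuming the hypothetical nn.Sequential model above (the display itself is left to the provided show_image helper, whose exact signature we do not reproduce here):

```python
import torch
import torch.nn as nn

# Hypothetical model; in your code, use your trained network instead.
model = nn.Sequential(nn.Linear(784, 40), nn.ReLU(), nn.Linear(40, 10))

# Take the 784 weights of the first hidden neuron and reshape them
# into the 28x28 layout of the input image.
w = model[0].weight[0].detach().reshape(28, 28)

# Rescale linearly so the smallest weight maps to 0 and the largest to 255.
scaled = (w - w.min()) / (w.max() - w.min()) * 255
image = scaled.round().to(torch.uint8)
# image can now be displayed or saved with the provided show_image code.
```

Looping this over all 40 hidden neurons gives you one weight image per neuron to inspect for the report.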

Note: For the weight images, you can expect something like the images shown below, which you may interpret as neurons responsible for detecting loops (zeros), and a combination of fives and nines. Not all neurons will have weights this clear; many will just be gray-ish blobs like the third image below. You may still be able to make out a faint dark curve that such a neuron detects, but it is less pronounced than in the other two images.

Useful Resources