Lab 4
- Introduction
- Image Classification
- Generative Adversarial Networks
- Training the GAN
- Report
- Submission
- Bonus Points
- Useful Resources
Lab 4: Generative Adversarial Networks, deadline: May 11
Introduction
In this lab, we will use a neural network to classify images, and then generate more images of the same type. Refer to chapter 6.2 of “Deep Learning with PyTorch” for more details if anything is unclear. You may also want to refer to this tutorial for the basics of defining Neural Networks in PyTorch.
The general idea of the lab is split into two tasks: First, you will build a classifier that is able to identify one particular handwritten digit (for example, 2s), to learn the basics of neural networks in PyTorch and how to use them. In the second task, you will build a second neural network that takes random numbers as its input and uses them to produce a new image of a 2. Each of these neural networks is trained using a target to calculate a loss. For the classifier, the loss is defined by how many digits it classifies wrongly (2s that are not identified as such, and other digits that are classified as 2s), while the generator uses the classifier to calculate its loss: the more generated images the classifier identifies as “fake”, the higher the loss of the generator. The two networks will therefore play a “game”: the classifier (also called the “discriminator”) will become better and better at identifying “real” 2s, while the generator will become better and better at deceiving the discriminator.
There are two ways to do this lab:
- You can work with your local Python installation. For this, install PyTorch as described on their website (if you have doubts, use “Stable”, your OS, “pip”, and “CUDA 10.2”). You will need a 64-bit installation of Python 3! Once you have installed torch, download this python file to get started. It will download and load the data set (note: the download is about 10 MB in size, and it will uncompress to about 100 MB), show you some basics of PyTorch, and provide a function to show or store images, which will be useful in this lab. The file also has clearly marked and numbered “TODO” sections for the parts you have to change, and the lab text will refer to these TODOs!
- You can work using this Google Colab notebook. Copy it to your own Drive, and work there. The notebook contains the same code as the python file mentioned above, including the TODOs. For your convenience, it has also been split into cells and sections, indicating which sections have TODOs for you. Important: Before you submit, go to Runtime - Run All (Ctrl+F9) to make sure that your code works when all cells are run in order. Then go to File - Download - Download .py. Only .py files will be accepted as a valid submission, not .ipynb files!
The first time you run the code (or when you delete the MNIST folder), it will automatically download the data set. Do not worry if you see a few messages with “HTTP Error 503: Service Unavailable”; the download process will try several mirrors until one is available. Note, however, that until you have chosen a digit (TODO 1), you will get an “AttributeError”.
Image Classification
For the first part of this lab, you will construct a neural network to identify one particular digit. First, choose a digit; it can be the day or month of your birthday (if that is a single digit), the last digit of your BroncoID, your favorite number, etc. (TODO 1)
To train your neural network, you will use the tensor x_train as the training examples, and the tensor y_train as the desired output of the network: y_train will be 1 for each image that is of your chosen digit, and 0 for all others (labels_train tells us which image is of which digit). y_validation is created in the same fashion for the validation set. The code already does this (in main).
Now construct a neural network for this task. Your input comes in the form of 28x28 pixel images, which means you will have 784 inputs. The hidden layer or layers will have to “compress” this down to a single output, representing whether the input is an image of your chosen digit or not. Start with a LeakyReLU activation function in your hidden layer with 256 neurons and a Sigmoid activation function in your output layer (TODO 3).
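As a starting point, such a discriminator could be sketched along these lines (a minimal sketch, assuming the class name Discriminator used by the starter code; adjust layer sizes and activations freely):

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(28 * 28, 256),  # 784 pixel inputs -> 256 hidden neurons
            nn.LeakyReLU(),
            nn.Linear(256, 1),        # single output: "is this my chosen digit?"
            nn.Sigmoid(),             # squash the output into the range (0, 1)
        )

    def forward(self, x):
        return self.layers(x)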
Once you have your neural network, you need to train it. First, observe that you can pass more than one input into your network, and it will produce one output for each input. For example, if you have 100 images, you can
pass a matrix with 100 rows (number of images) and 784 columns (number of pixels in each image) to the network, and it will produce a matrix with 100 rows, and 1 column (because you have one output for each input image).
Note: The input and output will always be a matrix. Even if you only want to pass one image through the neural network, you have to pass it a matrix with 1 row and 784 columns and you will get a matrix with 1 row
and 1 column as result. When in doubt, check the shape
attribute of your input tensor: It has to have two dimensions.
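For example (a quick illustration; disc stands for an instance of your Discriminator):

print(x_train.shape)       # e.g. torch.Size([60000, 784]): one row per image
single = x_train[0:1]      # slicing with 0:1 keeps the batch dimension
print(single.shape)        # torch.Size([1, 784])
print(disc(single).shape)  # torch.Size([1, 1]): a matrix with one row and one column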
For the actual training, you will use a loop. Implement the function train_classifier
(TODO 4), as follows:
for i in range(n0):
# 1. reset gradients (from previous iteration) to zero
# 2. Pass training examples through network
# 3. Calculate the loss
# 4. Calculate the gradient by calling loss.backward()
# 5. Perform an optimizer step
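One possible shape of this loop (a sketch only; it assumes your network is called disc, your optimizer opt, and nn.BCELoss() as the loss function, none of which is prescribed by the starter code):

loss_fn = nn.BCELoss()
for i in range(n0):
    opt.zero_grad()                  # 1. reset gradients from the previous iteration
    y_pred = disc(x_train)           # 2. pass training examples through the network
    loss = loss_fn(y_pred, y_train)  # 3. calculate the loss (y_train may need a .reshape(-1, 1)
                                     #    so that its shape matches y_pred)
    loss.backward()                  # 4. calculate the gradient
    opt.step()                       # 5. perform an optimizer step
    print(i, loss.item())            # the loss should decrease over the iterations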
Start with about 10 iterations, and increase when you think your code works (TODO 2) to improve its performance.
In order to use this loop, you will need an optimizer opt; use torch.optim.Adam (see the documentation for details on how to use an optimizer). Adam needs a starting learning rate, but it will automatically adapt it over time, so you can just start with something like 0.01 and not worry too much about it (later you can tweak this value). Implement the function classify (TODO 5):
- Instantiate the neural network (Discriminator) and the optimizer
- Call train_classifier on the training data
- Apply this classifier to the validation set and calculate how many of your digit were classified as such (true positives, TP), how many were not (false negatives, FN), and, for each of the other digits individually and in sum, how many were classified as your digit (false positives, FP) and how many were not (true negatives, TN). Report the FP and TN values for each digit individually. Save some of the misclassified images to files!
- Calculate the performance of the trained classifier on the validation set using the sum of the FP and TN values for the individual digits:
- Accuracy: Which percentage of all digits was classified correctly, as your digit or not, i.e. (TP + TN)/(TP + TN + FP + FN)
- Precision: Which percentage of the digits your classifier labeled as your chosen digit were actually it: TP/(TP + FP)
- Recall: Which percentage of your chosen digit was recognized by your classifier as such: TP/(TP + FN)
You should be able to achieve at least 90% accuracy on the validation set (increase the number of iterations of the training loop if not), ideally even much higher (over 98%). Using the validation set, determine which digits are most often mistaken for yours. For example, if you picked 2 as your digit, which other digit was most often mistaken for a 2?
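These counts and metrics can be computed with tensor operations, for example like this (a sketch; x_validation and y_validation are the names used by the starter code, disc and the rest are illustrative):

pred = (disc(x_validation) > 0.5).float().reshape(-1)  # 1 = "my digit", 0 = "not my digit"
truth = y_validation.reshape(-1)

tp = ((pred == 1) & (truth == 1)).sum().item()  # true positives
fn = ((pred == 0) & (truth == 1)).sum().item()  # false negatives
fp = ((pred == 1) & (truth == 0)).sum().item()  # false positives
tn = ((pred == 0) & (truth == 0)).sum().item()  # true negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)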
You can experiment with a different number of hidden neurons and even add more layers and different activation functions, to see which configuration performs best. Generally, you will want to use Sigmoid or LeakyReLU activation functions in the hidden layers and Sigmoid in the output layer, but experiment with different options for the hidden layers. If you do these experiments, always use the performance on the validation set as your evaluation metric. This will tell you how well your network generalizes, rather than just measuring how much it memorizes.
Hints:
- The existing code performs a so-called normalization by dividing every pixel by 255. The result is that every input value will be between 0 and 1. Do not change this!
- You can perform many operations on entire tensors. For example, (y_pred > 0.5).float() will turn your prediction into a tensor of 0s and 1s, depending on whether each prediction was greater than 0.5 or not.
- Tensors also have methods to calculate sum, mean and abs. There are also min and max, which can calculate the minimum and maximum along a particular dimension (e.g. the maximum value for each row) and tell you at which index this element was found. You may want to keep the documentation handy.
- You can use a bool tensor to index another tensor. For example, labels_validation == 3 will tell you which images are 3s by producing a tensor that is True for each image that is of a 3 and False for all others. You can select all images that are 3s with x_validation[labels_validation == 3]. This can be particularly useful to determine how each digit was classified by the neural network (see the sketch after this list).
- show_image can be used to display or even store an image to a file (on Colab, only storing to a file is supported; you can find these files on the left side in the folder view). The function has a parameter scale that you have to set to SCALE_01 to undo the normalization, or the image will just be black.
- Print the loss in every iteration of the loop. If your training process works, the loss should steadily decrease, to below 0.1. If your loss does not go below 1 even after a few iterations, try reducing the learning rate.
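For instance, per-digit false-positive counts could be collected along these lines (a sketch; pred is the 0/1 prediction tensor from the metrics sketch above):

for digit in range(10):
    mask = labels_validation == digit  # bool tensor: which validation images show this digit
    as_mine = pred[mask].sum().item()  # how many of them were classified as your chosen digit
    total = mask.sum().item()
    print(f"digit {digit}: {as_mine} of {total} classified as my chosen digit")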
Generative Adversarial Networks
For the second task, construct a second network that is responsible for generating new images from random inputs as a “mirror” of your classification network. For example, if your classifier has 3 hidden layers with 512, 256, and 128 neurons, your generator should have 3 hidden layers with 128, 256, and 512 neurons, with similar activation functions. The generator should use 100 inputs, and produce 28*28 = 784 outputs (TODO 7). To generate new images, these two networks “play a game” against each other, where the generator has the goal of producing images that look as realistic as possible, and the discriminator has to determine which images were produced by the generator and which are real.
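Mirroring the single-hidden-layer discriminator sketched earlier, a generator could be sketched as follows (again only a sketch; the class name Generator and the layer sizes are assumptions):

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(100, 256),      # 100 random inputs -> 256 hidden neurons
            nn.LeakyReLU(),
            nn.Linear(256, 28 * 28),  # one output per pixel
            nn.Sigmoid(),             # keep pixel values in [0, 1], matching the normalized data
        )

    def forward(self, z):
        return self.layers(z)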
Training the GAN
To train your GAN, you will need two functions: one to train the generator and one to train the discriminator. First, implement the function train_discriminator, to which you can pass a tensor with real images and a tensor with fake images (TODO 8), using a training loop of this form:
for i in range(n1):
# 1. reset gradients (from previous iteration) to zero
# 2. Pass true training examples (= real images) through network
# 3. Calculate the loss, comparing with a tensor of all 1 (use torch.ones_like)
# 4. Calculate the gradient by calling loss.backward()
# 5. Pass false training examples (= fake images) through network
# 6. Calculate the loss, comparing with a tensor of all 0 (use torch.zeros_like)
# 7. Calculate the gradient by calling loss.backward(); this will *accumulate* the gradient with the one calculated just before
# 8. Perform an optimizer step
Note: this is very similar to the training loop before, but you will have two sets of inputs: One consisting entirely of images you know to be real, and one consisting entirely of images you know to be fake.
You can test this function with some random images/tensors (generated with torch.rand
), and some images from the data set.
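A sketch of what this function could look like, including such a quick test (assuming the names disc, opt_d and nn.BCELoss(), none of which is prescribed by the starter code):

def train_discriminator(disc, opt_d, x_real, x_fake, n1):
    loss_fn = nn.BCELoss()
    for i in range(n1):
        opt_d.zero_grad()                                            # 1. reset gradients
        pred_real = disc(x_real)                                     # 2. pass real images through the network
        loss_real = loss_fn(pred_real, torch.ones_like(pred_real))   # 3. real images should be labeled 1
        loss_real.backward()                                         # 4. gradient for the real images
        pred_fake = disc(x_fake)                                     # 5. pass fake images through the network
        loss_fake = loss_fn(pred_fake, torch.zeros_like(pred_fake))  # 6. fake images should be labeled 0
        loss_fake.backward()                                         # 7. accumulates with the gradient from step 4
        opt_d.step()                                                 # 8. perform an optimizer step

# quick sanity check: 100 real images vs. 100 random "images"
train_discriminator(disc, opt_d, x_train[:100], torch.rand((100, 784)), 5)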
Next, implement the function train_generator, which samples random noise, passes it to the generator, then passes the generated images on to the discriminator, and calculates the loss afterwards (caution: for the generator, your goal is for the discriminator to predict these images as real); then calculate the gradient and perform an optimization step (TODO 9). The training loop for this function looks like this:
for i in range(n2):
# 1. reset gradients (from previous iteration) to zero
# 2. Pass random noise (from torch.randn) through the network
# 3. Pass the produced images through the discriminator network (!)
# 4. Calculate the loss, comparing with a tensor of all 1 (use torch.ones_like)
# 5. Calculate the gradient by calling loss.backward()
# 6. Perform an optimizer step
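Again, a sketch of one possible implementation (assuming the names gen, disc, opt_g and nn.BCELoss(); the batch size of 100 is arbitrary):

def train_generator(gen, disc, opt_g, n2):
    loss_fn = nn.BCELoss()
    for i in range(n2):
        opt_g.zero_grad()                            # 1. reset gradients
        noise = torch.randn((100, 100))              # 2. random noise as generator input...
        fake = gen(noise)                            #    ...turned into candidate images
        pred = disc(fake)                            # 3. pass the produced images through the discriminator
        loss = loss_fn(pred, torch.ones_like(pred))  # 4. the generator *wants* these to be labeled 1 ("real")
        loss.backward()                              # 5. calculate the gradient
        opt_g.step()                                 # 6. update only the generator's parameters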
Finally, implement the function gan (TODO 10), which first instantiates the two networks, an optimizer for each, and some random fake images to start with. The training itself consists of an outer loop, and in each iteration you first train the discriminator, then the generator, and then add some new fake images to the training set for the discriminator:
x_false = torch.rand((100,784))
for i in range(n):
# 1. Train Discriminator on fake and real images
# 2. Train Generator using this discriminator
# 3. Generate new fake image with the generator (input = random numbers)
# 4. Select some of the old fake images to keep
# 5. Save some of the generated fake images for verification
Notes:
- Use Adam as the optimizer for both networks.
- You may have to use a very low learning rate (0.001 or less), especially if you notice that your generator error is 0 and the discriminator error is 1.
- Call .detach() on the fake images you keep in your repository, so that you don’t unnecessarily calculate gradients on the generator as well when you try to improve your discriminator.
- Don’t forget to call .zero_grad()!
- The fact that PyTorch accumulates gradients is actually useful when training the discriminator: you can pass both image sets (fake and real) separately, calculate the loss on each, call backward() each time, and you’ll have the total gradient.
- One of the most common problems with GANs is that all produced images are identical or nearly identical (called “mode collapse”), because the generator has found the “perfect” image to fool the discriminator. Unfortunately, the only thing you can do when that happens is to restart training (perhaps with a lower learning rate) and hope to get a different/better initialization. Dropout layers (with probabilities of even 0.5) in the generator may also help.
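Putting it all together, the outer loop could be sketched roughly like this (under the same assumptions and names as the sketches above; the repository-update strategy — keep 50 old fakes, add 100 new ones — is just one option):

def gan(x_real, n=20, n1=10, n2=10):
    gen, disc = Generator(), Discriminator()
    opt_d = torch.optim.Adam(disc.parameters(), lr=0.001)
    opt_g = torch.optim.Adam(gen.parameters(), lr=0.001)
    x_false = torch.rand((100, 784))                           # start with purely random "fake" images
    for i in range(n):
        train_discriminator(disc, opt_d, x_real, x_false, n1)  # 1. train discriminator on real + fake
        train_generator(gen, disc, opt_g, n2)                  # 2. train generator against it
        new_fakes = gen(torch.randn((100, 100))).detach()      # 3. generate new fakes; detach! (see note above)
        keep = x_false[torch.randperm(len(x_false))[:50]]      # 4. keep some of the old fakes
        x_false = torch.cat([keep, new_fakes], dim=0)
        # 5. save a few generated images here for verification (e.g. with show_image)
    return gen, disc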
Perform this training and inspect the resulting images. Use a low n1 and n2 (around 5-10) at the beginning until you are sure that your code works and the loss actually decreases. Then increase these values (TODO 6). For the overall n, you will need very few iterations (less than 20). Don’t forget to include some sample images in your report, but also provide an estimate of what percentage of images “looks decent”, in your opinion. Once you have everything working, go back, and change the selection of which digit to generate (e.g. if you started with generating 2s, generate 4s now). You should be able to do this by changing the digit in a single place (in TODO 1), and everything else should work the same way.
Report
As this lab basically consists of two parts, your report should also consist of two sections:
- Training the Discriminator:
- Which digit did you choose?
- What is the architecture of your discriminator network?
- What performance do you achieve on the validation set?
- Which digits are easily confused with your chosen digit? Which ones were the most “different” (according to your network)? Does this match your intuition (e.g. one might expect 3 and 8 to be confused easily; was that actually what happened)? Include images of digits that were misclassified in your report!
- The GAN:
- What is the architecture of your generator network?
- What is your training schedule (number of iterations for each of the loops)? What does your image repository look like?
- Are the images of satisfying quality (include several example images in your report!)? How long does it take for you to reach a point where they look decent?
- Are the images that are produced different, or are they all the same?
- When you change the chosen digit, does the code still work, and what do the images of this other digit look like?
Submission
Submit the finished code (all python files you have, but do not include the MNIST data set) and your report PDF in a single zip file on Blackboard. As mentioned above, if you use Google Colab, download your file as a .py file; I will not accept .ipynb notebooks! Do not forget to put the names and BroncoIDs of both team members in the report as well! Only one of you has to submit, but if both do, please make sure you submit the same file, or I will pick one at random.
Bonus Points
The GANs we are training in this lab are chosen to be a good illustration of the general idea and relatively light in terms of resource usage, but handwritten digits may not be the most exciting application. If you want to earn bonus points, you can apply your code to other data sets. This can be as easy (= 1 bonus point) as applying it to EMNIST (letters), KMNIST (Hiragana), Fashion-MNIST (pictures of clothes), etc., all of which have the exact same structure as the MNIST data set, so your code should run with only minor modifications. You can also experiment with more complex data sets like CIFAR or ImageNet, but results may be mixed, depending on how much computation time you invest and how much you modify the network architecture.
You are also free to play around with any other aspect of the network(s), and try to do something more than just Dropout layers against mode collapse, such as calculating an entropy-measure and using that as additional training input for the discriminator.
For any experiments, either do them in a separate copy of the python file, or add a way to switch to the actual assignment code (i.e. I still want to be able to run the code for the actual assignment)!