class: center, middle

# Artificial Intelligence
### Neural Networks in PyTorch

---

# PyTorch

* PyTorch is a Python library for Machine Learning
* At every major Machine Learning conference last year, the majority of papers used PyTorch
* The other "big one" is TensorFlow, which is a bit older and therefore has more adoption in industry
* PyTorch is easier to get started with, and once you know the core concepts you can easily pick up TensorFlow, too

---

# PyTorch vs. TensorFlow
---

# Book
(Free PDF available)

---

class: medium

# Installing PyTorch

* You need Python 3, **64 bits**
* The PyTorch website has a [Get Started](https://pytorch.org/get-started/locally/) section, which tells you how to install in various environments
* On Windows (will automatically install dependencies):

```
pip install torch===1.5.0 torchvision===0.6.0 -f https://download.pytorch.org/whl/torch_stable.html
```

Also install the `PIL` imaging library with `pip install Pillow`!

---

# Testing the Installation

```Python
import numpy
import PIL
import torch
import torchvision
import struct

if struct.calcsize("P") * 8 != 64:
    raise Exception("Not a 64 bit interpreter")

print("OK")
```

---

class: medium

# Vectors and Matrices and Tensors, oh my!

* Vector: An ordered list of numbers
* Matrix: A grid of numbers
* We could store a matrix in a vector, if we remember the dimensions!
* **Tensor**: What if we "need" more dimensions?
* For example: We have 10000 images of 28x28 pixels. We store them in a 10000x28x28 tensor
* In memory the pixels will still be stored sequentially; the tensor is really just a different "view" on the data

---

class: medium

# PyTorch

* (Almost) everything is a `torch.Tensor`!
* These tensors really are just views into a sequential array of numbers
* Each tensor can also "remember" where it came from (e.g. if it is the result of an addition)
* This means you can also view a tensor as a "tree" of computation, with the result at the top, and the inputs as the leaves
* You can also tell your tensors that the operations should be performed on the GPU

---

# PyTorch: Computation Graphs
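For example, a tensor created with `requires_grad=True` remembers the operations applied to it, and calling `backward()` on a result walks this graph to compute gradients. A minimal sketch (values chosen only for illustration):

```Python
import torch

# Leaf tensors: PyTorch will track operations involving them
x = torch.tensor([2.0, 3.0], requires_grad=True)

# y is the root of a small computation graph: y = sum(x*x + 1)
y = (x * x + 1).sum()

# Walk the graph backwards and compute dy/dx
y.backward()

print(y.grad_fn)  # shows where y "came from" (a SumBackward node)
print(x.grad)     # tensor([4., 6.]), i.e. 2*x
```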
---

class: center, middle

# Tensors: Squeeze, Unsqueeze, View

---

# Tensors in PyTorch

* A Tensor consists of an array of memory and a way to look at that memory
* The **dimensionality** of a tensor defines how many dimensions ("sizes") it has
* The **shape** of a tensor tells us how many elements exist in each dimension
* A tensor with shape `[24,1]` is different from a tensor using the same data with shape `[12,2]`, but also from one with shape `[24]` or `[24,1,1]`

---

# Tensors in PyTorch

Let `x` be a tensor with shape `[12,2]`

* `x[0,0]` is the first element
* `x[0]` (or `x[0,:]`) is the first **row** (a tensor of shape `[2]`)
* `x[:,0]` is the first **column** (a tensor of shape `[12]`)
* `x.T` is a tensor with shape `[2,12]` (the transpose)

---

# Squeeze

Squeeze is used to *remove* one/all dimension(s) of size 1:

* If `x.shape` is `[12,2]`, `x.squeeze()` does *nothing*
* If `x.shape` is `[24,1]`, `x.squeeze()` produces a tensor of shape `[24]`
* If `x.shape` is `[24,1,1]`, `x.squeeze()` produces a tensor of shape `[24]`
* If `x.shape` is `[24,1,1]`, `x.squeeze(1)` produces a tensor of shape `[24,1]`

---

# Unsqueeze

Unsqueeze is used to *insert* a dimension of size 1:

* If `x.shape` is `[12,2]`, `x.unsqueeze(0)` produces a tensor of shape `[1,12,2]`
* If `x.shape` is `[12,2]`, `x.unsqueeze(1)` produces a tensor of shape `[12,1,2]`
* If `x.shape` is `[12,2]`, `x.unsqueeze(2)` produces a tensor of shape `[12,2,1]`

---

class: mmmedium

# View

View is used to convert the shape of a tensor to something "arbitrary" (with the same total number of elements):

* If `x.shape` is `[12,2]`, `x.view(24)` produces a tensor of shape `[24]`
* If `x.shape` is `[24]`, `x.view((24,1))` produces a tensor of shape `[24,1]` (exactly like `x.unsqueeze(1)`)
* If `x.shape` is `[24]`, `x.view((2,3,4))` produces a tensor of shape `[2,3,4]`
* If `x.shape` is `[24,1]`, `x.view(24)` produces a tensor of shape `[24]` (exactly like `x.squeeze(1)`)
* If `x.shape` is `[12,2]`, `x.view((8,3))` produces a tensor of shape `[8,3]`
* If `x.shape` is `[12,2]`, `x.view((8,6))` produces an error

---

class: medium

# View

**One** dimension passed to `view` can be `-1`. Because `view` knows how many elements there are in total, it will just fill in "the rest":

* If `x.shape` is `[12,2]`, `x.view(-1)` produces a tensor of shape `[24]`
* If `x.shape` is `[n]`, `x.view((n,-1))` produces a tensor of shape `[n,1]` (exactly like `x.unsqueeze(1)`)
* If `x.shape` is `[24]`, `x.view((2,-1,4))` produces a tensor of shape `[2,3,4]`
* If `x.shape` is `[24,1]`, `x.view(-1)` produces a tensor of shape `[24]` (exactly like `x.squeeze(1)`)
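---

class: medium

# Tensor Shapes in Practice

A minimal sketch of these operations (the values are only for illustration):

```Python
import torch

x = torch.arange(24).view((12, 2))    # shape [12, 2]

print(x.unsqueeze(0).shape)           # torch.Size([1, 12, 2])
print(x.view(-1).shape)               # torch.Size([24])
print(x.view((2, 3, 4)).shape)        # torch.Size([2, 3, 4])

y = torch.zeros((24, 1))
print(y.squeeze().shape)              # torch.Size([24])
```

All of these results share the same underlying memory; only the "view" on the data changes.

---

class: center, middle

# Neural Networks in PyTorch

---

# Artificial Neural Networks

.left-column[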
]

.right-column[

We introduced one or more "hidden layers", which will hold intermediate values

$$ \vec{h} = f_1(W_1 \cdot \vec{x})\\\\ y = f_2(\vec{w_2} \cdot \vec{h})\\\\ y = f_2(\vec{w_2} \cdot f_1(W_1 \cdot \vec{x})) $$

]

---

# Artificial Neural Networks
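These equations translate directly into tensor operations; a minimal sketch (the layer sizes are made up for illustration):

```Python
import torch

x = torch.rand(3)           # input vector with 3 features
W1 = torch.rand(4, 3)       # hidden layer weights (4 hidden units)
w2 = torch.rand(4)          # output layer weights

h = torch.sigmoid(W1 @ x)   # h = f1(W1 · x)
y = torch.sigmoid(w2 @ h)   # y = f2(w2 · h)

print(h.shape, y.shape)     # torch.Size([4]) torch.Size([])
```

Here both activation functions `f1` and `f2` are the Sigmoid; the next slides show how PyTorch lets us define such networks without writing the matrix operations by hand.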
---

# Defining a Neural Network

* Our neural network consists of layers
* Each layer consists of a linear part and an activation function
* The output from one layer is passed as the input to the next layer
* In PyTorch, we can define such a network as a "sequence"

---

# Defining a Neural Network

```Python
model = torch.nn.Sequential(
    # number of inputs: 128*128*3 -> number of outputs: 100
    torch.nn.Linear(128*128*3, 100),
    # Activation function
    torch.nn.Sigmoid(),
    # number of inputs: 100 -> number of outputs: 1
    torch.nn.Linear(100,1)
)

y_hat = model(x)
```

---

# Defining a Neural Network: Alternative

```Python
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.sig1 = torch.nn.Sigmoid()
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h = self.linear1(x)
        h = self.sig1(h)
        return self.linear2(h)

model = TwoLayerNet(128*128*3, 100, 1)
y_hat = model(x)
```

---

# Learning

* Once we have a neural network, we need to train it
* Training works by "showing" the network a training example and comparing the network's output to the desired output
* Then we change the parameters/weights of the network
* Of course, PyTorch can do all of this (mostly) automatically for us!

---

# Learning: Loss Function

* Last time we talked about the "Mean Squared Error"
* PyTorch provides you with the class `torch.nn.MSELoss`
* You can use it to compute the loss as a scalar tensor
* You can then call the method `backward()` on this tensor to calculate the gradient of the loss!

---

# Learning: Optimizers

* An **Optimizer** is an algorithm that changes the network parameters depending on the loss/gradient
* PyTorch provides several optimizers; I recommend `Adam` for most tasks
* `Adam` needs you to calculate the error/loss and call `backward()`
* Then you call the `step()` method

---

# Learning
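Putting the pieces together for a single update; a minimal sketch (assuming the `Sequential` model from before and made-up data):

```Python
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

# Made-up data: 5 samples with 128*128*3 features each
x = torch.rand((5, 128*128*3))
y = torch.rand((5, 1))

loss = loss_fn(model(x), y)   # a scalar tensor
loss.backward()               # computes the gradient for every parameter
optimizer.step()              # changes the parameters using that gradient

print(model[0].weight.grad.shape)   # torch.Size([100, 49152])
```

The full training loop on the next slide just repeats these steps (and resets the gradients in between).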
---

# Training a Neural Network

```Python
loss_fn = torch.nn.MSELoss()

# lr = Learning Rate: How fast the parameters change
# Adam will change the learning rate over time!
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

for t in range(n):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    # IMPORTANT: Reset the gradient to zero!
    optimizer.zero_grad()

    # Calculate the gradient
    loss.backward()

    # Change the network parameters
    optimizer.step()
```

---

# Classification Using Neural Networks

* So far, we have looked at predicting numbers as outputs
* What if we just want to distinguish classes, like "cat" and "not cat"?
* Idea: Let's set a "threshold". If the output number is higher it means "cat", otherwise it means "not cat"
* Second idea: Let's limit our output to be between 0 and 1 by using a Sigmoid

---

# Training

* Now we train our neural network. We have images of cats and non-cats
* As the desired output we set 1 for all cats, and 0 for all non-cats
* Then we calculate the loss: MSE is not great for such a limited range, so we use something else
* `torch.nn.BCELoss` ("Binary Cross Entropy Loss")

---

# Defining the Classifier

```Python
model = torch.nn.Sequential(
    # number of inputs: 128*128*3 -> number of outputs: 100
    torch.nn.Linear(128*128*3, 100),
    torch.nn.Sigmoid(),
    # number of inputs: 100 -> number of outputs: 1
    torch.nn.Linear(100,1),
    # make sure the output is between 0 and 1
    torch.nn.Sigmoid()
)
```

---

# Code

```Python
# Binary Cross Entropy Loss
loss_fn = torch.nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

for t in range(n):
    # IMPORTANT: Reset the gradient to zero!
    optimizer.zero_grad()

    # Calling "backward()" twice *accumulates* the
    # gradients of the two classes
    y_pred = model(cats)
    loss = loss_fn(y_pred, torch.ones(y_pred.shape))
    loss.backward()

    y_pred = model(noncats)
    loss = loss_fn(y_pred, torch.zeros(y_pred.shape))
    loss.backward()

    optimizer.step()
```

---

# Classification Metrics

* In our regression task we just measured the numerical distance to determine how "well" we did
* In classification there can be different types of mistakes, and it may make sense to distinguish between them
* If we have an image of a cat and classify it as a "non-cat" (false negative), that is a different error than if we have an image of a non-cat and we classify it as a "cat" (false positive)

---

class: medium

# Classification Metrics

* Using the true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) we can define several metrics
* **Accuracy**: percentage of correctly classified samples: (TP + TN)/(TP + TN + FP + FN)
* **Precision**: percentage of true positives among all samples classified as positive: TP/(TP + FP)
* **Recall**: percentage of actual positives that were correctly identified: TP/(TP + FN)
* **F1 score**: Harmonic mean of precision and recall
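---

class: small

# Computing the Metrics

A minimal sketch of computing these values from tensors (assuming `y_pred` holds the network outputs and `y` the true 0/1 labels, both with the same shape):

```Python
predicted = (y_pred > 0.5).float()   # threshold the outputs

tp = ((predicted == 1) & (y == 1)).sum().item()
fp = ((predicted == 1) & (y == 0)).sum().item()
tn = ((predicted == 0) & (y == 0)).sum().item()
fn = ((predicted == 0) & (y == 1)).sum().item()

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
```

---

class: small

# Metrics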
For example, suppose our classifier was evaluated on 1100 images and produced TP = 90, FP = 60, TN = 940, FN = 10:

$$ \text{accuracy} = \frac{\mathit{TP} + \mathit{TN}}{\mathit{TP} + \mathit{FP} + \mathit{TN} + \mathit{FN}} = \frac{90 + 940}{90 + 940 + 60 + 10} \approx 0.94$$

$$ \text{precision} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FP}} = \frac{90}{90+ 60} = 0.6 $$

$$ \text{recall} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FN}} = \frac{90}{90 + 10} = 0.9$$

$$ \text{F1} = \frac{2\cdot\text{precision}\cdot\text{recall}}{\text{precision} + \text{recall}} = \frac{2\cdot 0.6\cdot 0.9}{0.6+0.9} = 0.72$$

---

class: center, middle

# Lab 4

---

# Lab 4

* In Lab 4 you will work with pictures of handwritten digits
* First, we will train a neural network to identify a certain digit (e.g. 2)
* Then, we will train a second neural network to **generate** new pictures from random noise
* This technique is called "Generative Adversarial Networks", and we will talk more about it next week

---

# Lab 4: MNIST
---

class: mmedium

# Lab 4: Task 1

* Construct a neural network (based on the code above) to classify images into "2" and "not 2" (you can pick any digit you like)
* Train the neural network using data read with the provided code
* Note how many images your network classifies correctly and incorrectly, and calculate the percentage of correctly classified images (accuracy)
* You should be able to classify more than 85% of images correctly (ideally around 95%)
* Also generate some random images (`torch.rand((nrows,ncols))`) to train your network with, and measure how it performs on them

---

class: mmedium

# Lab 4: Pitfalls

* PyTorch Neural Networks **always** need a matrix as input and **always** produce a matrix as output (efficiency)
* This means you have to make sure that your data is in the right shape
* Your images have to have 28x28 = 784 **columns** and as many rows as you have pictures
* The output will be a **matrix** with 1 column, and as many rows as you put pictures into your network
* For BCELoss, the tensors have to have the same shape, so your labels also have to be in a matrix
* Always check your tensor shapes (attribute `.shape`)

---

# Lab 4: Pitfalls

* Your images should *always* be scaled to use numbers between 0 and 1
* The provided code performs this normalization (division by 255)
* When you want to show one of the images, make sure to tell `show_image` to re-scale it properly
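---

class: medium

# Lab 4: Getting the Shapes Right

A minimal sketch of putting the data into the shape the network expects (the variable names are only for illustration; the actual reading of the data is done by the provided code):

```Python
# Suppose "images" is a tensor of shape [n, 28, 28] with values in [0, 1],
# "labels" is a tensor of shape [n] with values 0 or 1, and "model" is a
# classifier whose first layer is torch.nn.Linear(28*28, ...)

x = images.view(-1, 28*28)       # one row per image, 28*28 = 784 columns
y = labels.view(-1, 1).float()   # same shape as the model output, for BCELoss

y_pred = model(x)                # a matrix with shape [n, 1]
print(x.shape, y.shape, y_pred.shape)
```

---

# References

* [Torch Tensor Operations Overview](https://jhui.github.io/2018/02/09/PyTorch-Basic-operations/)
* [Deep Learning with PyTorch](https://pytorch.org/deep-learning-with-pytorch)
* [Adam: A Method for Stochastic Optimization](https://arxiv.org/pdf/1412.6980.pdf)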