class: center, middle

# Artificial Intelligence

## Generative Adversarial Networks

---

# Artificial Neural Networks

* Last week, we discussed Artificial Neural Networks (ANNs)
* Today we will look at some interesting things we can do with them
* First, let's look at the interior of a neural network

---

# Artificial Neural Networks

.left-column[
]

.right-column[
Our neural networks contained "hidden layers", which hold intermediate values h:

$$
\vec{h} = f_1(W_1 \cdot \vec{x})\\\\
y = f_2(\vec{w_2} \cdot \vec{h})\\\\
y = f_2(\vec{w_2} \cdot f_1(W_1 \cdot \vec{x}))
$$
]
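---

# Artificial Neural Networks

To make these equations concrete, here is a minimal sketch of the two-layer forward pass in plain Python/numpy (the sigmoid activation and the layer sizes are illustrative choices, not fixed by the math above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 4 inputs, 3 hidden neurons, 1 output
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))  # weight matrix of the hidden layer
w2 = rng.normal(size=3)       # weight vector of the output neuron

x = rng.normal(size=4)        # some input vector
h = sigmoid(W1 @ x)           # h = f1(W1 . x)
y = sigmoid(w2 @ h)           # y = f2(w2 . h)
```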
---
class: medium

# Information Content

* Our Neural Networks are deterministic functions
* In fact, each individual layer is a deterministic function
* This means each layer's output depends **only** on its input
* We can view the output at each layer as an "encoding" of the network's input that is used by the next layer

---
class: medium

# Auto-Encoders

* One application of this idea is the Auto-Encoder
* Auto-Encoders are neural networks with several layers that become narrower and narrower (fewer neurons) before widening again
* The number of inputs is the same as the number of outputs, and the training examples use the *same* values for input and output
* The goal is to learn a smaller *representation* of the input data
* In essence, the ANN has to reconstruct the input from fewer values

---
class: medium

# Auto-Encoders
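A minimal sketch of such a bottleneck architecture, here using PyTorch (the framework and all layer sizes are illustrative, e.g. flattened 28x28 images compressed down to 32 values):

```python
import torch.nn as nn

# The layers narrow to a bottleneck, then widen again;
# input and output have the same dimension (784 here)
autoencoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),    # encoder
    nn.Linear(128, 32), nn.ReLU(),     # bottleneck: the learned representation
    nn.Linear(32, 128), nn.ReLU(),     # decoder
    nn.Linear(128, 784), nn.Sigmoid()  # reconstruction of the input
)
```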
---
class: medium

# Auto-Encoders
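Training then uses the *same* values as input and as target. A minimal sketch, assuming the `autoencoder` from the previous slide and a stand-in batch of data:

```python
import torch

opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

x = torch.rand(64, 784)  # stand-in batch; real images would go here
for step in range(100):
    reconstruction = autoencoder(x)
    loss = loss_fn(reconstruction, x)  # the target is the input itself
    opt.zero_grad()
    loss.backward()
    opt.step()
```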
---

# Vector Embeddings

* Auto-Encoders allow us to represent data with fewer values
* We can view this representation as **vectors**
* With the proper training, these vectors can be used instead of the original data in our actual application
* Next week we will talk about a related approach that represents words as vectors!

---

# ANNs: An Alternative View

* A Neural Network is a function that takes a vector as input and produces a vector as output
* We can tweak this function to produce outputs closer to the ones we already have
* As long as we can express what we want as something differentiable, e.g. a differentiable comparison with the training data, we can train the network with gradient descent

---

# Adversarial Training

* Say someone has a neural network that can distinguish between cats and non-cats
* We want to "smuggle" a cat past the network
* This means: We want an image of a cat that the network identifies as a non-cat
* Why? To improve the network, of course! (There are more sinister applications, too)

---

# Adversarial Training

* Take an existing image of a cat
* Change it "a little bit"
* Check if it is now classified as a non-cat
* Repeat

---

# Adversarial Training

* Pass your existing image through the network
* Note which pixels have the greatest impact on the result (the gradient of the output with respect to the input tells us)
* Change (only) those pixels, as in the sketch on the next slide
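---

# Adversarial Training

A minimal sketch of this idea, close in spirit to the "fast gradient sign method" (the classifier `model`, the cat image, and the step size `eps` are assumptions for illustration):

```python
import torch

def perturb(model, image, eps=0.01):
    """Nudge the pixels that most affect the model's cat score."""
    x = image.clone().requires_grad_(True)
    cat_score = model(x)  # assumed: a single scalar "this is a cat" score
    cat_score.backward()  # gradient of the score w.r.t. every pixel
    # Step each pixel slightly in the direction that lowers the score
    return (x - eps * x.grad.sign()).detach()
```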
---

# Adversarial Example

--

This is a bird!

---

# Adversarial Training

* Maybe we could automate this process?
* Basically, we want to learn how to "fool" a classifier
* But what do we use as our representation and learning objective?
* Ideally, our process would produce new images

---

# Generative Adversarial Networks

* So far we have used Neural Networks to classify images, or to predict some value
* Could we **generate** things with a Neural Network?
* Crazy idea: We pass the Neural Network some random numbers and it produces a new Picasso-like painting

--

* That's exactly what we'll do!

---

# First: Classification

* To produce a Picasso-like painting, we first need to know which paintings *are* Picasso-like
* We could train a Neural Network that detects "real" Picassos (the "Discriminator")
* Input: An image
* Output: "True" Picasso, or "fake"
* So we'll need some real and fake Picassos to start with ...

---

# Art Appreciation

* Real Picassos are easy to come by [citation needed]
* Where do we get our fakes?
* Picasso basically painted randomly, so let's use randomly generated images!
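---
class: medium

# The Discriminator Network

A discriminator along these lines could look like the following sketch (a deliberately small fully-connected network; a real discriminator for paintings would be convolutional, and all sizes are illustrative):

```python
import torch.nn as nn

# Maps a flattened image to a single "probability of being real"
discriminator = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid()  # 1 = "real" Picasso, 0 = fake
)
```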
---

# Art Connoisseur Network

* After some training, our network will be able to distinguish real and fake Picassos
* This means we can give this network a new painting, and it will tell us whether it is real or not
* Now we can define the task for our generator more clearly: Fool the discriminator network, i.e. generate paintings that the discriminator recognizes as "real" Picassos

---

# A Word on Loss Functions

* How did we train our neural networks?
* We calculated the gradient of the loss function with respect to the model parameters
* We said that we need our loss function to be differentiable
* What else is differentiable? Our discriminator network!

---
class: medium

# The Generator Network

* The Generator Network takes, as we wanted, a vector of random numbers as input, and produces a picture as output
* The **loss function** for this network consists of passing the produced image through the discriminator and checking whether it believes the painting to be real
* We can then use backpropagation and gradient descent, as usual, to update the weights in our generator
* Over time, our generator will learn to fool the discriminator!

---

# Not quite enough ...

* If our discriminator were "perfect", this would already be enough
* However, to start, we needed some "fake" Picassos, which we just generated randomly
* Once the Generator produces some images, we actually have "better fakes"!
* So we can improve the Discriminator with those
* And then we need to improve the Generator again, etc.

---

# Generative Adversarial Networks

* Generative: We **generate** images
* Adversarial: The Generator and the Discriminator play a "game" against each other
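---
class: medium

# Training the Generator

Putting the pieces together, a minimal sketch of one generator update, assuming the `discriminator` from before (sizes and hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

generator = nn.Sequential(  # random vector in, (flattened) image out
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid()
)
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
bce = nn.BCELoss()

z = torch.randn(64, 100)  # a batch of random input vectors
fakes = generator(z)
# Generator loss: how far the discriminator is from calling the fakes "real"
g_loss = bce(discriminator(fakes), torch.ones(64, 1))
g_opt.zero_grad()
g_loss.backward()
g_opt.step()  # only the generator's weights are updated here
```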
---
class: medium

# The Generative Game

* The Discriminator learns to detect fake images (optimization with gradient descent)
* The Generator learns to produce fake images that look real to the discriminator (optimization with gradient descent)
* The Discriminator learns to detect these new fake images
* The Generator learns to fool the updated discriminator
* ...

---
class: medium

# Stability

* So you run this training for some iterations
* In one iteration, your generator produces 100 images (A), and you train the discriminator to recognize them
* Then the generator learns to produce 100 new images (B) that fool the discriminator
* The discriminator now learns to recognize those
* Then the generator learns to produce the 100 images in (A) **again**, because now those fool the discriminator
* etc.

---
class: medium

# A Replay Buffer

* To avoid such cycles, it can be worthwhile to keep a "repository" of old images
* But if we keep **all** old images around, training will slow down pretty quickly
* Instead, we could have a repository of, say, 200 old images, and select 100 of those at random
* Then we add 100 new images and have a new repository
* We always use all 200 of these images to train the discriminator (some old, some new)
* A short code sketch of this scheme follows after the mode collapse slides

---

# Mode Collapse

* The goal of the generator is to **minimize** the error (= how many images the discriminator recognizes as fake)
* The input of the generator is random noise
* Imagine there is a **perfect** fake image
* The generator could learn to ignore the input and produce only this image

---

# Mode Collapse

* Once the generator produces **only** the perfect image, the loss, and therefore the gradient, will be the same for every image
* In the next iteration, the generator will again produce only **one** image
* The generation process has "collapsed" to a single example
* Generally, we don't want that

---

# Mode Collapse: Randomization

* What can we do? When we get to that point, nothing :(
* To prevent getting there: Introduce more randomness
* Dropout layers: After the activation function, randomly set values to 0 (with a probability p)
* Randomize labels: When training the discriminator, randomly flip some of the labels

---

# Mode Collapse: Diversify Generation

* Another option is to explicitly encourage the generation of different images
* For each set of generated images, calculate the average per-pixel variance
* Use this variance as an additional input for the discriminator
* If the variance is very often 0, the discriminator will learn to use that to identify fake images

---

# GAN Variants

* Generating faces or photos from existing ones
* Additionally providing a class to generate specific pictures
* Generate an image from a textual description
* Apply a style to an existing image ("Style transfer")
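---
class: medium

# A Replay Buffer

Returning to the replay buffer from a few slides ago, here is a minimal sketch (the 100/200 sizes follow the example given there; `random.sample` draws without replacement):

```python
import random

buffer = []  # repository of previously generated images

def update_buffer(new_images, keep=100):
    """Keep a random sample of old images plus the new ones."""
    global buffer
    old = random.sample(buffer, min(keep, len(buffer)))
    buffer = old + list(new_images)  # e.g. 100 old + 100 new = 200 images
    return buffer  # train the discriminator on all of these
```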
---

# Samples

[Large Scale GAN Training for High Fidelity Natural Image Synthesis](https://arxiv.org/abs/1809.11096)

Also check out: [thiscatdoesnotexist.com](https://thiscatdoesnotexist.com/)

---

# Cycle GAN
[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://junyanz.github.io/CycleGAN/)

---

# Cycle GAN
[Turning Fortnite into PUBG with Deep Learning (CycleGAN)](https://towardsdatascience.com/turning-fortnite-into-pubg-with-deep-learning-cyclegan-2f9d339dcdb0)

---

# GANcraft
[NVidia GANcraft](https://nvlabs.github.io/GANcraft/)

---
class: medium

# References

* [Synthesizing Robust Adversarial Examples](https://arxiv.org/pdf/1707.07397.pdf)
* [Auto-Encoder: What Is It? And What Is It Used For?](https://towardsdatascience.com/auto-encoder-what-is-it-and-what-is-it-used-for-part-1-3e5c6f017726)
* [GAN Introduction](https://machinelearningmastery.com/how-to-develop-a-generative-adversarial-network-for-an-mnist-handwritten-digits-from-scratch-in-keras/)
* [GAN hacks](https://github.com/soumith/ganhacks)
* [GAN Variations](https://developers.google.com/machine-learning/gan/applications)
* [Large Scale GAN Training for High Fidelity Natural Image Synthesis](https://arxiv.org/abs/1809.11096)