class: center, middle

# Machine Learning

## Deep Q Learning

### III-Verano 2019

---

class: center, middle

# Exam Review

---

# Question 1

Given a training dataset of 40,000 images, you are tasked with building a classifier capable of discriminating between two types of elements (binary classification); for this example, call them airplanes and cars. Each image has size 3x32x32, i.e. a 3-channel color image of 32x32 pixels. Using this information, draw the topology of the neural network (layers, number of neurons, activation functions, loss function).

---

# Question 2

Draw a sigmoid activation function (approximately). What are the minimum and maximum values of its output?

You want to predict the number of weeks a player is going to play your video game (regression). You collect data including age, average play time per day, and average scores, kills, deaths, and retries per level. You want to use a neural network for this task. Draw the topology of the neural network (layers, number of neurons, activation functions, loss function).

---

class: medium

# Question 3

You are classifying cats, dogs, and birds from images. Your classification model produced the following predictions:

.left-column[
Actual | Predicted
-------|----------
Dog    | Dog
Cat    | Dog
Dog    | Dog
Dog    | Dog
Dog    | Bird
Cat    | Bird
Cat    | Cat
Cat    | Cat
Dog    | Cat
Bird   | Bird
Bird   | Bird
Bird   | Cat
Cat    | Cat
Dog    | Bird
Bird   | Bird
Cat    | Cat
]

.right-column[
* Draw the confusion matrix
* Calculate precision and recall for each class

$$
\mathit{Precision} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FP}}\\\\
\mathit{Recall} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FN}}
$$
]

---

# Question 4

Your model has the weights w, and you have just calculated the loss and the gradient. Assume a learning rate of 0.1, and calculate the new value for w.

$$
\vec{w} = \begin{pmatrix}2.1\\\\1.2\end{pmatrix}\\\\
\nabla w = \begin{pmatrix}11\\\\7\end{pmatrix}
$$

---

# Question 5

Exploratory data analysis (EDA), as seen in class, can provide significant insight into your data set, and it is a necessary step prior to constructing your model. Explain briefly (one paragraph of no more than 4 sentences) what kind of important information EDA can provide about your data.

---

class: center, middle

# Reinforcement Learning

---

# Reinforcement Learning
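
The agent repeatedly observes a state, chooses an action, and receives a reward and a new state from the environment. A minimal runnable sketch of this loop; the `ChainEnv` toy environment is an illustration only, not part of any lab:

```
import random

# Toy environment: states 0..4 on a line, reaching state 4 gives reward 1
class ChainEnv:
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = left, 1 = right
        self.state = max(0, self.state + (1 if action == 1 else -1))
        done = self.state == 4
        return self.state, 1.0 if done else 0.0, done

env = ChainEnv()
state = env.reset()
done = False
while not done:                     # the basic interaction loop
    action = random.choice([0, 1])  # a real agent would use its policy here
    state, reward, done = env.step(action)
```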
---

# The Q function

Utility of a state:

$$
V^\pi(s) = R(s) + \gamma V^\pi(T(s, \pi(s)))
$$

Q function:

$$
Q(s,a) = R(s) + \max_{a'} \gamma Q(T(s,a),a')
$$

---

# Q Learning

* Q Learning is based on trials
* We store a Q table and continuously update it
* The rewards our agent receives tell us what the values of the Q table "should be"
* Drawback: the Q table is huge

---

# Deep Q Learning

* `Q(s,a)` is a function
* We have heard of a way to approximate functions: Neural Networks!
* Instead of storing all possible Q-values in a table, we train a neural network
* How?

---

# Deep Q Learning
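
Instead of a table, a network maps a state to one Q-value per action. A minimal sketch, assuming a 4-dimensional state and 2 actions (all sizes chosen only for illustration):

```
import torch
from torch import nn

q_net = nn.Sequential(
    nn.Linear(4, 32),   # 4 state features in
    nn.ReLU(),
    nn.Linear(32, 2))   # one Q-value per action out

state = torch.rand(1, 4)
q_values = q_net(state)            # shape (1, 2): Q(s, a) for every action a
action = q_values.argmax().item()  # greedy action selection
```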
---

# Q Learning

Repeat:

* Observe state
* Choose action according to Q table
* Perform action, get reward and new state
* Update Q table

---

# Deep Q Learning

Repeat:

* Observe state
* Choose action according to Q .red[network]
* Perform action, get reward and new state
* Update Q .red[network]

--

How do we update? Gradient descent!

---

# Deep Q Learning
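
One update step, sketched in PyTorch (the transition values, gamma, and network sizes are made up for illustration): the target `r + gamma * max Q(s',a')` acts as the "true" value, and the squared difference to the prediction `Q(s,a)` is the loss.

```
import torch
from torch import nn

q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(q_net.parameters(), lr=0.1)
gamma = 0.9

# One observed transition (s, a, s', r); values made up for illustration
s, a, s_next, r = torch.rand(1, 4), 1, torch.rand(1, 4), 1.0

prediction = q_net(s)[0, a]                   # Q(s, a)
with torch.no_grad():                         # keep the target fixed
    target = r + gamma * q_net(s_next).max()  # r + gamma * max Q(s', a')
loss = (target - prediction) ** 2             # squared TD error

optimizer.zero_grad()
loss.backward()
optimizer.step()                              # one gradient descent step
```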
---

# Deep Q Learning: Challenge
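
One way to state the challenge: if a single network with weights theta produces both the prediction and the target, the loss chases a moving target (a sketch of the problem, with notation chosen here):

$$
\mathit{loss} = \left(r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a)\right)^2
$$

Every gradient step changes theta, which also shifts the target that the next step tries to match. The two-network setup on the next slide addresses exactly this.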
---

# Deep Q Learning: Two Networks
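
A minimal sketch of the setup (layer sizes, gamma, and the copy interval `C` are assumptions for illustration; the full process is spelled out on the next slide):

```
import copy
import torch
from torch import nn

# Prediction network is trained; target network is a frozen copy
prediction_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
target_net = copy.deepcopy(prediction_net)
gamma = 0.9

def td_loss(s, a, s_next, r):
    with torch.no_grad():                       # target network: no gradients
        target = r + gamma * target_net(s_next).max()
    return (target - prediction_net(s)[0, a]) ** 2

loss = td_loss(torch.rand(1, 4), 1, torch.rand(1, 4), 1.0)

# ... after every C gradient steps, refresh the target network:
target_net.load_state_dict(prediction_net.state_dict())
```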
---

class: medium

# Deep Q Learning: Process

Use two networks: target and prediction

* Run the agent for `n` iterations, using the prediction network with an epsilon-greedy strategy to choose actions; collect all states, actions, next states and rewards `(s,a,s',r)`
* After collecting "some" data (like a "minibatch"), calculate the gradient, using the difference between the **target** and the **prediction** (networks) as the loss, to **update the prediction network**
* After `C` iterations, copy the prediction network and use it as the new target network

---

class: center, middle

# Applications

---

# Applications

* Reinforcement Learning/Q Learning often performs very well on problems where we can define a reward
* Classical Q Learning has the problem of having to store the entire state space
* Deep Q Learning can be applied to larger problems, such as self-driving cars, video games, etc.

---

# Applications
---

# Application: Minecraft
---

# Malmö

"Project Malmö, named after a town between Cambridge, UK where it was developed and Stockholm, Sweden where Minecraft was created, is an Artificial Intelligence (AI) platform that allows researchers to create challenging and interesting tasks for evaluating agents, and Minecraft enthusiasts to engage in the Modding community and help advance AI."

"MalmoEnv is an OpenAI "gym" Python Environment for Malmo/Minecraft"

---

# Malmö Task: Build Battle
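
Since MalmoEnv follows the Gym interface quoted on the previous slide, interacting with a task like this one looks like the standard Gym loop. A sketch with a placeholder environment (how the Malmo environment itself is constructed is not shown here; see the MalmoEnv documentation):

```
import gym

env = gym.make("CartPole-v1")  # placeholder; MalmoEnv exposes the same API
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()         # random policy for the sketch
    obs, reward, done, info = env.step(action)
env.close()
```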
---

class: medium

# Malmö Task: Get Diamond

Another task is to actually play Minecraft (without enemies) and get a diamond:

* First, gather wood to create a wooden pickaxe
* Then use the wooden pickaxe to get stone and make a stone pickaxe
* Find and mine iron
* Build a furnace to smelt the iron and create an iron pickaxe
* Find and mine diamond

---

# Lab 8

* In the last lab we will apply Deep Q Learning to Minecraft
* Your agent will get the pixels on the screen (!) and has to perform a task
* For the lab, the task will be to find a gold block in a room
* This is going to be very experimental!

---

# Lab 8
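
The observations in this lab are raw screen pixels, so they must become a tensor before the network sees them. A sketch, assuming the frame arrives as a 240x320x3 array of bytes (the exact format in the lab may differ):

```
import numpy as np
import torch

obs = np.zeros((240, 320, 3), dtype=np.uint8)  # stand-in for one screen frame

x = torch.from_numpy(obs).float() / 255.0  # scale pixel values to [0, 1]
x = x.permute(2, 0, 1).unsqueeze(0)        # HWC -> NCHW: (1, 3, 240, 320)
```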
---

# Lab 8: Notes and Hints

* You will get 320x240 pixels with 3 color channels as input
* Use convolutional layers to reduce the size

```
from torch.nn import Sequential, Conv2d, BatchNorm2d, ReLU, MaxPool2d

# inside your network's __init__; the input has 3 color channels
# (use 1 instead of 3 if you convert the frames to grayscale first)
self.cnnlayer = Sequential(
    Conv2d(3, 4, kernel_size=3, stride=1, padding=1),
    BatchNorm2d(4),                      # normalize the 4 feature maps
    ReLU(inplace=True),
    MaxPool2d(kernel_size=2, stride=2))  # halves height and width
```

---

# Reminder: Convolutional Neural Networks
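
As a quick sanity check of how much the hint layer above shrinks the input (assuming one 320x240 RGB frame, as stated in the lab notes):

```
import torch
from torch import nn

cnnlayer = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(4),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2))

x = torch.zeros(1, 3, 240, 320)  # NCHW: one RGB frame
print(cnnlayer(x).shape)         # torch.Size([1, 4, 120, 160])
```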
---

# Lab 8: Results

* Your agent may or may not solve the problem
* Training may take "a while"
* It's the last lab; it's supposed to be more fun than strictly about results!
* The **process** is more important

---

# References

* [A Hands-On Introduction to Deep Q-Learning using OpenAI Gym in Python](https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python/)
* [A Comprehensive Guide to Convolutional Neural Networks](https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53)
* [Deep Reinforcement Learning for General Video Game AI](https://arxiv.org/pdf/1806.02448.pdf)
* [Project Malmo](https://www.microsoft.com/en-us/research/project/project-malmo/)