Lab 8
Introduction
In this lab, we will revisit both Q-Learning and Neural Networks to build a reinforcement learning agent that solves a task in Minecraft. We will use malmoenv, a Python library that connects to Minecraft and exposes it as a gym environment (similar to the PoleCart task from lab 6). If you want to use your own laptop, you should be able to install malmoenv with pip install malmoenv.
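For reference, interacting with a malmoenv environment follows the usual gym reset/step pattern. The sketch below is only meant to show the shape of that loop; the mission file name and port are placeholders, and the lab code you will download already performs the equivalent setup for you:

    import malmoenv
    from pathlib import Path

    # Placeholder mission file and port; the lab code does the real setup.
    xml = Path("missions/level0.xml").read_text()
    env = malmoenv.make()
    env.init(xml, 9000)

    obs = env.reset()                       # first observation: screen pixels
    done = False
    while not done:
        action = env.action_space.sample()  # random action, just to illustrate
        obs, reward, done, info = env.step(action)
    env.close()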
Report
You are required to document your work in a report, which you should write while you work on the lab. Include all requested images, as well as any other graphs you deem interesting, and describe what you observe. The lab text will prompt you for specific information at times, but you are expected to fill in other text to produce a coherent document. At the end of the lab, send an email to the two professors and the assistant (markus.eger.ucr@gmail.com, joseaguevara@gmail.com, diegomoraj@outlook.com) with the subject “[CI-2600]Lab 8, carné 1, carné 2”, containing the names and carnés of the students in the group, as well as a zip file with the lab report as a PDF and all code you wrote. Do not include the entire Minecraft environment in this zip file or email!
AI Gym
To get started, download this zip file from the class website, which contains Minecraft, lab8.py, and some levels. The Python file will create the gym environment, set up a training loop, and lay out the general structure of the Deep Q-Learning algorithm. Before you run this file, you will need to start Minecraft, which is most easily done using the launch.bat file included in the zip. Alternatively (on Linux/Mac OS), you can use python -c "import malmoenv.bootstrap; malmoenv.bootstrap.launch_minecraft(9000)" (make sure you are in the lab 8 directory!). Wait until Minecraft shows the main menu, and then run lab8.py. It will load a map and run the agent. While this agent will attempt to do Q-Learning, it uses an intentionally bad Neural Network architecture and parameter settings. The goal for the agent is to reach the gold block that is somewhere in the room, and your task is to improve the agent so that it does so reasonably consistently. In contrast to lab 6, you will not get any explicit information about the environment; instead, the observations will be exactly the pixels that are visible on the screen. The actions the agent can perform are: move forward or backward, turn left or right, jump, or do nothing.
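Since the observation arrives as a flat array of pixel values, you will usually want to reshape and rescale it before feeding it to a network. A minimal sketch, assuming the pixels come as a uint8 NumPy array in height-width-channel order (check how lab8.py actually hands the observation to the network):

    import numpy as np
    import torch

    def preprocess(obs):
        # Reshape the 230400 byte values into a 240x320 RGB frame,
        # scale to [0, 1], and reorder to the NCHW layout PyTorch expects.
        frame = np.asarray(obs, dtype=np.uint8).reshape(240, 320, 3)
        tensor = torch.from_numpy(frame).float() / 255.0
        return tensor.permute(2, 0, 1).unsqueeze(0)   # shape: (1, 3, 240, 320)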
Deep Q-Learning
As described above, the agent observes the pixels from the screen, which in our case is a 320x240 window with three color channels, for a total of 230400 features with values from 0 to 255. Creating a Q-Table for this problem would therefore be infeasible, even if we cleverly compressed the pixels. However, a Q-Table is really just an approximation of the Q-function, and there are other ways we can approximate functions: Neural Networks! The Neural Network in the zip file takes the 230400 input values, sends them through a hidden layer with 3 neurons, and then produces estimates of the Q values for each possible action. You can use this neural network exactly like you used the Q-Table last time: when you have to decide on an action, you choose the one with the highest estimated Q value.
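In code, “pick the action with the highest estimate”, combined with the usual epsilon-greedy exploration from the Q-Table lab, might look roughly like this (a sketch; policy_net, state, and n_actions stand in for whatever names lab8.py actually uses):

    import random
    import torch

    def select_action(policy_net, state, epsilon, n_actions):
        # With probability epsilon explore with a random action,
        # otherwise exploit the current Q estimates.
        if random.random() < epsilon:
            return random.randrange(n_actions)
        with torch.no_grad():                 # acting needs no gradients
            q_values = policy_net(state)      # shape: (1, n_actions)
            return int(q_values.argmax(dim=1).item())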
To train the network, we use a second network, the target network, which provides the “expected reward in the next state”, as discussed in class. We collect training data in a minibatch, where each sample consists of the observation, the chosen action, the reward obtained, and the next state. For training, we use the target network to calculate an estimate of the best Q value of the next state, multiply this value by gamma, and add the reward to determine which output our neural network should have produced. The code in lab8.py already performs all of these calculations. Note that we use the Huber loss, which is basically the MSE for small differences and the absolute difference for larger errors, making it a bit more resilient against outliers than the MSELoss. This can be beneficial for our application, since our agent will initially not know what to do at all, and may therefore produce many “outliers”, which might otherwise cause the gradients to blow up.
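Put together, this update corresponds roughly to the following (a sketch, assuming PyTorch and batched tensors; the variable names, and whether terminal states are masked the way shown here, are assumptions rather than a description of the actual file):

    import torch
    import torch.nn.functional as F

    def dqn_loss(policy_net, target_net, batch, gamma):
        states, actions, rewards, next_states, dones = batch

        # Q values the policy network currently assigns to the chosen actions.
        q_pred = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

        # Target: reward + gamma * best Q value of the next state, taken from
        # the frozen target network (no gradient flows through it).
        with torch.no_grad():
            q_next = target_net(next_states).max(dim=1).values
            q_target = rewards + gamma * q_next * (1.0 - dones)

        # Huber loss: quadratic for small errors, linear for large ones.
        return F.smooth_l1_loss(q_pred, q_target)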
For this task you have total freedom to explore different options, but as a general guideline you will likely want some Convolutional layers, which are special units that are not fully connected to the previous layer, but instead look at an n-by-n “neighborhood” of it. This is particularly useful for inputs like images (which is what we have), where each neuron only looks at a small patch of the input image. Make sure to document what you tried and any observations you made, and describe the architecture of the neural network you ended up with; one possible starting point is sketched after the note below.

Note: Unlike the previous labs, this lab is a lot more open-ended, and it is possible that your agent will not perform as well as you would like. Training is also time-consuming, because the agent has to physically move through the game world, but that also means that you can observe how it does. As with the pole cart, individual good runs are not that meaningful, though; the goal should be consistency.
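One possible starting point, loosely modelled on the classic DQN architecture (a sketch only; the layer sizes are guesses you should tune, and the input shape assumes the 3x240x320 preprocessing sketched earlier):

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        # Small convolutional Q network: screen pixels in, one Q value per action out.
        def __init__(self, n_actions):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(32, 32, kernel_size=3, stride=1), nn.ReLU(),
                nn.Flatten(),
            )
            # Work out the flattened size once with a dummy frame instead of
            # computing it by hand for a 240x320 input.
            with torch.no_grad():
                n_flat = self.conv(torch.zeros(1, 3, 240, 320)).shape[1]
            self.head = nn.Sequential(
                nn.Linear(n_flat, 256), nn.ReLU(),
                nn.Linear(256, n_actions),
            )

        def forward(self, x):
            return self.head(self.conv(x))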
Levels
You may have noticed the DIFFICULTY variable in the Python file. The default level (difficulty 0) is a 7x7 room, but if you increase the difficulty, you will get larger rooms (difficulty 1 is 14x14), multiple rooms with passages between them (difficulties 2 and 3), or even lava that can cause the agent to die (difficulty 4). If you find a setup that you believe works well on the basic room layout, try training a network for the higher levels.
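Switching levels is then just a matter of changing that one variable before starting a run (values as described above; check lab8.py for how the variable is actually used):

    # In lab8.py:
    DIFFICULTY = 0   # 0: 7x7 room, 1: 14x14 room,
                     # 2-3: multiple rooms connected by passages, 4: rooms with lava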
If you really want to challenge yourself, there are many tasks beyond simply finding the goal, such as crafting items (which requires resource gathering), or building structures. We will likely not have enough time to fully train agents for these scenarios, but if you find the tasks interesting, feel free to experiment at home, and discuss your findings with the instructors. If you enjoy this kind of task, there are competitions with the goal of obtaining diamonds in an actual Minecraft world using the same environment as this lab.