We have a turn-based game
Our goal is to get the highest possible score
Our opponent wants us to get the lowest possible score
It is our turn
For every possible move we could make, we consider every possible move our opponent could make, etc.
For each possible sequence of moves we calculate our score
We then assume that our opponent will choose the action that results in the lowest score for us
This game tree will be huge
We heard about alpha-beta pruning to reduce the tree size, but even with it the tree is too large to search exhaustively for many games
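For reference, a minimal sketch of this minimax search with alpha-beta pruning; the game interface (`is_terminal`, `score`, `legal_moves`, `apply`) is a hypothetical placeholder:

```python
import math

# Minimax with alpha-beta pruning (sketch). The game interface
# (is_terminal, score, legal_moves, apply) is a hypothetical placeholder.
def alphabeta(state, our_turn, alpha=-math.inf, beta=math.inf):
    if is_terminal(state):
        return score(state)  # our final score for this move sequence
    if our_turn:
        # Our turn: maximize our score
        value = -math.inf
        for move in legal_moves(state):
            value = max(value, alphabeta(apply(state, move), False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:  # the opponent would never allow this branch: prune
                break
        return value
    else:
        # Opponent's turn: they minimize our score
        value = math.inf
        for move in legal_moves(state):
            value = min(value, alphabeta(apply(state, move), True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value
```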
Many games also have random components
What now?
Idea: Don't calculate the entire tree, but instead sample random(-ish) sequences ("rollouts")
Record the outcomes of these rollouts
Repeat a large number of times
At the end, we will have an estimate of the number of points we can expect from each action
If we pick actions completely at random for our rollouts, we will need too many repetitions to get a good estimate
But we can use the information we learn during the rollouts to "guide" future iterations
For example: Say we have already performed 100 rollouts, which gives us a (probably bad) estimate for the expected value of each action
For the next rollouts, we will choose actions with higher expected values with higher probability
Over time, our sampling process will collect more samples for more promising actions
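A sketch of this sampling loop, assuming a hypothetical `rollout(state, action)` that plays one random(-ish) game to the end and returns our score:

```python
import random
from collections import defaultdict

# Flat Monte Carlo sketch: estimate action values from repeated rollouts,
# biasing later rollouts towards the currently most promising action.
# rollout(state, action) is a hypothetical helper that plays one game
# to the end and returns our final score.
def estimate_values(state, actions, iterations=1000, bias=0.7):
    totals = defaultdict(float)  # sum of results per action
    counts = defaultdict(int)    # number of rollouts per action
    for _ in range(iterations):
        if counts and random.random() < bias:
            # exploit: the action with the best estimate so far
            action = max(counts, key=lambda a: totals[a] / counts[a])
        else:
            # explore: a completely random action
            action = random.choice(actions)
        result = rollout(state, action)
        totals[action] += result
        counts[action] += 1
    return {a: totals[a] / counts[a] for a in counts}
```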
Our algorithm will construct a game tree piece by piece. In each iteration, it expands the partial tree in four steps:
Select actions from the tree until we reach a node we haven't fully expanded yet
Expand a new action from that node
Simulate the game until the end, and note the result
Backpropagate this result back up the tree
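One iteration of these four steps might look like this; `Node`, `select_child`, `apply`, and `rollout` are hypothetical helpers (possible implementations of `select_child` follow in the selection strategies below):

```python
# One MCTS iteration (sketch). Node is a hypothetical tree node with
# state, parent, children, untried_actions, visits, and total_score;
# select_child, apply, and rollout are assumed helpers.
def mcts_iteration(root):
    node = root
    # 1. Select: descend while the node is fully expanded and has children
    while not node.untried_actions and node.children:
        node = select_child(node)  # e.g. epsilon-greedy or UCT, see below
    # 2. Expand: try one action we haven't explored from this node yet
    if node.untried_actions:
        action = node.untried_actions.pop()
        node = node.add_child(action, apply(node.state, action))
    # 3. Simulate: play the game to the end from the new node
    result = rollout(node.state)
    # 4. Backpropagate: update the statistics on the path to the root
    while node is not None:
        node.visits += 1
        node.total_score += result
        node = node.parent
```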
We use the scores we have obtained so far to choose which actions to select until we reach a leaf
One approach would be to always pick the action with the (currently) highest expected value
However, this would ignore actions that got bad results due to "bad luck" in the rollout
There are several different selection strategies we can use to overcome this problem
One of the simplest selection strategies, ε-greedy, uses a single parameter: ε
When we have to select an action, we choose a number between 0 and 1 uniformly at random
If that number is less than ε, we choose an action uniformly at random
Otherwise we choose the action with the highest expected value
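A sketch of ε-greedy selection, assuming a hypothetical `stats` mapping from each action to a (total score, visit count) pair:

```python
import random

# Epsilon-greedy selection (sketch). stats maps each action to a
# (total_score, count) pair from previous rollouts; this assumes every
# action has been tried at least once.
def epsilon_greedy(actions, stats, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(actions)  # explore: uniform random action
    # exploit: the action with the highest expected value so far
    return max(actions, key=lambda a: stats[a][0] / stats[a][1])
```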
Epsilon-Greedy may be problematic if two actions have almost the same expected value
Ideally, we would choose each of these two with (almost) the same probability
Roulette-Wheel selection selects an action at random with weights determined by the expected value of each action
For example, if the expected values of four actions are 1, 4, 8, and 7, we choose these actions with probabilities 1/20, 4/20, 8/20, and 7/20, respectively
This is also called "fitness proportionate selection"
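A sketch using Python's `random.choices`, which supports weighted selection directly; this assumes non-negative expected values (negative scores would have to be shifted first):

```python
import random

# Roulette-wheel (fitness proportionate) selection sketch.
# Assumes all expected values are non-negative.
def roulette_wheel(expected_values):
    actions = list(expected_values)
    weights = list(expected_values.values())
    # each action is chosen with probability weight / sum(weights)
    return random.choices(actions, weights=weights, k=1)[0]

# The example above: probabilities 1/20, 4/20, 8/20, 7/20
action = roulette_wheel({"a": 1, "b": 4, "c": 8, "d": 7})
```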
We can also use a more sophisticated selection strategy
UCT = "Upper Confidence Bound 1 applied to trees", based on the UCB-1 formula:
E + c·√(ln(N) / n)
Where E is the expected value of an action, N is the number of times we have chosen any action in the current state, and n is the number of times we have chosen this particular action
We can use c to "tune" the behavior, to prefer choosing the best action (lower c), or trying each action equally often (higher c)
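As a sketch, using the same hypothetical `stats` mapping as above:

```python
import math

# UCT selection (sketch): pick the action with the highest UCB-1 value.
# stats maps each action to a (total_score, count) pair; c is the
# exploration constant from the formula above.
def uct_select(actions, stats, c=1.4):
    N = sum(count for _, count in stats.values())  # visits of this state
    def ucb1(action):
        total, n = stats[action]
        if n == 0:
            return math.inf  # always try untried actions first
        return total / n + c * math.sqrt(math.log(N) / n)
    return max(actions, key=ucb1)
```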
We said that once we reach a node we haven't fully expanded yet, we "simulate" the game until the end to get a result
How can we simulate a game?
Simplest variant: Each player performs completely random moves
We will build our tree piece by piece, but we will still need "many" repetitions to get good simulation results
Instead of moving randomly, we can use any other strategy we might know
For example, if we have a (bad) agent for the game, it could play the game for our simulation
As we build our tree, we will use more actions selected by our selection strategy and fewer by our "bad" agent
By not playing completely randomly, we may need fewer repetitions
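A rollout sketch over the same hypothetical game interface as before; any policy, from uniformly random to an existing (weak) agent, can be plugged in:

```python
import random

# Simulation (sketch): play the game to the end with a default policy
# and return the result. is_terminal, legal_moves, apply, and score are
# the same hypothetical game interface as above.
def random_policy(state, actions):
    return random.choice(actions)

def simulate(state, policy=random_policy):
    while not is_terminal(state):
        state = apply(state, policy(state, legal_moves(state)))
    return score(state)  # the result we backpropagate
```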
What if we don't actually simulate?
Ideally, we would have the exact result of the game for a new action that we're exploring
For many games we can instead come up with a game state evaluation
In Chess, for example, we can say whoever has more valuable pieces on the board will likely win
Advanced ideas: Play a few random turns and then evaluate the state, use a Neural Network to evaluate the board state, etc.
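For the chess example, a material-count evaluation might look like this; the piece values are the common textbook numbers, and the board representation (a list of (piece, owner) pairs) is a made-up placeholder:

```python
# Material-count evaluation (sketch): positive favors us, negative favors
# the opponent. The board representation is a made-up placeholder.
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def evaluate(board, us):
    total = 0
    for piece, owner in board:
        value = PIECE_VALUES.get(piece, 0)  # kings are not counted
        total += value if owner == us else -value
    return total
```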
What if we have a (shuffled) deck?
We already sample our actions, we can also sample from the deck!
For every iteration, we also shuffle the unseen part of the deck (keeping it consistent with the information we already have)
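A determinization sketch: per iteration we build one concrete deck order that is consistent with everything we have already seen:

```python
import random

# Per-iteration deck sampling (sketch): keep only the cards we have not
# seen yet and shuffle them into one concrete, consistent deck order.
def sample_deck(full_deck, seen_cards):
    unseen = list(full_deck)
    for card in seen_cards:
        unseen.remove(card)  # known information stays fixed
    random.shuffle(unseen)   # one possible world for this rollout
    return unseen
```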
In Blackjack the player can request to draw cards from the deck
The goal is to get a sum of card values close to 21, but not over 21
For example: Ten of Spades, Three of Hearts, Seven of Clubs are 10+3+7=20 points
Jack, Queen and King are 10 points each, an Ace can count as 1 or 11 points (player's choice)
After the player has performed their actions, the dealer draws cards until they have more than 16 points
If the player has more than 21 points, they lose
If the dealer has more than 21 points, the player wins
If the player then has more points than the dealer, the player wins (if the dealer has more, the player loses)
If there is a tie, no one wins
The winner gets an amount of money (like $1)
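A sketch of the hand scoring described above, including the flexible Ace (count each Ace as 11 first, then downgrade Aces to 1 while the hand would go over 21):

```python
# Blackjack hand value (sketch): count each Ace as 11 first, then count
# Aces as 1 while the hand is over 21.
def hand_value(cards):  # cards given as ranks like "10", "K", "A"
    value, aces = 0, 0
    for card in cards:
        if card == "A":
            value, aces = value + 11, aces + 1
        elif card in ("J", "Q", "K"):
            value += 10
        else:
            value += int(card)
    while value > 21 and aces:
        value -= 10  # one Ace now counts as 1 instead of 11
        aces -= 1
    return value

assert hand_value(["10", "3", "7"]) == 20  # the example above
assert hand_value(["A", "K"]) == 21
assert hand_value(["A", "A", "9"]) == 21
```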
A player can do one of four things:
Hit: Request one more card
Stand: Stop taking cards, passing the turn to the dealer
Double Down: Draw exactly one more card and then stand, and double the bet (win or lose $2)
Split: If the first two cards have the same value, the player can split them into two hands, and continue playing with these two independently (each hand wins/loses $1)
Implementing MCTS for Blackjack is Lab 2 (4/5-22/5)
You will get a framework with a Blackjack implementation (and some AI agents)
Your task is to expand the existing sampling player with a proper action selection and tree construction
Weird feature: You can play with non-standard decks (e.g. only even cards), and your agent will figure it out!