
Artificial Intelligence

1 / 44

Adversarial Search: Recap

  • We have a turn-based game

  • Our goal is to get the highest possible score

  • Our opponent wants us to get the lowest possible score

  • It is our turn

2 / 44

Adversarial Search: Recap

  • For every possible move we could make, we consider every possible move our opponent could make, etc.

  • For each possible sequence of moves we calculate our score

  • We then assume that our opponent will choose the action that results in the lowest score for us

3 / 44

Adversarial Search: Limitations

  • This game tree will be huge

  • We heard about alpha-beta pruning to reduce the tree size, but even with that the tree is too large to calculate for many games

  • Many games also have random components

  • What now?

4 / 44

Monte Carlo Tree Search

  • Idea: Don't calculate the entire tree, but instead sample random(-ish) sequences ("rollouts")

  • Record the outcomes of these rollouts

  • Repeat a large number of times

  • At the end, we will have an estimate of the number of points we can expect from each action

5 / 44

Monte Carlo Tree Search

  • If we pick actions completely at random for our rollouts we will need too many repetitions to get a good estimate

  • But we can use the information we learn during the rollouts to "guide" future iterations

  • For example: Say we have already performed 100 rollouts, which gives us a (probably bad) estimate for the expected value of each action

  • For subsequent rollouts, we choose actions with higher estimated values with higher probability

  • Over time, our sampling process will collect more samples for more promising actions

6 / 44

Monte Carlo Tree Search

Our algorithm constructs the game tree piece by piece. In each iteration it expands the partial tree in four steps (sketched in the code after this list):

  • Select actions from the tree until we reach a node we haven't fully expanded yet

  • Expand a new action from that node

  • Simulate the game until the end, and note the result

  • Backpropagate this result back up the tree
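
A minimal Python sketch of this loop, assuming a hypothetical game interface with functions legal_actions(state), apply_action(state, action), is_terminal(state), and score(state). Scores are taken from our own perspective (this is enough for a single-player game like the Blackjack lab; a two-player game would also need to flip the sign for the opponent's decisions). The selection step here is plain greedy; the selection strategies discussed later can be slotted in instead.

```python
import random

class Node:
    """One node of the partial game tree: remembers how often it was visited
    and the total score collected through it."""
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action                         # action that led here from the parent
        self.children = []
        self.untried = list(legal_actions(state))    # actions not expanded yet
        self.visits = 0
        self.total = 0.0

def mcts(root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down while the current node is fully expanded
        while not node.untried and node.children:
            node = max(node.children, key=lambda c: c.total / c.visits)
        # 2. Expansion: add one child for a not-yet-tried action
        if node.untried:
            action = node.untried.pop()
            child = Node(apply_action(node.state, action), parent=node, action=action)
            node.children.append(child)
            node = child
        # 3. Simulation: play random moves until the game ends
        state = node.state
        while not is_terminal(state):
            state = apply_action(state, random.choice(legal_actions(state)))
        result = score(state)
        # 4. Backpropagation: update statistics on the path back to the root
        while node is not None:
            node.visits += 1
            node.total += result
            node = node.parent
    # Recommend the root action with the best average result
    return max(root.children, key=lambda c: c.total / c.visits).action
```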

7 / 44

MCTS

[Figure: the four MCTS phases (Selection, Expansion, Simulation, Backpropagation) illustrated on a game tree whose nodes are labelled with score/visit counts, e.g. 11/21 at the root.]
8 / 44

MCTS for Tic-Tac-Toe

9 / 44

MCTS for Tic-Tac-Toe

10 / 44

MCTS for Tic-Tac-Toe

11 / 44

MCTS for Tic-Tac-Toe

12 / 44

MCTS for Tic-Tac-Toe

13 / 44

MCTS for Tic-Tac-Toe

14 / 44

MCTS for Tic-Tac-Toe

15 / 44

MCTS for Tic-Tac-Toe

16 / 44

MCTS for Tic-Tac-Toe

17 / 44

MCTS for Tic-Tac-Toe

18 / 44

MCTS for Tic-Tac-Toe

19 / 44

MCTS for Tic-Tac-Toe

20 / 44

MCTS for Tic-Tac-Toe

21 / 44

Monte Carlo Tree Search

Algorithm:

  • Select actions from the tree until we reach a node we haven't fully expanded yet

  • Expand a new action from that node

  • Simulate the game until the end, and note the result

  • Backpropagate this result back up the tree

22 / 44

Selection

  • We use the scores obtained so far to decide which action to follow at each node until we reach a leaf

  • One approach would be to always pick the action with the (currently) highest expected value

  • However, this would ignore actions that got bad results due to "bad luck" in the rollout

  • There are several different selection strategies we can use to overcome this problem

23 / 44

Epsilon-Greedy Selection

  • One of the simplest selection strategies uses a single parameter: ε

  • When we have to select an action, we choose a number between 0 and 1 uniformly at random

  • If that number is less than ε, we choose an action uniformly at random

  • Otherwise we choose the action with the highest expected value (see the sketch below)
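
A possible sketch in Python, assuming each child node stores `visits` and `total` as in the earlier MCTS sketch:

```python
import random

def epsilon_greedy(children, epsilon=0.1):
    """With probability epsilon explore uniformly at random,
    otherwise exploit the child with the highest expected value."""
    if random.random() < epsilon:
        return random.choice(children)
    return max(children, key=lambda c: c.total / c.visits)
```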

24 / 44

Roulette-Wheel Selection

  • Epsilon-Greedy may be problematic if two actions have almost the same expected value

  • Ideally, we would choose each of these two with (almost) the same probability

  • Roulette-Wheel selection selects an action at random with weights determined by the expected value of each action

  • For example, if the expected values of four actions are 1, 4, 8, and 7, we choose them with probability 1/20, 4/20, 8/20, and 7/20, respectively

  • This is also called "fitness proportionate selection" (see the sketch below)
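
A sketch of roulette-wheel selection over the same hypothetical node objects; it assumes all expected values are non-negative (shift or rescale them first if they can be negative):

```python
import random

def roulette_wheel(children):
    """Select a child with probability proportional to its expected value."""
    values = [c.total / c.visits for c in children]
    total = sum(values)
    if total == 0:
        return random.choice(children)    # all weights zero: fall back to uniform
    pick = random.uniform(0, total)       # spin the wheel
    running = 0.0
    for child, value in zip(children, values):
        running += value
        if pick <= running:
            return child
    return children[-1]                   # guard against rounding at the boundary
```

For a shorter version, the standard library's `random.choices(children, weights=values)[0]` performs the same weighted draw in one call.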

25 / 44

Roulette-Wheel Selection

26 / 44

UCT

  • We can also use a more sophisticated selection strategy

  • UCT = "Upper Confidence Bound 1 applied to trees", based on the UCB-1 formula:

$$E + c\sqrt{\frac{\ln N}{n}}$$

  • Where E is the expected value of an action, N is the number of times we have chosen any action in the current state, and n is the number of times we have chosen this particular action

  • We can use c to "tune" the behavior, to prefer choosing the best action (lower c) or trying each action equally often (higher c); see the sketch below
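
As a sketch, the UCB-1 value for each child of the current node, again assuming the `visits`/`total` fields from the earlier sketch; a common choice for c is around √2:

```python
import math

def uct_select(parent, c=1.414):
    """Pick the child maximising E + c * sqrt(ln N / n)."""
    def ucb1(child):
        if child.visits == 0:
            return float("inf")                # always try an untried action first
        expected = child.total / child.visits  # E
        explore = math.sqrt(math.log(parent.visits) / child.visits)  # sqrt(ln N / n)
        return expected + c * explore
    return max(parent.children, key=ucb1)
```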

27 / 44

Simulation

  • We said once we reach a node we haven't fully expanded yet, we "simulate" the game until the end to get a result

  • How can we simulate a game?

  • Simplest variant: Each player performs completely random moves

  • We will build our tree piece by piece, but we will still need "many" repetitions to get good simulation results

28 / 44

Simulation

  • Instead of moving randomly, we can use any other strategy we might know

  • For example, if we have a (bad) agent for the game, it could play the game for our simulation (see the sketch below)

  • As we build our tree, we will use more actions selected by our selection strategy and fewer by our "bad" agent

  • By not playing completely randomly, we may need fewer repetitions
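
A sketch of a playout that takes any move-choosing function (our "bad" agent) instead of pure random choice; the game interface is the same hypothetical one as before, and the policy signature (state and legal actions in, action out) is made up for this sketch:

```python
import random

def rollout(state, policy=None):
    """Play the game to the end; moves come from `policy` if given,
    otherwise they are chosen uniformly at random."""
    while not is_terminal(state):
        actions = legal_actions(state)
        action = policy(state, actions) if policy else random.choice(actions)
        state = apply_action(state, action)
    return score(state)
```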

29 / 44

Simulation

  • What if we don't actually simulate?

  • Ideally, we would have the exact result of the game for a new action that we're exploring

  • For many games we can instead come up with a game state evaluation

  • In Chess, for example, we can say that whoever has more valuable pieces on the board will likely win (see the sketch below)

  • Advanced ideas: Play a few random turns and then evaluate the state, use a Neural Network to evaluate the board state, etc.
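
As an illustration of the Chess example above, a very rough material-count evaluation; the board representation (a dict from squares to piece letters, uppercase for our pieces) is made up for this sketch:

```python
# Conventional rough piece values; the king is never captured, so it gets 0 here.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def evaluate_material(board):
    """Positive if we have more valuable pieces on the board, negative otherwise."""
    score = 0
    for piece in board.values():
        value = PIECE_VALUES[piece.lower()]
        score += value if piece.isupper() else -value
    return score
```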

30 / 44

Randomness

31 / 44

Randomness

  • What if we have a (shuffled) deck?

  • We already sample our actions; we can also sample from the deck!

  • For every iteration, we also re-shuffle the deck, taking the cards we have already seen into account (see the sketch below)
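
A sketch of this per-iteration deck sampling; `full_deck` and `seen_cards` are illustrative names for the complete deck and the cards already revealed:

```python
import random

def sample_hidden_deck(full_deck, seen_cards):
    """Return one possible ordering of the cards we have not seen yet."""
    remaining = list(full_deck)
    for card in seen_cards:          # remove exactly the cards we know are gone
        remaining.remove(card)
    random.shuffle(remaining)        # fresh shuffle for every MCTS iteration
    return remaining
```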

32 / 44

Example: Blackjack

  • In Blackjack the player can request to draw cards from the deck

  • The goal is to get a sum of card values close to 21, but not over 21

  • For example: Ten of Spades, Three of Hearts, Seven of Clubs are 10+3+7=20 points

  • Jack, Queen and King are worth 10 points each; an Ace can count as 1 or 11 points (player's choice), as computed in the hand-value sketch below
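
A sketch of the hand-value calculation described above; cards are given as rank strings like "A", "K", "7":

```python
def hand_value(cards):
    """Best Blackjack value of a hand: aces count as 11 unless that busts the hand."""
    total, aces = 0, 0
    for card in cards:
        if card == "A":
            total, aces = total + 11, aces + 1
        elif card in ("J", "Q", "K"):
            total += 10
        else:
            total += int(card)
    while total > 21 and aces:
        total -= 10                  # demote one ace from 11 to 1
        aces -= 1
    return total

# hand_value(["10", "3", "7"]) == 20, matching the example above
```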

33 / 44

Blackjack

  • After the player has performed their actions, the dealer draws cards until they have more than 16 points

  • If the player has more than 21 points, they lose

  • If the dealer has more than 21 points, the player wins

  • Otherwise, if the player has more points than the dealer, the player wins

  • If there is a tie, no one wins

  • The winner receives a fixed amount of money (e.g. $1); a sketch of this payout logic follows below
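
A sketch of how a single (non-split, non-doubled) hand could be settled under these rules; the function name and the flat $1 stake are just for illustration:

```python
def settle(player_total, dealer_total, bet=1):
    """Return the player's winnings for one hand under the simplified rules above."""
    if player_total > 21:
        return -bet                  # player busts: loses regardless of the dealer
    if dealer_total > 21:
        return bet                   # dealer busts: player wins
    if player_total > dealer_total:
        return bet
    if player_total < dealer_total:
        return -bet
    return 0                         # tie: no one wins
```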

34 / 44

Blackjack: Player Actions

A player can do one of four things:

  • Hit: Request one more card

  • Stand: Stop taking cards, passing the turn to the dealer

  • Double Down: Draw exactly one more card and then stand, and double the bet (win or lose $2)

  • Split: If the first two cards have the same value, the player can split them into two hands, and continue playing with these two independently (each hand wins/loses $1)

35 / 44

MCTS for Blackjack

36 / 44

MCTS for Blackjack

37 / 44

MCTS for Blackjack

38 / 44

MCTS for Blackjack

39 / 44

MCTS for Blackjack

40 / 44

MCTS for Blackjack

41 / 44

MCTS for Blackjack

42 / 44

Why Blackjack

  • Implementing MCTS for Blackjack is Lab 2 (4/5-22/5)

  • You will get a framework with a Blackjack implementation (and some AI agents)

  • Your task is to extend the existing sampling player with proper action selection and tree construction

  • Weird feature: You can play with non-standard decks (e.g. only even cards), and your agent will figure it out!

43 / 44
