Lecture 9: Lab 2

# Artificial Intelligence

### Lab 2

---

# Lab 2

* In Lab 2 you will implement an agent that uses Monte Carlo Tree Search to play blackjack

* You will get a Python "framework" with an implementation of the game and some agents to compare against

* You can work in groups of up to two students

* The submission deadline is on Tuesday, 3/23, AoE

---

# Blackjack

* Blackjack is a popular card game, where players play against the dealer/bank

* The player is dealt two cards and can request to draw more cards from the deck

* The goal is to get a sum of card values close to 21, but not over 21

* For example: Ten of Spades, Three of Hearts, Seven of Clubs are 10+3+7=20 points

* Jack, Queen and King are 10 points each, an Ace can count as 1 **or** 11 points (player's choice)
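
The card values above can be turned into a hand total with a few lines of Python. This is a minimal sketch for a standard deck, not the framework's own code: the function name `hand_value` and the convention of passing each Ace as `1` are assumptions made for illustration.

```python
def hand_value(card_values):
    """Return the best total for a hand of card values.

    Assumption: every Ace is passed in as 1, face cards as 10.
    """
    total = sum(card_values)
    # At most one Ace can usefully count as 11 (two would already be 22),
    # so promote one Ace from 1 to 11 if that does not exceed 21.
    if 1 in card_values and total + 10 <= 21:
        total += 10
    return total


print(hand_value([10, 3, 7]))   # 20
print(hand_value([1, 10]))      # 21
print(hand_value([1, 1, 9]))    # 21
print(hand_value([1, 10, 5]))   # 16
```

The function always makes the choice that is best for the player, which is what "player's choice" amounts to in practice.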

---

# Blackjack

* After the player has performed their actions, the dealer draws cards until they have more than 16 points

* If the player has more than 21 points, they lose

* (Else) If the dealer has more than 21 points, the player wins

* (Else) If the player then has more points than the dealer, the player wins

* (Else) If there is a tie, no one wins

* The winner receives points equal to the bet, and the loser loses the same amount
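
The rules on this slide can be sketched as two small functions. The names `dealer_should_draw` and `round_result` are hypothetical, and two details are assumptions rather than statements from the slides: the payout equals the bet, and the player loses when the dealer has more points without exceeding 21.

```python
def dealer_should_draw(dealer_total):
    """The dealer keeps drawing until they have more than 16 points."""
    return dealer_total <= 16


def round_result(player_total, dealer_total, bet=1):
    """Return the player's winnings for one finished hand."""
    if player_total > 21:
        return -bet          # player busts, checked first
    if dealer_total > 21:
        return bet           # dealer busts
    if player_total > dealer_total:
        return bet
    if player_total == dealer_total:
        return 0             # tie: no one wins
    return -bet              # assumption: dealer has more points


print(round_result(20, 19))         # 1
print(round_result(22, 25))         # -1
print(round_result(18, 18))         # 0
print(round_result(20, 18, bet=2))  # 2
```

Note that the order of the checks matters: a player with more than 21 points loses even if the dealer would also have gone over.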

---

# Blackjack

([Source](https://www.blackjackapprenticeship.com/how-to-play-blackjack/))

---

# Blackjack: Player Actions

A player can do one of four things:

* **Hit**: Request one more card

* **Stand**: Stop taking cards, passing the turn to the dealer

* **Double Down**: Draw exactly one more card and then stand, and double the bet (win or lose $2)

* **Split**: If the first two cards have the same value, the player can split them into two hands, and continue playing with these two independently (each hand wins/loses $1)
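
One way to picture which of the four actions are on offer is the sketch below. It is not the framework's implementation (the framework passes the allowed actions to the agent); `Action` and `available_actions` are hypothetical names, and restricting Double Down to the first two cards is an assumption.

```python
from enum import Enum


class Action(Enum):
    HIT = "hit"
    STAND = "stand"
    DOUBLE_DOWN = "double down"
    SPLIT = "split"


def available_actions(card_values):
    """Return the actions a player could take with the given hand."""
    actions = [Action.HIT, Action.STAND]
    if len(card_values) == 2:
        # Assumption: doubling down is only offered on the initial two cards.
        actions.append(Action.DOUBLE_DOWN)
        # Split requires the first two cards to have the same value.
        if card_values[0] == card_values[1]:
            actions.append(Action.SPLIT)
    return actions


print([a.value for a in available_actions([8, 8])])
# ['hit', 'stand', 'double down', 'split']
print([a.value for a in available_actions([10, 3, 2])])
# ['hit', 'stand']
```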
 
---

# Blackjack Strategy

([Source](https://www.blackjackinfo.com/blackjack-basic-strategy-engine/?numdecks=6&soft17=s17&dbl=all&das=yes&surr=ns&peek=yes))

---

# MCTS for Blackjack

* Blackjack is a nice game for our purposes

* There is randomness

* We only have (up to) 4 actions to choose from

* Games only last a few turns

* There is actual strategy!

---

# The "Framework"

---

# Framework Structure

---

# Running the Framework

```
> python blackjack.py -q
Average points:  -1.58

> python blackjack.py -q basic
Average points:  -0.3

> python blackjack.py -q -n 3 timid
Average points:  -0.6666

> python blackjack.py --help
usage: blackjack.py [-h] [-n COUNT] [-s] [-r] [-d D] [player]

Run a simulation of a Blackjack agent.
...
```

---

# Framework: Game

```python
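class Game:
    def __init__(self, cards, player, split_rule, verbose):
        pass

    def round(self):
        # Play one full round and return the player's winnings
        pass

    def continue_round(self, player_cards, dealer_cards, bet):
        # Continue a round from the given cards and bet
        pass

def main(dtype, split_rule, verbose):
    # deck_types and Player are defined elsewhere in the framework
    deck = deck_types[dtype]
    p = Player("Playername", deck[:])
    g = Game(deck, p, split_rule, verbose)
    winnings = g.round()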
```

---

# Deck Types?!

* No one said you *have* to play with a standard 52-card deck

* There are several (fictional) deck "types" defined in the code (select with `-d type`)

* Your agent will be able to (automatically!) play with any of them

* If you have ever wondered what would happen if a card was valued 12 points, or 1.5 points, etc., you can find out

* If you haven't, you will still find out

---

# MCTS

```python
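class MCTSPlayer(Player):

    def get_action(self, cards, actions, dcards):
        # Remove all visible cards from a copy of the deck
        deck = self.deck[:]
        for p in cards + dcards:
            deck.remove(p)

        # Play out the rest of the round with a rollout player
        p = RolloutPlayer("Rollout", deck)
        g1 = Game(deck, p, verbose=False)

        results = {}
        for i in range(MCTS_N):
            p.reset()
            res = g1.continue_round(cards, dcards, self.bet)
            # ... record res in results (omitted on this slide)

        # best_action is determined from results (omitted on this slide)
        return best_action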
```
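
The slide leaves open how the rollout results become `best_action`. One possible approach, sketched below under assumptions, is to collect the rewards per first action and pick the action with the highest average. The layout of `results` (action mapped to a list of rewards) and the helper name `pick_best_action` are hypothetical and not taken from the framework.

```python
from collections import defaultdict


def pick_best_action(results):
    """Return the action with the highest average reward.

    results: dict mapping an action to the list of rewards observed
    in rollouts that started with that action (hypothetical layout).
    """
    averages = {a: sum(r) / len(r) for a, r in results.items() if r}
    return max(averages, key=averages.get)


# Made-up rollout outcomes, for illustration only.
results = defaultdict(list)
for action, reward in [("hit", 1), ("hit", -1), ("stand", 1),
                       ("stand", 1), ("hit", -1)]:
    results[action].append(reward)

print(pick_best_action(results))  # stand
```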

---

# MCTS

* Currently, the `RolloutPlayer` **does not construct a tree**

* You will have to come up with a way to store and construct a tree!

* When the `RolloutPlayer` should perform an action, you have to locate where in the tree it currently is:

   - All actions expanded: Use selection strategy
   - Not all actions expanded: Choose one to expand and start simulating
   - In simulation: Choose action randomly (optionally: with a different strategy)
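
The three cases above can be sketched as follows. This is one possible design under stated assumptions, not the expected solution: the tree is stored as a dictionary from a state key to a node, and `Node`, `TreePolicy`, `state_key`, and the `select` callback are all hypothetical names. How a state key is built (for example from the player's and dealer's cards) is left open.

```python
import random


class Node:
    """Statistics for one decision point in the tree."""

    def __init__(self):
        self.stats = {}  # action -> [visit count, total reward]

    def untried(self, actions):
        return [a for a in actions if a not in self.stats]


class TreePolicy:
    def __init__(self, select):
        self.tree = {}        # state key -> Node
        self.select = select  # selection strategy: (node, actions) -> action
        self.reset()

    def reset(self):
        """Call before each rollout."""
        self.path = []        # (node, action) pairs chosen inside the tree
        self.simulating = False

    def choose(self, state_key, actions):
        if self.simulating:
            # In simulation: choose randomly
            return random.choice(actions)
        node = self.tree.setdefault(state_key, Node())
        untried = node.untried(actions)
        if untried:
            # Not all actions expanded: expand one, then start simulating
            action = random.choice(untried)
            self.simulating = True
        else:
            # All actions expanded: use the selection strategy
            action = self.select(node, actions)
        self.path.append((node, action))
        return action

    def backpropagate(self, reward):
        """Credit the rollout's reward to every tree decision on the path."""
        for node, action in self.path:
            entry = node.stats.setdefault(action, [0, 0.0])
            entry[0] += 1
            entry[1] += reward


def greedy(node, actions):
    # Placeholder strategy: highest average reward so far
    return max(actions, key=lambda a: node.stats[a][1] / node.stats[a][0])


policy = TreePolicy(greedy)
key = ((3, 10), (7,))  # hypothetical state key: player cards, dealer cards
for reward in (1, -1):
    policy.reset()
    policy.choose(key, ["hit", "stand"])
    policy.backpropagate(reward)
print(sorted(policy.tree[key].stats))  # ['hit', 'stand']
```

Keeping the path of tree decisions separate from the random simulation steps means only nodes that are actually in the tree receive statistics.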

---

# Report and Evaluation

* Gambling is bad

* But let's see how bad

* Compare your agent's performance with the other agents'

* By default, you will get an average over 100 games

* For the final evaluation, use more repetitions to reduce the variance

* Optional: Statistical analysis
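
For the optional statistical analysis, a simple starting point is the mean together with its standard error, which shrinks roughly with the square root of the number of games. The sketch below uses only the standard library; `summarize` is a hypothetical helper, the 95% interval relies on a normal approximation, and the winnings are made-up numbers.

```python
import math
import statistics


def summarize(winnings):
    """Return (mean, standard error, approximate 95% confidence interval)."""
    n = len(winnings)
    mean = statistics.mean(winnings)
    sem = statistics.stdev(winnings) / math.sqrt(n)
    # Normal approximation: mean +/- 1.96 standard errors
    return mean, sem, (mean - 1.96 * sem, mean + 1.96 * sem)


few_games = [1, -1] * 2       # 4 made-up results
many_games = [1, -1] * 200    # 400 made-up results

_, sem_few, _ = summarize(few_games)
_, sem_many, _ = summarize(many_games)
print(sem_few > sem_many)  # True: more games give a tighter estimate
```

If the confidence intervals of two agents overlap heavily, the difference in their averages may just be noise, and more repetitions are needed.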

---

# Report and Evaluation

* You can use whichever selection strategy you want (epsilon-greedy, roulette wheel, UCT, your own invention, ...)

* It will likely have some parameter(s)

* Test whether and how different parameter values affect your agent's performance

* Also vary the general `MCTS_N` parameter!
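
Two of the selection strategies named above, sketched with their parameters made explicit. The `stats` layout (action mapped to a pair of visit count and total reward, with every action visited at least once) is an assumption for illustration. The UCT score is the usual UCB1 form, average reward plus `c * sqrt(ln(N) / n)`; the default `c = sqrt(2)` is commonly used for rewards in [0, 1], so it will likely need tuning for blackjack winnings.

```python
import math
import random


def epsilon_greedy(stats, epsilon=0.1, rng=random):
    """With probability epsilon explore randomly, else take the best average."""
    if rng.random() < epsilon:
        return rng.choice(list(stats))
    return max(stats, key=lambda a: stats[a][1] / stats[a][0])


def uct(stats, c=math.sqrt(2)):
    """Pick the action maximizing average reward plus an exploration bonus."""
    total_visits = sum(visits for visits, _ in stats.values())

    def score(action):
        visits, total_reward = stats[action]
        bonus = c * math.sqrt(math.log(total_visits) / visits)
        return total_reward / visits + bonus

    return max(stats, key=score)


# Made-up statistics: "hit" was barely tried, "stand" looks decent.
stats = {"hit": (1, -1.0), "stand": (100, 30.0)}
print(uct(stats))                         # hit (large exploration bonus)
print(uct(stats, c=0))                    # stand (pure exploitation)
print(epsilon_greedy(stats, epsilon=0))   # stand
```

Varying `epsilon` or `c` changes how much the agent explores rarely tried actions, which is the kind of parameter effect the report should examine.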

---

# References
  
  * [MCTS Tutorial](https://www.cs.swarthmore.edu/~bryce/cs63/s16/reading/mcts.html)
  
  * [MCTS Slides](https://www.lri.fr/~sebag/Slides/InvitedTutorial_CP12.pdf)
  
  * [MCTS for a Card Game](http://teaching.csse.uwa.edu.au/units/CITS3001/project/2017/paper1.pdf)