class: center, middle

# Artificial Intelligence: Planning

### Planning under Uncertainty

---

# Review: The Planning Problem

A planning problem consists of three parts:

* A definition of the current state of the world
* A definition of a desired state of the world
* A definition of the actions the agent can take

---
class: medium

# Uncertainty

So far we have assumed that we have a perfect model of our planning problem:

* We knew exactly what the state of the world was
* We knew exactly how each action worked
* And we assumed that no one would interfere

These are pretty strong assumptions! Let's relax them!

---
class: medium

# Uncertainty about the world

* Imagine we have a robot that is supposed to perform tasks for us
* We tell it to go to the grocery store
* If it is raining, it will need some sort of protection, like an umbrella
* But the robot may not know the weather outside
* It could go and check!

---
class: medium

# Uncertainty about actions

* We often talk about robots performing our actions
* Here's the problem with robots: hardware
* Our robot may decide to move to a particular room, but maybe its motors break
* Or maybe there is a small obstacle that it can't see but that blocks the path
* Generally: actions may fail

---

# Interference

* A special case of failing actions and uncertainty about the world occurs when we are not alone
* Say there is another agent in the world
* That agent has its own goals, and doesn't cooperate with us
* What we know about the world may change because of what the other agent does
* They may even actively try to stop us

---

# Action Failure

* Failing actions and interfering actors are actually kind of the same
* In both cases there is someone else "doing" something
* It may just be "nature"/randomness, or an intentional agent
* We can handle both in the same way!

---
class: medium

# Sensing and Replanning

* In some scenarios we can know what we don't know
* For example, a household robot may have an action to wash the dishes, but knows that it doesn't know the particular model of dishwasher in the house
* When executed, the action "wash the dishes" will then require the robot to be in front of the dishwasher (to determine the model), download the manual, and then trigger replanning to come up with the actual steps needed
* In a way, the action "wash the dishes" is therefore an abstract action that will be decomposed into concrete actions when the necessary information is available

---
class: center, middle

# Probabilistic Actions

---

# Review: Probability Theory

* A random variable X is a function that maps *outcomes* from a set S to real numbers
* In this class, we only deal with discrete random variables, so we can avoid many of the technical subtleties
* The outcomes describe the possible "states" we can have, and we have an *event space* F consisting of the power set of S
* We assign probabilities to the outcomes (technically, to the elements of the event space)

---
class: medium

# Review: Probability Theory

Requirements for the probabilities:

* `\(P(E) \ge 0\)` for all E in F
* `\(P(S) = 1\)`
* `\(P(A \cup B) = P(A) + P(B)\:\:\text{if}\:\:A \cap B = \emptyset\)`

What this means is basically that we can assign probabilities to the single events, and these axioms will tell us the probabilities for combinations of events.

Example: A (fair) coin has `\(S = \{H, T\}, F = \{\{\}, \{H\}, \{T\}, \{H,T\}\}, P(H) = 0.5\)`

We can then define a random variable `\(X(H) = 0, X(T) = 1\)`
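---
class: small

# Probability Review: A Code Sketch

A minimal sketch of the coin example in code. The dictionary `P` and the function `X` are just illustrative names: the distribution assigns a probability to each outcome, and the random variable is an ordinary function on outcomes.

```python
# Outcomes and probabilities for a fair coin
P = {"H": 0.5, "T": 0.5}

# Two of the axioms can be checked directly:
# every probability is non-negative, and P(S) = 1
assert all(p >= 0 for p in P.values())
assert abs(sum(P.values()) - 1.0) < 1e-9

# The random variable X maps outcomes to real numbers
def X(outcome):
    return {"H": 0, "T": 1}[outcome]

print(X("H"), X("T"))  # 0 1
```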
---

# Review: Probability Theory

The *expected value* of a random variable X is:

$$ E(X) = \sum_{s \in S} P(s) \cdot X(s) $$

In words: the expected value is the probability-weighted sum of the real values we assigned to each outcome.

For our fair coin, this expected value is:

$$ E(X) = \frac{1}{2} \cdot 0 + \frac{1}{2} \cdot 1 = 0.5 $$

The expected value tells us the "average" value we would get if we sampled the random variable "very often".

---

# Probabilistic Actions

* How could we model actions that have different outcomes?
* Instead of having one effect, these actions may have one of a set of effects
* Each possibility has a probability associated with it
* For example, if we fire a gun, there is a 5% chance that the bullet is a blank

---

# Action Planning

* To start, let us just plan a single action
* We have three possibilities:
  - Drive to the airport, 25% chance of a traffic jam
  - Train to the airport, 30% chance of delay
  - Walk to the airport, 99% chance of exhaustion
* Which action do we choose?

---

# Strategy: Maximize Expected Value

* One approach is to pick the option that gives us the highest expected value
* Each of these actions has two possible outcomes: we either get to the airport on time, or we don't
* We assign a value of 1 to arriving on time, and 0 to the other case
* That means driving has the highest expected value (a 75% chance of arriving on time gives an expected value of 0.75)

---

# Action Costs

* But wait! Not all actions are created equal!
* We have three possibilities:
  - Drive to the airport, 25% chance of a traffic jam, $40
  - Train to the airport, 40% chance of delay, $20
  - Walk to the airport, 99% chance of exhaustion, $2 for a drink
* Which action do we choose?

---

# Utility

* How do we define the "value" of these actions?
* One possibility is to say that if we don't make it to the airport, we wasted our money
* That means that driving to the airport would have -40 utility if we are late, and 0 otherwise (we didn't waste anything)
* With this approach, we can calculate the expected utility of each action, and choose the one with the highest expected utility

---
class: medium

# Expected Value

Consider these four choices:

* Gain $100
* Flip a (fair) coin. On heads, gain $200, on tails gain $0
* Flip a (fair) coin. On heads, gain $500, on tails, pay $300
* Roll a 100-sided die. On a 1, gain $10 000, else gain $0

Which option would you choose?

---

# Limitations of the Expected Value

* In many cases, the expected value does not reflect how humans would make a decision
* Sometimes that is because humans are not logical, sometimes there are other factors
* In our example the expected value was exactly the same for all options
* But the expected value only expresses what happens if we perform the sampling "many times"

---

# Strategy: Assume the Worst

* If we only perform an action once, we may just want to be really cautious
* Instead of analyzing each possibility, we analyze how we can avoid the "worst" outcome with the highest probability
* In our case, if we take the $100, we always have $100
* However, the exact decision depends on the situation and may not always be possible to define mathematically

---
class: medium

# Another Example

Consider these three choices:

* Gain $100
* Roll a 100-sided die. On a 1, gain $25 000, else gain $0
* Roll a 100-sided die. On a 1, gain $0.2, else gain $0. Repeat 100 000 times

Which option would you choose?
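---
class: small

# Expected Value: A Code Sketch

A small sketch of how these choices could be compared using the definition `\(E(X) = \sum_{s \in S} P(s) \cdot X(s)\)`. The encoding as (probability, value) pairs is just one possible way to write the options down.

```python
def expected_value(outcomes):
    """Probability-weighted sum over (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

# The three choices from the previous slide
take_100  = [(1.0, 100)]
big_die   = [(0.01, 25_000), (0.99, 0)]
small_die = [(0.01, 0.2), (0.99, 0)]

print(expected_value(take_100))             # 100.0
print(expected_value(big_die))              # ~250
# 100 000 independent repetitions: the expected value of the sum
# is the sum of the (identical) expected values
print(100_000 * expected_value(small_die))  # ~200
```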
---

# Planning

* Remember: this is a class about planning
* So far we have only talked about *one* single action
* What we want is a plan, i.e. a sequence of actions
* We can calculate the expected value/utility of a plan as the combination of the utilities of the individual steps

---

# Plan Utility

* We have two possible plans:
  - Call a taxi (10% chance of failing), take the taxi (25% chance of failing), check in at the airport (10% chance of failing)
  - Check in online (15% chance of failing), go to the train station (1% chance of failing), take the train (40% chance of failing)
* Which one do we choose?

---

# Failure

* So far we have assumed that one of the possible outcomes of an action is "failure"
* In many cases the outcome is just "random", or "different"
* And even if it is "failure", we don't just want to give up
* How could we account for that?

---

# Conditional Plans

* Instead of having one linear plan (or a partially ordered one), we can include conditions
* For example, one action is "flip a coin", and our plan continues with two branches: one where the coin came up heads, and one where it came up tails
* This form of plan is basically a sequence of "if-then" statements after the non-deterministic actions

---
class: center, middle

# Adversaries

---

# Adversaries

* As mentioned before, we can think of non-deterministic outcomes as having "nature" act against us
* However, *real* adversaries typically have their own goals
* By reasoning about these goals, we may be able to anticipate their behavior and counteract it
* This integrates nicely with our strategy of "assuming the worst"

---

# Adversarial Planning

* Instead of assuming our actions have non-deterministic/random effects, we assume that our actions are followed by an opponent's action
* The opponent will always choose what is best for them
* To determine what is "best" for the opponent, we can use planning "from their perspective"
* However, we need to know what their goal is (more about this in 3 weeks)

---
class: medium

# Zero-Sum Games

* Zero-sum games are a particularly interesting scenario
* These are scenarios in which the gain of one agent comes at the expense of another
* For two players we can model this as one player getting positive rewards, and the other player getting the same negative reward, i.e. the sum of the two is zero
* From our point of view, we want to maximize our points, while the opponent is trying to minimize our points (because that maximizes their own)

---
class: medium

# Minimax

* Let's say we go first
* For each of our potential actions, we look at each of the opponent's possible actions
* The opponent will pick the action that gives us the lowest score, and we will pick from our actions the one where the opponent's choice gives us the highest score
* How does the opponent decide what to pick? The same way!
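---
class: small

# Minimax: A Code Sketch

A minimal, game-agnostic sketch of this idea. The `game` interface (`is_terminal`, `score`, `moves`, `result`) is just an assumption for illustration, not a fixed API; scores are from our (the max player's) point of view.

```python
def minimax(state, game, maximizing):
    """Value of `state` if both players play optimally from here on."""
    if game.is_terminal(state):
        return game.score(state)  # score from the max player's point of view
    values = [minimax(game.result(state, move), game, not maximizing)
              for move in game.moves(state)]
    # The max player takes the best value, the min player the worst (for us)
    return max(values) if maximizing else min(values)

def best_move(state, game):
    """Our move whose resulting state has the highest minimax value."""
    return max(game.moves(state),
               key=lambda m: minimax(game.result(state, m), game, maximizing=False))
```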
---

# Minimax

*(Figure: an example game tree with levels 0 to 4; the node values shown include +∞, 10, 7, 5, -5, -7, -10, and -∞.)*
---

# Minimax
*(Figure: the same example game tree as on the previous slide.)*
---

# Minimax

Let's take a game where we "build" a binary number by choosing bits. The number starts with a 1, and the players take turns choosing the next bit. The game ends when the number has 6 digits in total (after 5 choices), or if the same bit was chosen twice in a row.

If the resulting number is even or prime, we get points equal to the number, otherwise the other player gets that many points.

We want to know: what is our best first move, assuming the other player plays optimally?
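---
class: small

# Minimax: The Bit Game in Code

A sketch of how this game could be modelled so that the minimax idea from a few slides back answers the question. The state encoding, the helper names, and two interpretations are assumptions: we choose the first bit after the leading 1, and the leading 1 does not count as a "chosen" bit for the repetition rule.

```python
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def score(bits):
    """Score from our point of view: +value if even or prime, else -value."""
    value = int(bits, 2)
    return value if value % 2 == 0 or is_prime(value) else -value

def game_over(bits, chosen):
    # 6 digits in total, or the same bit chosen twice in a row
    return len(bits) == 6 or (len(chosen) >= 2 and chosen[-1] == chosen[-2])

def value(bits, chosen, our_turn):
    if game_over(bits, chosen):
        return score(bits)
    results = [value(bits + b, chosen + b, not our_turn) for b in "01"]
    return max(results) if our_turn else min(results)

# Our best first bit, assuming the opponent then plays optimally
print(max("01", key=lambda b: value("1" + b, b, our_turn=False)))
```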
---

# Alpha-beta Pruning

*(Figure: an example game tree with alternating MAX and MIN levels; the leaf values range from 2 to 9 and illustrate which subtrees can be pruned.)*
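---
class: small

# Alpha-beta Pruning: A Code Sketch

A minimal sketch of minimax with alpha-beta pruning, using the same assumed `game` interface as the earlier minimax sketch; the pruning rule it applies is spelled out on the next slide.

```python
def alphabeta(state, game, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax value of `state`, skipping subtrees that cannot change the result."""
    if game.is_terminal(state):
        return game.score(state)
    value = float("-inf") if maximizing else float("inf")
    for move in game.moves(state):
        child = alphabeta(game.result(state, move), game,
                          not maximizing, alpha, beta)
        if maximizing:
            value = max(value, child)
            alpha = max(alpha, value)   # best score we can already guarantee
        else:
            value = min(value, child)
            beta = min(beta, value)     # best score the opponent will concede
        if beta <= alpha:               # the other player will never allow this line of play,
            break                       # so the rest of this subtree is irrelevant
    return value
```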
---
class: small

# Alpha-beta Pruning

* For the max player: remember the minimum score they are already guaranteed, based on the nodes evaluated so far (alpha)
* For the min player: remember the maximum score they will have to concede, based on the nodes evaluated so far (beta)
* If beta is less than alpha, stop evaluating the subtree
* Example: if the max player can reach 5 points by choosing the left subtree, and the min player finds an action in the right subtree that results in 4 points, they can stop searching
* If the right subtree were reached, the min player could choose the action that results in 4 points; therefore the max player will never choose the right subtree, because they can already get 5 points in the left one

---
class: medium

# Minimax: Limitations

* The tree for our mini game was quite large
* Imagine one for chess
* Even with alpha-beta pruning it's impossible to evaluate all nodes
* Use a guess! For example: the board value after 3 turns
* What about unknown information (like a deck that is shuffled)?

---
class: medium

# Unknown Information

* Let's say we have a shuffled deck
* We don't quite know what our best action is
* We can work around this! Shuffle the deck once, and build the game tree
* Repeat "often"
* We will get the expected value of each action (over the different possible decks)

---
class: small

# Monte Carlo Tree Search

* One problem with this approach: building the game tree is expensive, and now we are doing it "often"
* Instead, we can just choose one plan "randomly"
* Then we shuffle the deck again, and again choose a plan "randomly"
* We then only repeat this much cheaper process "often"
* We will get a partial tree, with the value of each of the random plans in the different possible situations
* We can "merge" plans with the same prefix to calculate an expected value for starting with a particular action

---
class: small

# Action Selection

* Note how I always put "randomly" in quotes
* We want the algorithm to explore different possibilities
* But we also want to make sure that we explore more promising possibilities more often
* Our random search will therefore be guided by the results we got so far, preferring actions that have already resulted in good outcomes
* This way we make sure that we haven't just randomly encountered a good outcome that our opponent could easily thwart

---

# Homework

* [Homework 9](/PF-3335/assets/pdf/homework9.pdf) has been posted on the class website
* There are 5 problems

---

# References

* [Continual Planning and Acting in Dynamic Multiagent Environments](ftp://ftp.informatik.uni-freiburg.de/papers/ki/brenner-nebel06.pdf)
* [Planning Algorithms](http://planning.cs.uiuc.edu/bookbig.pdf)
* [Building a Chess AI with Minimax](https://medium.freecodecamp.org/simple-chess-ai-step-by-step-1d55a9266977)
* [Monte Carlo Tree Search](https://pdfs.semanticscholar.org/574e/6872df3fe9b89afa98a7bdeef710a931da34.pdf)