class: center, middle # AI in Digital Entertainment ### Optimization --- # Artificial Intelligence Remember: Everything in AI is either representation or search * Behavior Trees, State Machines, etc.: Representation * Planning: Search (Plan-Space planning uses a different representation) * Belief modeling: Representation improvements * Monte Carlo Tree Search: Search --- class: medium # Optimization * Often in search we want to find something called a "solution" * However, what if we have multiple possible solutions, with different levels of quality? * We want to find the "best" solution among all possible ones * In other words, instead of constructing partial solutions until we find a complete one, we look at the set of complete solutions and try to find the best among them * For example: Doing the most damage, earning the most money, getting the highest score, driving the race in the lowest possible time, etc. --- # The Optimization Problem Given a function (usually called the "fitness function"): $$ f: \mathbb{R}^n \rightarrow \mathbb{R} $$ find the `\(\vec{x} \in \mathbb{R}^n \)` for which `\(f(\vec{x})\)` is minimal (or maximal). -- First idea: $$ \frac{\text{d}}{\text{d}\vec{x}}f(\vec{x}) = 0 $$ --- # Challenges * Setting the derivative to zero only finds *a* minimum/maximum, but not necessarily the global minimum/maximum * The dimensionality of the vector may be really high, giving us many options to consider * Who says the derivative can even be calculated? * We haven't even specified what our vectors represent, and what `\(f\)` looks like --- class: medium # Minimization vs.
Maximization * For the rest of this talk, we will assume that we want to minimize our functions * For some values, such as score, minimization does not seem natural, of course * We can turn a maximization problem into a minimization one by simply negating the value (or subtracting it from the theoretical maximum) * For example: If the maximum possible score is 100000, to maximize the score we want to *minimize* the distance to that value --- class: center, middle # Representation --- # Representation * We said our function takes vectors of real numbers as input, but what do these vectors represent? * They can be geometrical, such as the angle and force used to shoot a bird in Angry Birds * They could be an assignment of values to resources, such as how many of each unit to build in Starcraft * Another possibility is to view the vector as a sequence of actions --- # Geometric Interpretation
--- # Differentiability To be differentiable, a function needs to be continuous: $$ \forall \varepsilon \gt 0\: \exists \delta \gt 0\: \forall \vec{a}:\\\\ (|\vec{a} - \vec{x}| \lt \delta \rightarrow |f(\vec{a}) - f(\vec{x})| \lt \varepsilon) $$ Or, in words: If you change the input a little bit, the output also only changes a little bit. --- # Differentiability? Is the function that maps from angle and strength to score differentiable?
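--- class: medium # Differentiability: A Toy Example

As a sketch of why such a score function can fail to be differentiable, consider a hypothetical shot that only hits the target within a narrow band of angles (the band [30, 40] is made up for illustration, not taken from the game):

```python
def score(angle):
    """Toy score function: the shot hits the target only for angles in [30, 40]."""
    return 1000 if 30 <= angle <= 40 else 0

# An arbitrarily small change of the input produces a jump of 1000 in the
# output at the boundary, so the function is not continuous there, and a
# function that is not continuous cannot be differentiable.
assert score(29.999) == 0
assert score(30.0) == 1000
```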
--- class: center, middle # Gradient Descent --- # Gradient Descent * For now, let's say our function *is* differentiable * We know that the derivative is zero at the extrema * We also know that it defines the slope * So if we start at any point, we can just go "downhill" --- # Gradient Descent * Start with an initial guess `\(\vec{x}_0\)` * Repeat until "convergence": $$ \vec{x}_{i+1} = \vec{x}_i - \alpha\cdot \frac{\text{d}}{\text{d}\vec{x}} f(\vec{x}_i) $$ * `\(\alpha\)` is the *learning rate* (note the minus sign: to minimize, we step *against* the gradient) --- # Gradient Descent
--- # Gradient Descent
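--- class: medium # Gradient Descent: Code Sketch

A minimal NumPy sketch of the update rule, stepping against the gradient to minimize; the quadratic fitness function and its gradient are invented for illustration:

```python
import numpy as np

def gradient_descent(grad, x0, alpha=0.1, steps=100):
    """Repeatedly step downhill along the gradient of the fitness function."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - alpha * grad(x)  # move against the slope to minimize
    return x

# Toy fitness f(x) = (x_0 - 3)^2 + (x_1 + 1)^2, minimal at (3, -1)
grad_f = lambda x: 2 * (x - np.array([3.0, -1.0]))
best = gradient_descent(grad_f, x0=[0.0, 0.0])  # converges near (3, -1)
```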
--- # Gradient Descent: Some improvements * Scale the learning rate over time to move quickly at first, and slow down as a solution is found * Start at multiple different randomly chosen starting locations to avoid being stuck in a local minimum * Add *momentum* (a fraction of the previous step) to the update to carry the search out of small local minima * Add a small random number to the gradient to explore other parts of the function --- class: center, middle # Evolutionary Algorithms --- class: medium # Evolutionary Algorithms * What if our function is not differentiable? * Or our vectors don't have a good geometric interpretation, but are more just "collections of numbers" * Let's look to nature: Genes and evolution * Each individual is made up of genes * New individuals inherit a mixture of genes from their parents, strongly preferring "better" genes * "Survival of the Fittest" --- class: medium # Evolutionary Algorithms * We can interpret a vector as the "genes" of an individual * The fitness function then tells us how "good" these genes are * By modifying the genes we can create new individuals, to find the "best" genes as measured by our fitness function --- class: center, middle # Genetic Algorithm --- # Genetic Algorithm * We generate a (random) set of individuals as a population * Every step we create some new individuals as combinations and/or mutations of existing ones * Then we keep only the "most fit" individuals, i.e.
the ones for which our scoring function returns the lowest values * Repeat for "a number of steps" --- # Population * Our population is a set of vectors * Each of these vectors is associated with its fitness * We limit our population to a certain size by only keeping the n "best" individuals --- # Combination and Mutation * Two individuals can be combined by selecting elements from each of their two vectors (or averaging, adding, etc.), called "crossover" * An individual can also be changed by modifying single values in its vector, called "mutation" * If we view our vector elements as containing a binary representation of numbers, we could also use binary operations such as XOR to combine two values --- # Why does this work? * By keeping only the best individuals from one iteration to the next (*elitism*) we guarantee that the solution will never get worse over time * Combining two individuals tries to merge the advantages of both parents while discarding their drawbacks * By having a number of different individuals we avoid getting stuck in local minima --- # Repair * One problem is that in many domains not every *possible* vector is also feasible * For example: If our vector represents actions, jumping while already in the air might not be possible * *Repair* is the mechanism by which an infeasible individual is converted into a feasible one * How this repair works is domain-dependent --- class: center, middle # Swarm-Based Optimization --- # Swarm-Based Optimization * Another approach is to keep a set of individuals and "move" them around in space * The individuals can use information about the other individuals, or the "swarm" as a whole, to determine where to go * There are several approaches to how this information exchange works * Typically the individuals are initially spread out over the search space and then follow the path to the best known values --- class: medium # Particle-Swarm Optimization * In Particle Swarm Optimization, each individual is
called a *particle*, and has a position and velocity * Every step, the position is updated with the velocity, and the velocity is changed depending on (using semi-random ratios): - The best position found by the swarm as a whole - The best position found by the particle itself * If the best position found by the swarm as a whole is far away from the particle in question, it will move more quickly --- # Particle-Swarm Optimization
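--- class: small # Particle-Swarm Optimization: Code Sketch

A condensed sketch of the velocity and position updates described above. The inertia weight `w` and attraction coefficients `c1`, `c2` are commonly used values assumed here, not taken from the slides, and the sphere fitness function is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(f, dim, n_particles=30, steps=200, w=0.7, c1=1.5, c2=1.5):
    """Particle Swarm Optimization sketch: minimize f over R^dim."""
    pos = rng.uniform(-5, 5, (n_particles, dim))  # spread particles out
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                            # best position per particle
    pbest_val = np.array([f(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()      # best position of the swarm
    for _ in range(steps):
        r1 = rng.random((n_particles, dim))       # semi-random ratios
        r2 = rng.random((n_particles, dim))
        vel = (w * vel
               + c1 * r1 * (pbest - pos)          # pull toward each particle's best
               + c2 * r2 * (gbest - pos))         # pull toward the swarm's best
        pos = pos + vel
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

# Toy fitness: sphere function, minimal at the origin
best = pso(lambda x: float(np.sum(x * x)), dim=2)
```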
--- class: medium # Ant-Colony Optimization * When ants look for food, they start by walking around randomly * Wherever they walk, they leave pheromones behind * If an ant encounters a pheromone trail by another ant, it has a chance to follow it * Pheromones evaporate over time, meaning that shorter paths will have a higher concentration of pheromones, because they are traversed faster --- # Ant-Colony Optimization
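--- class: small # Ant-Colony Optimization: Code Sketch

A toy sketch of the pheromone mechanism just described, on an invented two-route graph (the graph, deposit rule, and evaporation rate are all made-up illustration values): shorter paths collect pheromone faster, and ants probabilistically prefer stronger trails.

```python
import random

random.seed(1)

# Hypothetical toy graph: two routes from 'start' to 'goal'
graph = {'start': ['a', 'b'], 'a': ['goal'], 'b': ['goal']}
length = {('start', 'a'): 1, ('a', 'goal'): 1,   # short route: total cost 2
          ('start', 'b'): 2, ('b', 'goal'): 3}   # long route: total cost 5

pheromone = {edge: 1.0 for edge in length}

def choose(node):
    """Pick the next node with probability proportional to pheromone."""
    options = graph[node]
    weights = [pheromone[(node, nxt)] for nxt in options]
    return random.choices(options, weights=weights)[0]

for _ in range(100):             # iterations
    for _ in range(10):          # ants per iteration
        path, node = [], 'start'
        while node != 'goal':    # walk (randomly, biased by pheromone)
            nxt = choose(node)
            path.append((node, nxt))
            node = nxt
        cost = sum(length[e] for e in path)
        for e in path:           # shorter paths deposit more pheromone
            pheromone[e] += 1.0 / cost
    for e in pheromone:          # evaporation weakens unused trails
        pheromone[e] *= 0.9
```

After the loop, the trail along the short route carries far more pheromone than the long one, so new ants almost always follow it.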
--- class: medium # Ant-Colony Optimization * Often described as operating on graphs, with pheromone trails assigned to the edges * However, it can also be used for general optimization problems * Generate a set of ants randomly placed in the search space * Move ants around and update their trails with the best fitness value they encounter * If an ant encounters another pheromone trail, it has a chance of following that trail, depending on its strength --- class: center, middle # Function Approximation --- class: medium # Function Approximation * For optimization we need a search space of vectors and a fitness function * However, calculating the fitness for arbitrary input vectors may be expensive or even impossible * For example: We may be able to calculate the strength of a collection of troops in Starcraft, but determining its expected win rate requires extensive simulations * Idea: Estimate the fitness values * If we have fitness values for *some* inputs, we can calculate a guess for the intermediate values --- # Interpolation * Say we know the win rate for having 40 Space Marines and 10 Fire Bats is 80%, and the win rate for having 10 Space Marines and 40 Fire Bats is 20% * What is the win rate for 25 Space Marines and 25 Fire Bats? * We can guess 50%, using Linear Interpolation * Our fitness functions are not usually linear, though --- class: medium # Interpolation * It turns out that the win rate for 25 Space Marines and 25 Fire Bats is actually 90% * Instead of trying to explain this linearly, we can use some polynomial * For n observations we would need a polynomial of degree n-1 * Better idea: Define each point to be a linear mixture of some non-linear function of the observed values, depending on the distance from those values * Radial Basis Functions (RBF)! --- class: small # Radial Basis Function Example .left-column[
] .right-column[ * f(40,10) = 0.8, f(10, 40) = 0.2, f(25,25) = 0.9 * We use a Gauss function as the RBF * To determine an estimate for f(20,30) we calculate ] $$ f(20,30) \approx f(40,10)\cdot \varphi(|(40,10) - (20,30)|) + \\\\ f(10,40)\cdot \varphi(|(10,40) - (20,30)|) + \\\\ f(25,25)\cdot \varphi(|(25,25) - (20,30)|) $$ --- class: medium # Function Approximation with Neural Networks * Radial Basis Functions are neat, because they work even with a large number of known values, and don't require the known information to be laid out in any particular form * However, the width parameter that determines each basis function's "flatness" needs tuning * Next week we will talk about Neural Networks, which provide an alternative way to approximate functions, where these parameters are learned from input data --- class: center, middle # Multi-Objective Optimization and Pareto Optimal Solutions --- # Multi-Objective Optimization * Sometimes we don't want to just optimize *one* value, but multiple * Or the optimization of one value comes at the expense of another * For example: Building units in a strategy game that do the most damage * The most expensive units do the most damage * That does not mean that building the most expensive units is the best strategy. There is a trade-off between cost and army strength --- # Multi-Objective Optimization: Approaches * Maybe we can try to optimize the (weighted) sum of both values? * Or how about a ratio? Let's optimize the army strength *per* resource spent * Both of these may run into degenerate solutions. For example, spending 0 resources results in infinite strength per resource * Better idea: Let the AI agent decide about the trade-off --- # Pareto-Optimal Solutions * Say we have two values, x and y, to maximize * If we have a solution with fitness (x,y), any other solution for which the fitness for x is higher **and** the fitness for y is higher is objectively *better*. Such a solution is said to *dominate* the first * However, if only one of x or y is higher and the other is lower, neither solution dominates the other; they can be considered of the "same" utility * The non-dominated solutions form a frontier of fitness values: the *Pareto frontier* --- # Pareto-Optimal Solutions
(Figure: the Pareto frontier of fitness values in the x-y plane, showing points A, B, and C, with x(A) < x(B) and y(A) > y(B))
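--- class: small # Pareto Dominance: Code Sketch

A small sketch of dominance checking and frontier extraction, assuming both objectives are maximized; the (strength, negated cost) fitness pairs are invented example values:

```python
def dominates(a, b):
    """True if a dominates b: no objective worse, at least one strictly better."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]

# Hypothetical (strength, -cost) fitness pairs for unit compositions;
# cost is negated so that both objectives are maximized.
fits = [(10, -5), (8, -2), (4, -1), (6, -4), (3, -3)]
front = pareto_front(fits)  # (6, -4) and (3, -3) are dominated by (8, -2)
```

The agent can then pick any point on the frontier according to its preferred trade-off, rather than having the optimizer collapse both objectives into one number.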
--- # Next Week * [Build order optimization in Starcraft](https://www.aaai.org/ocs/index.php/AIIDE/AIIDE11/paper/viewFile/4078/4407)