Remember: Everything in AI is either representation or search
Behavior Trees, State Machines, etc.: Representation
Planning: Search (Plan-Space planning uses a different representation)
Belief modeling: Representation improvements
Monte Carlo Tree Search: Search
Often in search we want to find something called a "solution"
However, what if we have multiple possible solutions, with different levels of quality?
We want to find the "best" solution among all possible ones
In other words, instead of constructing partial solutions until we find a complete one, we look at the set of complete solutions and try to find the best among them
For example: Doing the most damage, earning the most money, getting the highest score, driving the race in the lowest possible time, etc.
Given a function (usually called "fitness function"):
$f: \mathbb{R}^n \mapsto \mathbb{R}$
find the $\vec{x} \in \mathbb{R}^n$ for which $f(\vec{x})$ is minimal (or maximal).
First idea:
$\frac{d}{d\vec{x}} f(\vec{x}) = 0$
Setting the derivative to zero only finds a minimum/maximum, but not necessarily the global minimum/maximum
The dimensionality of the vector may be really high, giving us many options to consider
Who says the derivative can even be calculated?
We haven't even specified what our vectors represent, and what f looks like
For the rest of this talk, we will assume that we want to minimize our functions
For some values, such as score, this does not really make sense, of course
We can turn a maximization problem into a minimization one by simply negating the value (or subtracting it from the theoretical maximum)
For example: If the maximum possible score is 100000, to maximize the score we want to minimize the distance to that value
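As a small illustration, turning a maximization problem into a minimization one could look like the sketch below (the `score` function is a hypothetical stand-in; the cap of 100000 is taken from the example above):

```python
MAX_SCORE = 100_000  # theoretical maximum score from the example above

def score(x):
    # Hypothetical score function we would like to maximize (a stand-in).
    return MAX_SCORE - (x - 3.0) ** 2

def fitness(x):
    # Minimization version: the distance to the theoretical maximum.
    # Equivalently, we could simply return -score(x).
    return MAX_SCORE - score(x)
```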
We said our function takes vectors of real numbers as input, but what do these vectors represent?
They can be geometrical, such as the angle and force used to shoot a bird in Angry Birds
It could be an assignment of values to resources, such as how many of each unit to build in Starcraft
Another possibility is to view the vector as a sequence of actions
To be differentiable, a function needs to be continuous:
$\forall \varepsilon > 0\ \exists \delta > 0\ \forall \vec{a}: \left(|\vec{a} - \vec{x}| < \delta \Rightarrow |f(\vec{a}) - f(\vec{x})| < \varepsilon\right)$
Or, in words:
If you change the input a little bit, the output also only changes a little bit.
Is the function that maps from angle and strength to score differentiable?
For now, let's say our function is differentiable
We know that the derivative is zero at the extrema
We also know that it defines the slope
So if we start at any point, we can just go "downhill"
Start with an initial guess $\vec{x}_0$
Repeat until "convergence":
$\vec{x}_{i+1} = \vec{x}_i - \alpha \cdot \frac{d}{d\vec{x}} f(\vec{x}_i)$
$\alpha$ is the learning rate
Scale the learning rate over time to move quickly at first, and slow down as a solution is found
Start at multiple different randomly chosen starting locations to avoid being stuck in a local minimum
Use momentum to carry the updates out of small local minima
Add a small random number to the gradient to explore other parts of the function
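A minimal sketch of this loop in Python, combining the core update rule with the improvements above (random restarts, a decaying learning rate, and a little gradient noise); the toy function, its gradient, and all parameter values are made up for illustration:

```python
import numpy as np

def f(x):
    # Toy fitness function to minimize (made up for illustration).
    return np.sum((x - np.array([2.0, -1.0])) ** 2)

def gradient(x):
    # Analytical derivative of f; in general this has to be known or approximated.
    return 2.0 * (x - np.array([2.0, -1.0]))

def gradient_descent(restarts=10, steps=1000, alpha0=0.5, decay=0.995,
                     noise=0.01, rng=None):
    rng = rng or np.random.default_rng()
    best_x, best_f = None, float("inf")
    for _ in range(restarts):                  # several random starting points
        x = rng.uniform(-10.0, 10.0, size=2)   # initial guess x_0
        alpha = alpha0
        for _ in range(steps):
            g = gradient(x) + rng.normal(0.0, noise, size=2)  # slightly perturbed gradient
            x = x - alpha * g                  # move "downhill"
            alpha *= decay                     # start fast, slow down over time
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x

print(gradient_descent())  # approaches [2, -1]
```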
What if our function is not differentiable?
Or our vectors don't have a good geometric interpretation, but are more just "collections of numbers"
Let's look to nature: Genes and evolution
Each individual is made up of genes
New individuals inherit a mixture of genes from their parents, strongly preferring "better" genes
"Survival of the Fittest"
We can interpret a vector as the "genes" of an individual
The fitness function then tells us how "good" these genes are
By modifying the genes we can create new individuals, to find the "best" genes as measured by our fitness function
We generate a (random) set of individuals as a population
Every step we create some new individuals as combinations and/or mutations of existing ones
Then we keep only the "most fit" individuals, i.e. the ones for which our scoring function returns the lowest values
Repeat for "a number of steps"
Our population is a set of vectors
Each of these vectors is associated with its fitness
We limit our population to a certain size by only keeping the n "best" individuals
Two individuals can be combined by selecting elements from each of their two vectors (or averaging, adding, etc.), called "crossover"
An individual can also be changed by modifying single values in its vector, called "mutation"
If we view our vector elements as containing a binary representation of numbers, we could also use binary operations such as an xor to combine two values
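A compact sketch of this idea for real-valued vectors, using single-point crossover and Gaussian mutation (the population size, mutation rate, and value ranges are arbitrary choices for illustration):

```python
import numpy as np

def evolve(fitness, dim, pop_size=50, generations=200, mutation_rate=0.1, rng=None):
    rng = rng or np.random.default_rng()
    # Random initial population of vectors ("genes").
    population = rng.uniform(-10.0, 10.0, size=(pop_size, dim))
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = population[rng.integers(pop_size, size=2)]
            # Crossover: take the first part of one parent, the rest of the other.
            cut = rng.integers(1, dim) if dim > 1 else 0
            child = np.concatenate([a[:cut], b[cut:]])
            # Mutation: occasionally nudge individual genes.
            mask = rng.random(dim) < mutation_rate
            child[mask] += rng.normal(0.0, 1.0, size=mask.sum())
            children.append(child)
        # Keep only the most fit individuals (lowest fitness values).
        population = np.vstack([population, children])
        population = population[np.argsort([fitness(x) for x in population])][:pop_size]
    return population[0]   # best individual found
```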
By keeping only the best individuals from one iteration to the next we guarantee that the solution will never get worse over time
Using combinations of two individuals basically tries to combine the advantages of the two, while removing the drawbacks
By having a number of different individuals we avoid getting stuck in local minima
One problem is that in many domains not every possible vector is also feasible
For example: If our vector represents actions, jumping while already in the air might not be possible
Repair is the mechanism by which an infeasible individual is converted into a feasible one
How this repair works is domain dependent
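As a made-up example of such a repair step: if each gene is supposed to be a unit count between 0 and some maximum, repair could simply round and clamp the values:

```python
import numpy as np

def repair(individual, max_units=200):
    # Hypothetical repair: unit counts must be whole numbers in [0, max_units].
    return np.clip(np.round(individual), 0, max_units)
```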
Another approach is to keep a set of individuals and "move" them around in space
The individuals can use information about the other individuals, or the "swarm" as a whole, to determine where to go
There are several approaches to how this information exchange works
Typically the individuals are initially spread out over the search space and then follow the path to the best known values
In Particle Swarm Optimization, each individual is called a particle, and has a position and velocity
Every step, the position is updated with the velocity, and the velocity is changed depending on (using semi-random ratios):
the distance to the best position this particle has found so far
the distance to the best position found by the swarm as a whole
If the best position found by the swarm as a whole is far away from the particle in question, it will move more quickly
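A bare-bones sketch of this update rule (the inertia and attraction weights are typical textbook values, not prescribed ones):

```python
import numpy as np

def particle_swarm(fitness, dim, n_particles=30, steps=500,
                   inertia=0.7, c_personal=1.5, c_swarm=1.5, rng=None):
    rng = rng or np.random.default_rng()
    pos = rng.uniform(-10.0, 10.0, size=(n_particles, dim))   # spread particles out
    vel = np.zeros((n_particles, dim))
    personal_best = pos.copy()
    personal_best_f = np.array([fitness(p) for p in pos])
    swarm_best = personal_best[np.argmin(personal_best_f)].copy()
    for _ in range(steps):
        r1, r2 = rng.random((2, n_particles, dim))             # semi-random ratios
        vel = (inertia * vel
               + c_personal * r1 * (personal_best - pos)       # pull towards own best
               + c_swarm * r2 * (swarm_best - pos))            # pull towards swarm best
        pos = pos + vel
        for i, p in enumerate(pos):
            f = fitness(p)
            if f < personal_best_f[i]:
                personal_best[i], personal_best_f[i] = p, f
        swarm_best = personal_best[np.argmin(personal_best_f)].copy()
    return swarm_best
```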
When ants look for food, they start by walking around randomly
Wherever they walk, they leave pheromones behind
If an ant encounters a pheromone trail by another ant, it has a chance to follow it
Pheromones evaporate over time, meaning that shorter paths will have a higher concentration of pheromones, because they are traversed faster
Often described as operating on graphs, with pheromone trails assigned to the edges
However, it can also be used for general optimization problems
Generate a set of ants randomly placed in the search space
Move ants around and update their trails with the best fitness value they encounter
If an ant encounters another pheromone trail, it has a chance of following that trail, depending on the strength of the pheromone trail
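Ant colony optimization is usually presented on graphs; the loose sketch below follows the continuous-space description above instead: ants deposit markers whose strength reflects the fitness they found, markers evaporate, and ants are sometimes attracted to strong markers. All details (step sizes, strengths, thresholds) are ad-hoc choices for illustration, and the sketch assumes non-negative fitness values:

```python
import numpy as np

def ant_search(fitness, dim, n_ants=20, steps=300, evaporation=0.95,
               follow_chance=0.3, step_size=0.5, rng=None):
    rng = rng or np.random.default_rng()
    ants = rng.uniform(-10.0, 10.0, size=(n_ants, dim))
    trails = []                       # list of (position, strength) "pheromone" markers
    best_x, best_f = None, float("inf")
    for _ in range(steps):
        for i in range(n_ants):
            # With some chance, follow a trail (stronger trails are more attractive) ...
            if trails and rng.random() < follow_chance:
                strengths = np.array([s for _, s in trails])
                target = trails[rng.choice(len(trails), p=strengths / strengths.sum())][0]
                direction = target - ants[i]
            else:                     # ... otherwise walk around randomly.
                direction = rng.normal(0.0, 1.0, size=dim)
            ants[i] = ants[i] + step_size * direction
            f = fitness(ants[i])
            # Deposit a marker whose strength reflects how good this spot is
            # (assumes fitness values >= 0).
            trails.append((ants[i].copy(), 1.0 / (1.0 + f)))
            if f < best_f:
                best_x, best_f = ants[i].copy(), f
        # Evaporation: old markers fade, keep only the ones that still matter.
        trails = [(p, s * evaporation) for p, s in trails if s * evaporation > 0.01]
    return best_x
```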
For optimization we need a search space of vectors and a fitness function
However, calculating the fitness for arbitrary input vectors may be expensive or even impossible
For example: We may be able to calculate the strength of a collection of troops in Starcraft, but determining its expected win rate requires extensive simulations
Idea: Estimate the fitness values
If we have fitness values for some inputs, we can calculate a guess for the intermediate values
Say we know the win rate for having 40 Space Marines and 10 Fire Bats is 80%, and the win rate for having 10 Space Marines and 40 Fire Bats is 20%
What is the win rate for 25 Space Marines and 25 Fire Bats?
We can guess 50%, using Linear Interpolation
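The guess is just linear interpolation between the two known points; the arithmetic (numbers from the example above):

```python
# Known points: 40 Marines/10 Firebats -> 0.8, 10 Marines/40 Firebats -> 0.2.
# 25/25 lies exactly halfway between them, so the linear estimate is the average.
t = 0.5                       # how far along the line between the two known points
estimate = (1 - t) * 0.8 + t * 0.2
print(estimate)               # 0.5
```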
Our fitness functions are not usually linear, though
It turns out that the win rate for 25 Space Marines and 25 Fire Bats is actually 90%
Instead of trying to explain this linearly, we can use some polynomial
For n observations we would need a polynomial of degree n-1
Better idea: Define each point to be a linear mixture of some non-linear function of the observed values, depending on the distance from those values
Radial Basis Functions (RBF)!
f(40,10) = 0.8, f(10, 40) = 0.2, f(25,25) = 0.5
We use a Gauss function as the RBF
To determine an estimate for $f(20,30)$ we calculate
$f(20,30) \approx f(40,10) \cdot \varphi(|(40,10)-(20,30)|) + f(10,40) \cdot \varphi(|(10,40)-(20,30)|) + f(25,25) \cdot \varphi(|(25,25)-(20,30)|)$
Radial Basis Functions are neat, because they work even with a large number of known values, and don't require the known information to be laid out in any particular form
However, the parameter of the function that determines its "flatness" needs tuning
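A small sketch of this estimate with a Gaussian RBF; the width `sigma` is exactly the "flatness" parameter that needs tuning. Note this follows the simplified weighting on the slide (observed values times φ); a full RBF interpolation would instead solve a linear system for the mixture weights, and a normalized variant divides by the sum of the φ values:

```python
import numpy as np

def phi(distance, sigma=15.0):
    # Gaussian radial basis function; sigma controls the "flatness" and needs tuning.
    return np.exp(-(distance ** 2) / (2 * sigma ** 2))

# Observed fitness values from the example above.
known = {(40, 10): 0.8, (10, 40): 0.2, (25, 25): 0.5}

def estimate(point):
    point = np.array(point, dtype=float)
    # Each known value contributes according to its distance from the query point.
    return sum(value * phi(np.linalg.norm(np.array(p) - point))
               for p, value in known.items())

print(estimate((20, 30)))
```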
Next week we will talk about Neural Networks, which provide an alternative way to approximate functions, where these parameters are learned from input data
Sometimes we don't want to just optimize one value, but multiple
Or the optimization of one value comes at the expense of another
For example: Building units in a strategy game that do the most damage
The most expensive units do the most damage
That does not mean that building the most expensive units is the best strategy. There is a trade-off between cost and army strength
Maybe we can try to optimize the (weighted) sum of both values?
Or how about a ratio? Let's optimize the army strength per resource spent
Both of these may run into degenerate solutions. For example, spending 0 resources results in infinite strength per resource
Better idea: Let the AI agent decide about the trade-off
Say we have two values, x and y, to maximize
If we have a solution with fitness (x,y), any other solution for which both the fitness in x and the fitness in y are higher is objectively better. That solution is said to dominate the first one
However, solutions for which only x is higher, or only y is higher, cannot be ranked objectively and can be considered of the "same" utility
The result will be a frontier of non-dominated fitness values (the Pareto frontier)
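A short sketch of how to filter a set of candidate solutions down to that frontier; the fitness tuples here are assumed to be "higher is better", matching the maximization example above:

```python
def dominates(a, b):
    # a dominates b if it is at least as good everywhere and strictly better somewhere.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(solutions):
    # Keep only the solutions that no other solution dominates.
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions if other is not s)]

# Example: (resources saved, army strength) pairs; only non-dominated trade-offs remain.
print(pareto_front([(1, 9), (3, 7), (2, 8), (2, 6), (5, 5)]))
# -> [(1, 9), (3, 7), (2, 8), (5, 5)]
```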
Remember: Everything in AI is either representation or search
Behavior Trees, State Machines, etc.: Representation
Planning: Search (Plan-Space planning uses a different representation)
Belief modeling: Representation improvements
Monte Carlo Tree Search: Search