Remember: Everything in AI is either representation or search
Behavior Trees, State Machines, etc.: Representation
Planning: Search (Plan-Space planning uses a different representation)
Belief modeling: Representation improvements
Monte Carlo Tree Search: Search
Often in search we want to find something called a "solution"
However, what if we have multiple possible solutions, with different levels of quality?
We want to find the "best" solution among all possible ones
In other words, instead of constructing partial solutions until we find a complete one, we look at the set of complete solutions and try to find the best among them
For example: Doing the most damage, earning the most money, getting the highest score, driving the race in the lowest possible time, etc.
Given a function (usually called "fitness function"):
$f: \mathbb{R}^n \mapsto \mathbb{R}$
find the $\vec{x} \in \mathbb{R}^n$ for which $f(\vec{x})$ is minimal (or maximal).
First idea:
$\frac{d}{d\vec{x}} f(\vec{x}) = 0$
Setting the derivative to zero only finds a minimum/maximum, but not necessarily the global minimum/maximum
The dimensionality of the vector may be really high, giving us many options to consider
Who says the derivative can even be calculated?
We haven't even specified what our vectors represent, and what f looks like
For the rest of this talk, we will assume that we want to minimize our functions
For some values, such as score, this does not really make sense, of course
We can turn a maximization problem into a minimization one by simply negating the value (or subtracting it from the theoretical maximum)
For example: If the maximum possible score is 100000, to maximize the score we want to minimize the distance to that value
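As a small illustration, turning a maximization problem into a minimization one could look like the sketch below (the `score` function is a hypothetical stand-in; the cap of 100000 is taken from the example above):

```python
MAX_SCORE = 100_000  # theoretical maximum score from the example above

def score(x):
    # Hypothetical score function we would like to maximize (a stand-in).
    return MAX_SCORE - (x - 3.0) ** 2

def fitness(x):
    # Minimization version: the distance to the theoretical maximum.
    # Equivalently, we could simply return -score(x).
    return MAX_SCORE - score(x)
```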
We said our function takes vectors of real numbers as input, but what do these vectors represent?
They can be geometrical, such as the angle and force used to shoot a bird in Angry Birds
It could be an assignment of values to resources, such as how many of each unit to build in Starcraft
Another possibility is to view the vector as a sequence of actions
To be differentiable, a function needs to be continuous:
$\forall \varepsilon > 0\ \exists \delta > 0\ \forall \vec{a}: \left(|\vec{a} - \vec{x}| < \delta \Rightarrow |f(\vec{a}) - f(\vec{x})| < \varepsilon\right)$
Or, in words:
If you change the input a little bit, the output also only changes a little bit.
Is the function that maps from angle and strength to score differentiable?
For now, let's say our function is differentiable
We know that the derivative is zero at the extrema
We also know that it defines the slope
So if we start at any point, we can just go "downhill"
Start with an initial guess $\vec{x}_0$
Repeat until "convergence":
$\vec{x}_{i+1} = \vec{x}_i - \alpha \cdot \frac{d}{d\vec{x}} f(\vec{x}_i)$
$\alpha$ is the learning rate
Scale the learning rate over time to move quickly at first, and slow down as a solution is found
Start at multiple different randomly chosen starting locations to avoid being stuck in a local minimum
Use momentum to carry the updates out of small local minima
Add a small random number to the gradient to explore other parts of the function
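A minimal sketch of this loop in Python, combining the core update rule with the improvements above (random restarts, a decaying learning rate, and a little gradient noise); the toy function, its gradient, and all parameter values are made up for illustration:

```python
import numpy as np

def f(x):
    # Toy fitness function to minimize (made up for illustration).
    return np.sum((x - np.array([2.0, -1.0])) ** 2)

def gradient(x):
    # Analytical derivative of f; in general this has to be known or approximated.
    return 2.0 * (x - np.array([2.0, -1.0]))

def gradient_descent(restarts=10, steps=1000, alpha0=0.5, decay=0.995,
                     noise=0.01, rng=None):
    rng = rng or np.random.default_rng()
    best_x, best_f = None, float("inf")
    for _ in range(restarts):                  # several random starting points
        x = rng.uniform(-10.0, 10.0, size=2)   # initial guess x_0
        alpha = alpha0
        for _ in range(steps):
            g = gradient(x) + rng.normal(0.0, noise, size=2)  # slightly perturbed gradient
            x = x - alpha * g                  # move "downhill"
            alpha *= decay                     # start fast, slow down over time
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x

print(gradient_descent())  # approaches [2, -1]
```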
What if our function is not differentiable?
Or our vectors don't have a good geometric interpretation, but are more just "collections of numbers"
Let's look to nature: Genes and evolution
Each individual is made up of genes
New individuals inherit a mixture of genes from their parents, strongly preferring "better" genes
"Survival of the Fittest"
We can interpret a vector as the "genes" of an individual
The fitness function then tells us how "good" these genes are
By modifying the genes we can create new individuals, to find the "best" genes as measured by our fitness function
We generate a (random) set of individuals as a population
Every step we create some new individuals as combinations and/or mutations of existing ones
Then we keep only the "most fit" individuals, i.e. the ones for which our scoring function returns the lowest values
Repeat for "a number of steps"
Our population is a set of vectors
Each of these vectors is associated with its fitness
We limit our population to a certain size by only keeping the n "best" individuals
Two individuals can be combined by selecting elements from each of their two vectors (or averaging, adding, etc.), called "crossover"
An individual can also be changed by modifying single values in its vector, called "mutation"
If we view our vector elements as containing a binary representation of numbers, we could also use binary operations such as an xor to combine two values
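A compact sketch of this idea for real-valued vectors, using single-point crossover and Gaussian mutation (the population size, mutation rate, and value ranges are arbitrary choices for illustration):

```python
import numpy as np

def evolve(fitness, dim, pop_size=50, generations=200, mutation_rate=0.1, rng=None):
    rng = rng or np.random.default_rng()
    # Random initial population of vectors ("genes").
    population = rng.uniform(-10.0, 10.0, size=(pop_size, dim))
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = population[rng.integers(pop_size, size=2)]
            # Crossover: take the first part of one parent, the rest of the other.
            cut = rng.integers(1, dim) if dim > 1 else 0
            child = np.concatenate([a[:cut], b[cut:]])
            # Mutation: occasionally nudge individual genes.
            mask = rng.random(dim) < mutation_rate
            child[mask] += rng.normal(0.0, 1.0, size=mask.sum())
            children.append(child)
        # Keep only the most fit individuals (lowest fitness values).
        population = np.vstack([population, children])
        population = population[np.argsort([fitness(x) for x in population])][:pop_size]
    return population[0]   # best individual found
```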
By keeping only the best individuals from one iteration to the next we guarantee that the solution will never get worse over time
Using combinations of two individuals basically tries to combine the advantages of the two, while removing the drawbacks
By having a number of different individuals we avoid getting stuck in local minima
One problem is that in many domains not every possible vector is also feasible
For example: If our vector represents actions, jumping while already in the air might not be possible
Repair is the mechanism by which an infeasible individual is converted into a feasible one
How this repair works is domain dependent
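As a made-up example of such a repair step: if each gene is supposed to be a unit count between 0 and some maximum, repair could simply round and clamp the values:

```python
import numpy as np

def repair(individual, max_units=200):
    # Hypothetical repair: unit counts must be whole numbers in [0, max_units].
    return np.clip(np.round(individual), 0, max_units)
```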
Another approach is to keep a set of individuals and "move" them around in space
The individuals can use information about the other individuals, or the "swarm" as a whole, to determine where to go
There are several approaches to how this information exchange works
Typically the individuals are initially spread out over the search space and then follow the path to the best known values
In Particle Swarm Optimization, each individual is called a particle, and has a position and velocity
Every step, the position is updated with the velocity, and the velocity is changed depending on (using semi-random ratios):
the distance to the best position this particle has found so far
the distance to the best position found by the swarm as a whole
If the best position found by the swarm as a whole is far away from the particle in question, it will move more quickly
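A bare-bones sketch of this update rule (the inertia and attraction weights are typical textbook values, not prescribed ones):

```python
import numpy as np

def particle_swarm(fitness, dim, n_particles=30, steps=500,
                   inertia=0.7, c_personal=1.5, c_swarm=1.5, rng=None):
    rng = rng or np.random.default_rng()
    pos = rng.uniform(-10.0, 10.0, size=(n_particles, dim))   # spread particles out
    vel = np.zeros((n_particles, dim))
    personal_best = pos.copy()
    personal_best_f = np.array([fitness(p) for p in pos])
    swarm_best = personal_best[np.argmin(personal_best_f)].copy()
    for _ in range(steps):
        r1, r2 = rng.random((2, n_particles, dim))             # semi-random ratios
        vel = (inertia * vel
               + c_personal * r1 * (personal_best - pos)       # pull towards own best
               + c_swarm * r2 * (swarm_best - pos))            # pull towards swarm best
        pos = pos + vel
        for i, p in enumerate(pos):
            f = fitness(p)
            if f < personal_best_f[i]:
                personal_best[i], personal_best_f[i] = p, f
        swarm_best = personal_best[np.argmin(personal_best_f)].copy()
    return swarm_best
```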
When ants look for food, they start by walking around randomly
Wherever they walk, they leave pheromones behind
If an ant encounters a pheromone trail by another ant, it has a chance to follow it
Pheromones evaporate over time, meaning that shorter paths will have a higher concentration of pheromones, because they are traversed faster
Often described as operating on graphs, with pheromone trails assigned to the edges
However, it can also be used for general optimization problems
Generate a set of ants randomly placed in the search space
Move ants around and update their trails with the best fitness value they encounter
If an ant encounters another pheromone trail, it has a chance of following that trail, depending on the strength of the pheromone trail
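Ant colony optimization is usually presented on graphs; the loose sketch below follows the continuous-space description above instead: ants deposit markers whose strength reflects the fitness they found, markers evaporate, and ants are sometimes attracted to strong markers. All details (step sizes, strengths, thresholds) are ad-hoc choices for illustration, and the sketch assumes non-negative fitness values:

```python
import numpy as np

def ant_search(fitness, dim, n_ants=20, steps=300, evaporation=0.95,
               follow_chance=0.3, step_size=0.5, rng=None):
    rng = rng or np.random.default_rng()
    ants = rng.uniform(-10.0, 10.0, size=(n_ants, dim))
    trails = []                       # list of (position, strength) "pheromone" markers
    best_x, best_f = None, float("inf")
    for _ in range(steps):
        for i in range(n_ants):
            # With some chance, follow a trail (stronger trails are more attractive) ...
            if trails and rng.random() < follow_chance:
                strengths = np.array([s for _, s in trails])
                target = trails[rng.choice(len(trails), p=strengths / strengths.sum())][0]
                direction = target - ants[i]
            else:                     # ... otherwise walk around randomly.
                direction = rng.normal(0.0, 1.0, size=dim)
            ants[i] = ants[i] + step_size * direction
            f = fitness(ants[i])
            # Deposit a marker whose strength reflects how good this spot is
            # (assumes fitness values >= 0).
            trails.append((ants[i].copy(), 1.0 / (1.0 + f)))
            if f < best_f:
                best_x, best_f = ants[i].copy(), f
        # Evaporation: old markers fade, keep only the ones that still matter.
        trails = [(p, s * evaporation) for p, s in trails if s * evaporation > 0.01]
    return best_x
```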
For optimization we need a search space of vectors and a fitness function
However, calculating the fitness for arbitrary input vectors may be expensive or even impossible
For example: We may be able to calculate the strength of a collection of troops in Starcraft, but determining its expected win rate requires extensive simulations
Idea: Estimate the fitness values
If we have fitness values for some inputs, we can calculate a guess for the intermediate values
Say we know the win rate for having 40 Space Marines and 10 Fire Bats is 80%, and the win rate for having 10 Space Marines and 40 Fire Bats is 20%
What is the win rate for 25 Space Marines and 25 Fire Bats?
We can guess 50%, using Linear Interpolation
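The guess is just linear interpolation between the two known points; the arithmetic (numbers from the example above):

```python
# Known points: 40 Marines/10 Firebats -> 0.8, 10 Marines/40 Firebats -> 0.2.
# 25/25 lies exactly halfway between them, so the linear estimate is the average.
t = 0.5                       # how far along the line between the two known points
estimate = (1 - t) * 0.8 + t * 0.2
print(estimate)               # 0.5
```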
Our fitness functions are not usually linear, though
It turns out that the win rate for 25 Space Marines and 25 Fire Bats is actually 90%
Instead of trying to explain this linearly, we can use some polynomial
For n observations we would need a polynomial of degree n-1
Better idea: Define each point to be a linear mixture of some non-linear function of the observed values, depending on the distance from those values
Radial Basis Functions (RBF)!
f(40,10) = 0.8, f(10, 40) = 0.2, f(25,25) = 0.5
We use a Gauss function as the RBF
To determine an estimate for $f(20,30)$ we calculate
$f(20,30) \approx f(40,10) \cdot \varphi(|(40,10)-(20,30)|) + f(10,40) \cdot \varphi(|(10,40)-(20,30)|) + f(25,25) \cdot \varphi(|(25,25)-(20,30)|)$
Radial Basis Functions are neat, because they work even with a large number of known values, and don't require the known information to be laid out in any particular form
However, the parameter of the function that determines its "flatness" needs tuning
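A small sketch of this estimate with a Gaussian RBF; the width `sigma` is exactly the "flatness" parameter that needs tuning. Note this follows the simplified weighting on the slide (observed values times φ); a full RBF interpolation would instead solve a linear system for the mixture weights, and a normalized variant divides by the sum of the φ values:

```python
import numpy as np

def phi(distance, sigma=15.0):
    # Gaussian radial basis function; sigma controls the "flatness" and needs tuning.
    return np.exp(-(distance ** 2) / (2 * sigma ** 2))

# Observed fitness values from the example above.
known = {(40, 10): 0.8, (10, 40): 0.2, (25, 25): 0.5}

def estimate(point):
    point = np.array(point, dtype=float)
    # Each known value contributes according to its distance from the query point.
    return sum(value * phi(np.linalg.norm(np.array(p) - point))
               for p, value in known.items())

print(estimate((20, 30)))
```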
Next week we will talk about Neural Networks, which provide an alternative way to approximate functions, where these parameters are learned from input data
Sometimes we don't want to just optimize one value, but multiple
Or the optimization of one value comes at the expense of another
For example: Building units in a strategy game that do the most damage
The most expensive units do the most damage
That does not mean that building the most expensive units is the best strategy. There is a trade-off between cost and army strength
Maybe we can try to optimize the (weighted) sum of both values?
Or how about a ratio? Let's optimize the army strength per resource spent
Both of these may run into degenerate solutions. For example, spending 0 resources results in infinite strength per resource
Better idea: Let the AI agent decide about the trade-off
Say we have two values, x and y, to maximize
If we have a solution with fitness (x,y), any other solution for which both the fitness in x and the fitness in y are higher is objectively better. That solution is said to dominate the first one
However, solutions for which only x is higher, or only y is higher, cannot be ranked objectively and can be considered of the "same" utility
The result will be a frontier of non-dominated fitness values (the Pareto frontier)
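A short sketch of how to filter a set of candidate solutions down to that frontier; the fitness tuples here are assumed to be "higher is better", matching the maximization example above:

```python
def dominates(a, b):
    # a dominates b if it is at least as good everywhere and strictly better somewhere.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(solutions):
    # Keep only the solutions that no other solution dominates.
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions if other is not s)]

# Example: (resources saved, army strength) pairs; only non-dominated trade-offs remain.
print(pareto_front([(1, 9), (3, 7), (2, 8), (2, 6), (5, 5)]))
# -> [(1, 9), (3, 7), (2, 8), (5, 5)]
```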
Remember: Everything in AI is either representation or search
Behavior Trees, State Machines, etc.: Representation
Planning: Search (Plan-Space planning uses a different representation)
Belief modeling: Representation improvements
Monte Carlo Tree Search: Search