An agent is an entity that perceives its environment and acts on it
Many people frown upon saying that something is "an AI" and prefer the term "agent"
Agents come in many different forms
Performance: How do we measure the quality of the agent (e.g. score, player enjoyment)
Environment: What surroundings is the agent located in (for us typically a game, but which part of the game)
Actuators: Which actions can the agent perform (e.g. move, shoot a fireball, ...)
Sensors: How does the agent perceive the world (in games we typically give it access to some data structures representing the game, but some researchers work on playing games using screen captures)
Say you have some NPC character in your game that should be controlled by AI
Your game typically contains some main loop that updates all game objects and renders them
At some point in this loop you run an AI update
This means, all our agents receive one "update" call every x ms, and this update call has to make the necessary decisions
Simplest approach: On each update, the agent reads the sensor values and calculates which actuators to use based on these values
Valentino Braitenberg proposed a thought experiment with simple two-wheel vehicles
The vehicles had two light sensors, and there was a light in the room
Each of the two sensors would be connected to one of the wheels
Depending on how this was done, the vehicle would seek or flee from the light
The behavior of the agent is fully reactive, with no memory
Performance: How much damage can it do to the player?
Environment: A dungeon in the game
Actuators: Rotate, move forward, hit
Sensors: Player position (With that we can compute distance and angle to the player)
If the angle to the player is greater than 0, turn left
(else) If the angle to the player is less than 0, turn right
(else) If the distance to the player is greater than 0, move forward
(else) Hit the player
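A minimal sketch of this reactive agent in Haskell, assuming a hypothetical Percept record for the sensor values and an Action type for the actuators:

    data Percept = Percept { angleToPlayer :: Double, distanceToPlayer :: Double }
    data Action  = TurnLeft | TurnRight | MoveForward | Hit deriving Show

    -- Called on every AI update: read the sensors, pick an actuator, no memory
    decide :: Percept -> Action
    decide p
      | angleToPlayer p > 0    = TurnLeft
      | angleToPlayer p < 0    = TurnRight
      | distanceToPlayer p > 0 = MoveForward
      | otherwise              = Hit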
This is, of course, a very simple agent
Imagine if there were walls
What if we want the enemy to have different modes of engagement, flee when it is in danger, etc.?
How did we even come up with these conditions?
How could we make this a bit friendlier to edit?
We haven't actually changed anything from the if statements (other than drawing them)
Designing a decision tree is still a lot of manual work
There's also no persistence: the agent will decide on a new behavior every time the tree is evaluated
There is one nice thing: Decision trees can (sometimes) be learned with Machine Learning techniques
Say we want our enemy to attack more aggressively if they have a lot of health and try to flee when they become wounded
In other words: The enemy has a state that determines what they do, in addition to their inputs and outputs
But we'll need new sensors: The enemy needs to know their own health level
Let's also give them a ranged weapon
States represent what the agent is currently supposed to do
Each state is associated with actions the agent should perform in that state
Transitions between the states observe the sensors and change the state when a condition is met
The agent starts in some designated state, and can only be in one state at a time
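A minimal finite state machine sketch in Haskell; the two states, the Sensors record and the thresholds are illustrative assumptions:

    data State   = Aggressive | Fleeing deriving (Eq, Show)
    data Sensors = Sensors { health :: Double, distanceToPlayer :: Double }

    -- Transitions: observe the sensors and change state when a condition is met
    transition :: State -> Sensors -> State
    transition Aggressive s | health s < 30 = Fleeing
    transition Fleeing    s | health s > 70 = Aggressive
    transition state      _                 = state

    -- Each state is associated with the actions the agent performs in it
    act :: State -> Sensors -> String
    act Aggressive s
      | distanceToPlayer s > 2 = "shoot"       -- use the ranged weapon
      | otherwise              = "melee hit"
    act Fleeing _              = "run away from the player"

With the 30/70 thresholds the agent keeps its current behavior while its health fluctuates in between, which is exactly the persistence a purely reactive agent lacks.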
There's no real concept of "time"; it has to be "added"
If you just want to add one state you have to determine how it relates to every other state
If you have two Finite State Machines they are hard to compose
It's also kind of hard to reuse subparts
For example: The part of our state machine that is used to engage an enemy at range could be useful for an archer guard on a wall, but how do we take just that part?
Finite State Machines define the behavior of the agent
But we said the nodes are behaviors?!
We can make each node another sub-machine!
This leads to some reusability, and eases authoring
Let's still use a graph, but make it a tree!
If we have a subtree, we now only need to worry about one connection: its parent
The leaves of the tree will be the actual actions, while the interior nodes define the decisions
Each node can either be successful or not, which is what the interior nodes use for the decisions
We can have different kinds of nodes for different kinds of decisions
This is extensible (new kinds of nodes), easily configurable (just attach different nodes together to make a tree) and reusable (subtrees can be used multiple times)
Every AI time step the root node of the tree is executed
Each node saves its state:
When a node is executed, it executes its currently executing child
When a leaf node is executed and finishes, it returns success or failure to its parent
The parent then makes a decision based on this result
Choice/Selector: Execute children in order until one succeeds
Sequence: Execute children in order until one fails
Loop: Keep executing child (or children) until one fails
Random choice: Execute one of the children at random
etc.
Some actions are just "checks", they return success iff the check passes
A sequence consisting of a check and another node will only execute the second node if the check passes
If we put multiple such sequences as children of a choice, the first sequence with a passing condition will be executed
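A minimal, stateless behavior tree sketch in Haskell: it re-evaluates the whole tree each tick and omits the "currently running child" bookkeeping described above; the commented-out checks and actions at the end are hypothetical:

    data Status = Success | Failure deriving (Eq, Show)

    data Node world
      = Leaf (world -> Status)     -- an actual action or a check
      | Selector [Node world]      -- succeed with the first child that succeeds
      | Sequence [Node world]      -- fail with the first child that fails

    tick :: world -> Node world -> Status
    tick w (Leaf f)          = f w
    tick _ (Selector [])     = Failure
    tick w (Selector (c:cs)) = case tick w c of
                                 Success -> Success
                                 Failure -> tick w (Selector cs)
    tick _ (Sequence [])     = Success
    tick w (Sequence (c:cs)) = case tick w c of
                                 Failure -> Failure
                                 Success -> tick w (Sequence cs)

    -- e.g. a choice of (check, action) sequences, with hypothetical leaves:
    -- engage = Selector [ Sequence [Leaf inMeleeRange, Leaf hit]
    --                   , Sequence [Leaf canSeePlayer, Leaf shootFireball] ]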
Behavior Trees are a very powerful technique and widely used in games
Halo 2, for example, used them
Unreal Engine has built-in support for Behavior Trees (there are plugins for Unity)
The tree structure usually allows for visual editing (which Unreal Engine also has built-in)
A graph G = (V,E) consists of vertices (nodes) V and edges (connections) E⊆V×V
Graphs can be connected, or have multiple components
Graphs can be directed (one-way streets) or undirected
Edges can have weights (costs) associated with them: w: E → ℝ
We can represent many things in graphs
Given a graph G = (V,E) with edge weights w, a start node s∈V, and a destination node d∈V, find a sequence of vertices v_1, v_2, …, v_n such that v_1 = s, v_n = d, and ∀i: (v_i, v_{i+1})∈E
We call the sequence v_1, v_2, …, v_n a path, and the cost of the path is ∑_i w((v_i, v_{i+1}))
This means what you would expect: To find a path from a start node to a destination node means to find vertices to walk through that lead from the start to the destination by being connected with edges. The cost is the sum of the costs of edges that need to be traversed.
The simplest pathfinding algorithm works like this: keep a collection of nodes to visit, starting with the start node; repeatedly take a node out, stop if it is the destination, and otherwise add its neighbors to the collection
How do you "keep track" of nodes?
What if we can give the path finding algorithm some more information?
For example, we may not know how to drive everywhere, but we can measure the straight line distance
This "extra" information is called a "heuristic"
Search algorithms can use it to "guide" the search process
We use the same algorithm as above:
Instead of using a stack or list, we use a priority queue, where the nodes are ordered according to some value derived from the heuristic
So how do we determine this value?
Let's use our heuristic!
We order the nodes in the priority queue by heuristic value
Heuristic: straight line distance to Bucharest
[Figures: step-by-step example of greedy search, always expanding the node with the smallest straight-line distance to Bucharest]
Greedy search sometimes does not give us the optimal result
It tries to get to the goal as fast as possible, but ignores the cost of actually getting to each node
Idea: Instead of using the node with the lowest heuristic value, use the node with the lowest sum of heuristic value and the cost of getting to it
This is called A* search
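A minimal A* sketch in Haskell over an adjacency-map graph; a real implementation would use a proper priority queue instead of re-sorting the frontier, and the pruning via the visited set assumes a consistent heuristic:

    import Data.List (sortOn)
    import qualified Data.Map as M
    import qualified Data.Set as S

    astar :: Ord a
          => M.Map a [(a, Double)]   -- graph: node -> [(neighbor, edge cost)]
          -> (a -> Double)           -- heuristic: estimated cost to the destination
          -> a -> a                  -- start and destination nodes
          -> Maybe [a]
    astar graph h start goal = go S.empty [(h start, 0, start, [start])]
      where
        go _ [] = Nothing
        go visited frontier =
          let ((_, g, node, path) : rest) = sortOn (\(f, _, _, _) -> f) frontier
          in if node == goal
               then Just (reverse path)
               else if node `S.member` visited
                 then go visited rest
                 else go (S.insert node visited)
                         (rest ++ [ (g + w + h n, g + w, n, n : path)
                                  | (n, w) <- M.findWithDefault [] node graph
                                  , not (n `S.member` visited) ])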
[Figures: step-by-step example of A* search, expanding the node with the lowest sum of path cost and straight-line distance to Bucharest]
To find the optimal solution, keep expanding nodes until the goal node is the best node in the frontier
A* is actually guaranteed to find the optimal solution if the heuristic is admissible, i.e. it never overestimates the actual cost to the goal
You can also reduce the memory requirements of A* by using Iterative Deepening: run a depth-first search that cuts off paths whose cost plus heuristic exceeds a threshold, and restart with a higher threshold until the destination is found
You may have heard of Dijkstra's algorithm (and its variants) before
Dijkstra's algorithm is basically A* without using the heuristic
In some popular formulations you also let the algorithm compute a path for every possible destination
This will give you a shortest path tree, which may be useful if you have to repeatedly find a path to different destinations
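In the A* sketch above, Dijkstra's algorithm corresponds to passing a heuristic that always returns 0.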
While we have looked at finding paths in physical spaces so far, there are many other applications
Take, for example, Super Mario
An AI could play the game using A*
A* is widely applied in games
Unity's built-in navigation module uses A*
But how do you apply A* to a 3D world?
We need a graph!
Idea: Divide the game world into regions, and assign each region a graph node
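A minimal sketch of this idea in Haskell for a 2D tile map (the '#'-marks-a-wall representation and the unit edge costs are assumptions); the resulting map can be fed straight into the astar sketch above:

    import qualified Data.Map as M

    type Cell = (Int, Int)

    -- Every walkable cell becomes a node, connected to its walkable
    -- 4-neighbors with edge cost 1
    gridToGraph :: [String] -> M.Map Cell [(Cell, Double)]
    gridToGraph rows = M.fromList [ (c, neighbors c) | c <- walkable ]
      where
        walkable = [ (x, y) | (y, row)  <- zip [0..] rows
                            , (x, tile) <- zip [0..] row
                            , tile /= '#' ]
        neighbors (x, y) =
          [ (n, 1.0) | n <- [(x+1,y), (x-1,y), (x,y+1), (x,y-1)]
                     , n `elem` walkable ]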
Sometimes we can just assign a numerical value ("score") to the observations, and then combine these scores in some way to get a decision
For example, we can assign a score to the distance from the player, the agent's health, maybe their remaining mana, etc.
Then, we can calculate a score for a melee attack by weighting the distance more heavily than the health, with mana being irrelevant
On the other hand, the score for a fireball would be more affected by the remaining mana and less by the distance (up to a threshold, perhaps)
The agent then simply picks the action with the highest score/utility
Three options: melee, fireball or run away
u_m = 0.8⋅d + 0.2⋅h + 0⋅m
u_f = 0.4⋅d + 0.2⋅h + 0.4⋅m
u_r = 0.4⋅d + 0.6⋅h + 0⋅m
We are 80 units away from the player, have 90% health and 100% mana. What do we do?
At which distance would we attack the player?
We need to define the scores! Let's say
d = 80/(distance+80), h = health/100, m = mana/100
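A minimal sketch of this selection in Haskell, using the weights and normalizations from above (the Choice type and the function name are made up for illustration). Plugging in distance 80, health 90 and mana 100 gives d = 0.5, h = 0.9, m = 1.0, and utilities 0.58, 0.78 and 0.74, so the agent casts the fireball:

    import Data.List (maximumBy)
    import Data.Ord (comparing)

    data Choice = Melee | Fireball | RunAway deriving Show

    -- Score every option and pick the one with the highest utility
    pickAction :: Double -> Double -> Double -> Choice
    pickAction distance health mana =
      fst $ maximumBy (comparing snd)
        [ (Melee,    0.8*d + 0.2*h + 0.0*m)
        , (Fireball, 0.4*d + 0.2*h + 0.4*m)
        , (RunAway,  0.4*d + 0.6*h + 0.0*m) ]
      where
        d = 80 / (distance + 80)
        h = health / 100
        m = mana / 100

    -- pickAction 80 90 100 ==> Fireball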
The main advantage of this utility-based approach is that it is easy to extend
If a new action becomes available: assign a scoring function to it, and the agent will automatically consider it
If a new kind of observation becomes available: add it to the scoring functions where it is relevant
Drawback: The scaling of the utility scores needs to be consistent (often most easily achieved by normalizing them to be between 0 and 1)
Another drawback: Determining the formulas for each action/option is non-trivial, especially when they have many terms
A utility-based approach can also be used for pathfinding
Assign a utility value to each space in the game
The goal has (very) high utility
Obstacles have negative utility
Each of these utility values is actually a field of values
The total utility of the space is the sum of these fields
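A minimal potential-field sketch in Haskell; the 1/(1+distance) falloff, the field strengths and the greedy 8-neighbor step are all illustrative assumptions:

    import Data.List (maximumBy)
    import Data.Ord (comparing)

    type Pos = (Double, Double)

    dist :: Pos -> Pos -> Double
    dist (x1,y1) (x2,y2) = sqrt ((x1-x2)^2 + (y1-y2)^2)

    -- Total utility of a position: attraction towards the goal plus a
    -- (negative) repulsion field around every obstacle
    potential :: Pos -> [Pos] -> Pos -> Double
    potential goal obstacles p =
      10 / (1 + dist goal p) - sum [ 5 / (1 + dist o p) | o <- obstacles ]

    -- Greedy step: move to the neighboring cell with the highest utility
    -- (this is exactly where local optima can trap the agent)
    step :: Pos -> [Pos] -> Pos -> Pos
    step goal obstacles (x,y) =
      maximumBy (comparing (potential goal obstacles))
        [ (x+dx, y+dy) | dx <- [-1,0,1], dy <- [-1,0,1] ]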
Potential (and Flow) Fields can be a very efficient way to find paths in large and complex environments
Local optima are a big problem. Potential solutions:
It is easy to combine them with strategic decision making: add more utility to higher-priority targets, add more negative utility to dangerous areas, etc.
Scaling and tweaking can still be challenging
Say we have StarCraft, a real-time strategy game
The AI agent controls a number of squads of different units
There are several possible targets for each squad to attack
Let's assign a utility value for each combination of squad and target!
Utility values can be zero (e.g. the squad and the target both die), or maybe even negative (trying to attack airborne Wraiths with ground-only Zerglings)
We want all targets to be attacked
To assign squads to targets, we calculate the utility of a particular assignment as the sum of all individual utilities
For example, if squad 1 attacking target 1 has utility 0.4, and squad 2 attacking target 2 has utility 0.1, the total utility is 0.5
We calculate these utilities for all possible assignments
Then we pick the assignment with the highest total utility
Instead of squads of units we have students
Instead of targets to attack we have papers to present
And y'all sent me the utility values ...
Papers were assigned to maximize total utility
    import Data.List (maximumBy, permutations)
    import Data.Ord (comparing)

    -- utility of assigning student 1 to topic 1, 3, 6, or anything else
    -- (the clauses for the remaining students are omitted here)
    u 1 1 = 1.0
    u 1 3 = 0.66
    u 1 6 = 0.33
    u 1 _ = -1.0

    -- total utility of an assignment: sum of the individual utilities
    utility :: [(Int,Int)] -> Double
    utility assignments = sum $ map (uncurry u) assignments

    -- try every permutation of topics and keep the assignment with the highest utility
    makeAssignment :: [Int] -> [Int] -> [(Int,Int)]
    makeAssignment students topics = maximumBy (comparing utility) assignments
      where assignments = map (zip students) $ permutations topics
(Source)
See any problem?
Here's a bad word for you: permutations
13 topics means 13! possible assignments
13! = 6 227 020 800
Even an optimized build of the assignment program takes a while to run
Several solutions:
So far we have only looked at making decisions based on our own plans
What if the other player can make several different decisions in response to our action?
For example, how can we play chess, accounting for what the opponent will do?
Adversarial search!
Let's say we want to get the highest possible score
Then our opponent wants us to get the lowest possible score
For each of our potential actions, we look at each of the opponent's possible actions
The opponent will pick the action that gives us the lowest score, and we will pick from our actions the one where the opponent's choice gives us the highest score
How does the opponent decide what to pick? The same way!
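A minimal minimax sketch in Haskell over an abstract game tree (the GameTree type is an assumption; leaves hold the score from our, the maximizing player's, point of view):

    data GameTree = Score Double | Moves [GameTree]

    -- On our turn take the child with the highest value; the opponent,
    -- one level down, takes the child with the lowest value, and so on
    maxValue, minValue :: GameTree -> Double
    maxValue (Score s)  = s
    maxValue (Moves ts) = maximum (map minValue ts)
    minValue (Score s)  = s
    minValue (Moves ts) = minimum (map maxValue ts)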
Let's take a game where we "build" a binary number by choosing bits. The number starts with a 1, and each player can choose the next bit in order. The game ends when the number has 6 digits in total (after 5 choices), or if the same bit was chosen twice in a row. If the resulting number is even or prime, we get points equal to the number, otherwise the other player gets that many points. We want to know: What is our best first move assuming the other player plays optimally.
For the max player: Remember the minimum score they will reach in nodes that were already evaluated (alpha)
For the min player: Remember the maximum score they will reach in nodes that were already evaluated (beta)
If beta is less than alpha, stop evaluating the subtree
Example: If the max player can reach 5 points by choosing the left subtree, and the min player finds an action in the right subtree that results in 4 points, they can stop searching.
If the right subtree was reached, the min player could choose the action that results in 4 points, therefore the max player will never choose the right subtree, because they can get 5 points in the left one
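The same sketch extended with alpha-beta pruning (again over the assumed GameTree type): alpha is the score the max player is already guaranteed, beta the score the min player can already force, and as soon as a subtree can no longer change the outcome it is skipped:

    data GameTree = Score Double | Moves [GameTree]

    maxValue, minValue :: Double -> Double -> GameTree -> Double
    maxValue _     _    (Score s)  = s
    maxValue alpha beta (Moves ts) = go alpha (-1/0) ts
      where
        go _ best [] = best
        go a best (t:rest)
          | best' >= beta = best'          -- the min player will avoid this subtree
          | otherwise     = go (max a best') best' rest
          where best' = max best (minValue a beta t)
    minValue _     _    (Score s)  = s
    minValue alpha beta (Moves ts) = go beta (1/0) ts
      where
        go _ best [] = best
        go b best (t:rest)
          | best' <= alpha = best'         -- the max player will avoid this subtree
          | otherwise      = go (min b best') best' rest
          where best' = min best (maxValue alpha b t)

    -- start the search with: maxValue (-1/0) (1/0) tree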
The tree for our mini game was quite large
Imagine one for chess
Even with Alpha-Beta pruning it's impossible to evaluate all nodes
Use a guess! For example: Board value after 3 turns
What about unknown information (like a deck that is shuffled)?
Introduction: What is the problem and why is it relevant?
Related Work: What have other people done that is related to the work discussed, and why do their approaches not solve the problem at hand?
Approach/Methodology: How do the authors solve the problem?
Result: How can we be sure that the proposed approach actually solves the problem?
Conclusion: What are the limitations of the proposed work, and how could it be expanded upon in the future?
When reading a paper, determine the answers to the questions that should be answered in each section
While the technical details can be interesting, your main focus should be on understanding the problem and the idea behind the solution
Also challenge any assumptions the authors may have made to determine if they have actually solved the problem
However, also note the good ideas in the paper
Never just assume that a problem is "not important"
When was the paper written? What were the available computational resources at the time?
Who wrote the paper? What is their expertise?
Where was the paper published?
Strong General AI-focused conferences:
Strong Game AI-focused conferences:
Strong Games-focused conferences:
Some popular academic workshops for game AI research:
Industry-focused publication venues
Universities
First step: Read and understand the paper
Make sure your presentation includes the important parts:
Don't get bogged down in too much detail
Avoid formulas on slides, unless they are central to the paper
After your presentation you should also lead the discussion
What did we like about the paper?
Which application can we see for the technique?
Are there any assumptions the authors made that may need a second look?
Are there any problems with the experiment?
What are the limitations of the approach?
How can the work be expanded upon?