AI in Digital Entertainment

Unsupervised Learning

1 / 50

Unsupervised Learning

2 / 50

Unsupervised Learning

  • In Unsupervised Learning we give our algorithm data, and let it find "something interesting"

  • Examples:

    • Clusters of similar data: Similar players, cliques in social networks, similar sounding words
    • Common patterns of data: Attack or other action sequences used by many players
    • Related data items: Purchases often made together (with real or in-game currency), quests often chosen together, games played by the same people
3 / 50

Why Unsupervised Learning for Games?

  • If we can find similar players, we can make them play together

  • Friends can be recommended based on player type/preference

  • Help new players by suggesting common actions/purchases made by similar players

  • Recommend new games to players

4 / 50

Clustering

5 / 50

Clustering

  • We are given n vectors, representing our players/games/words/...

  • How can we determine which vectors belong to the same "class"/"type"?

  • How many classes are there?

  • We call the classes clusters

6 / 50

What is a Cluster?

  • For now, we assume that each of our clusters is defined by a single point, the center of the cluster

  • Each data point is then assigned to a cluster based on which cluster center it is closest to

7 / 50

What is a good Clustering?

  • Say we are told that we should create k "good" clusters

    • k-center clustering: Minimize the maximum distance of any data point from its cluster center (firehouse placement)

    • k-median clustering: Minimize the sum of the distances of data points to their cluster center

    • k-means clustering: Minimize the sum of squared distances of data points to their cluster center (the within-cluster variance, i.e. the average squared distance from the cluster mean), as written out below

  • Each of these is a measure for how "compact" a cluster is, but that does not necessarily tell us anything about cluster "usefulness", which is application-dependent
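Written out (standard formulations; the notation is added here, not from the slide): for data points $x_1, \dots, x_n$ with cluster centers $c_1, \dots, c_k$, where $c(x_i)$ is the center closest to $x_i$,

$$\text{k-center:}\ \min \max_i\, d\big(x_i, c(x_i)\big) \qquad \text{k-median:}\ \min \sum_i d\big(x_i, c(x_i)\big) \qquad \text{k-means:}\ \min \sum_i \lVert x_i - c(x_i) \rVert^2$$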

8 / 50

k-Means Clustering

  • k-means clustering puts more weight on outliers than k-median, but is not dominated by them like k-center

  • Especially for d-dimensional vectors, k-means is usually the first choice

  • How do we find a k-means clustering? Naive idea: try all possible assignments of data points to clusters

  • Finding an optimal clustering is NP-hard :(

  • Lloyd's algorithm! (Often also just called "k-means algorithm")

9 / 50

Lloyd's algorithm

  • Determine k initial cluster centers, then iterate these two steps:

    • Assign each data point to its cluster based on the current centers

    • Compute new centers as the mean of each cluster

  • After "some" iterations we will have a clustering of the data

  • This may be a local minimum when compared to the k-means criterion, but is often "good enough"
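A minimal numpy sketch of these two steps (assuming the data is an (n, d) array X and the initial centers are already chosen; illustrative, not the lecture's reference implementation):

```python
import numpy as np

def lloyds_algorithm(X, centers, iterations=100):
    """Plain k-means / Lloyd's algorithm: X is (n, d), centers is (k, d)."""
    for _ in range(iterations):
        # Assignment step: index of the closest center for every data point
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assignment = distances.argmin(axis=1)

        # Update step: each center moves to the mean of its assigned points
        new_centers = np.array([
            X[assignment == j].mean(axis=0) if np.any(assignment == j) else centers[j]
            for j in range(len(centers))
        ])

        if np.allclose(new_centers, centers):  # no more change: (local) optimum reached
            break
        centers = new_centers
    return centers, assignment
```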

10 / 50

Lloyd's algorithm

[Figure: step-by-step animation of Lloyd's algorithm on example data]

11–25 / 50

Generalized Distance Functions

  • What if our data is not d-dimensional vectors, but e.g. all the data we have about each player?

  • For any two players, we can calculate a distance, but we can't make up an "average value"

  • In other words, all we have are our data points and pairwise distances, but no vector embedding

  • We can still cluster, we just have the restriction that each cluster center must be exactly on a data point

26 / 50

k-Medoids Clustering

  • The cluster centers are called "medoids"

  • We use a variation of Lloyd's algorithm

  • The only difference is how we assign new cluster centers

  • One option: Use the data point that has the lowest sum of distances to the other data points in the cluster

  • Other option: Choose a new data point as the new cluster center, and check if that new cluster center would result in a better clustering (slower, but more stable)
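A sketch of the first update option, assuming all we have is a precomputed (n, n) pairwise distance matrix D (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def k_medoids(D, medoids, iterations=100):
    """k-medoids on a precomputed (n, n) distance matrix D.
    `medoids` is a list of data-point indices used as the initial centers."""
    medoids = list(medoids)
    for _ in range(iterations):
        # Assignment step: closest medoid for every data point
        assignment = D[:, medoids].argmin(axis=1)

        # Update step: within each cluster, pick the point with the
        # smallest sum of distances to the other points in that cluster
        new_medoids = []
        for j in range(len(medoids)):
            members = np.where(assignment == j)[0]
            if len(members) == 0:
                new_medoids.append(medoids[j])
                continue
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(members[within.argmin()])

        if new_medoids == medoids:  # no medoid changed: done
            break
        medoids = new_medoids
    return medoids, assignment
```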

27 / 50

Lloyd's algorithm vs. k-Medoids Clustering

[Figure: Lloyd's algorithm and k-medoids clustering compared on example data]

28–30 / 50

We forgot something ...

  • We still need to get the initial cluster centers from somewhere!

  • Simplest approach: Just pick data points at random; Problem: Results may be poor/unpredictable

  • Another idea: Pick a data point at random, then pick the data point that is furthest away from it, then pick the data point furthest away from both, etc.; Problem: Outliers affect the initialization

  • Another idea: Pick a data point at random, and assign a weight to every other data point based on its distance to the already-chosen centers. Then pick the next center using these weights as probabilities, etc. (sketched below)

  • You can also use the result from any other algorithm/guess/heuristic as an initialization; Lloyd's algorithm will never make the solution worse (as measured by the k-means objective)!
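A sketch of the distance-weighted initialization idea (using squared distances as weights, as k-means++ does; the helper name is made up for illustration):

```python
import numpy as np

def weighted_init(X, k, rng=None):
    """Pick k initial centers; points far from already-chosen centers are more likely."""
    rng = rng or np.random.default_rng()
    centers = [X[rng.integers(len(X))]]               # first center: uniform at random
    for _ in range(k - 1):
        # distance of every point to its nearest already-chosen center
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        weights = d ** 2                               # squared distances, as in k-means++
        probs = weights / weights.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```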

31 / 50

Ward's Algorithm

  • Start with each data point in its own cluster

  • Repeatedly merge two clusters until only k clusters are left

  • Which two clusters do you merge? The pair whose merge increases the total (squared) distance of the points from their cluster centers the least

  • This is basically "greedy" k-means
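In practice this bottom-up merging is usually done with a library; a short sketch using SciPy's hierarchical clustering, assuming the data is an (n, d) numpy array:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(200, 2)                        # example data: 200 points in 2D
Z = linkage(X, method="ward")                     # greedy bottom-up merging (Ward's criterion)
labels = fcluster(Z, t=3, criterion="maxclust")   # stop once only k = 3 clusters remain
```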

32 / 50

Distribution-Based Clustering

  • Our representation of clusters as single vectors had the advantage of being simple

  • However, clusters sometimes have different sizes/distributions

  • So let's assume our clusters are probability distributions

  • Let's start with Gaussians

33 / 50

Gaussian Clusters

[Figure: scatter plot of example data forming Gaussian clusters (both axes from 0 to 1)]
34 / 50

Gaussian Clusters

  • Each cluster has a mean (point) and covariance (matrix)

  • The mean defines where the center of the cluster is

  • The covariance matrix defines the size and extent

  • The mean and covariance are the parameters of the distribution

  • Technically, all Gaussian distributions extend infinitely. We assign each data point to the cluster for which it has the highest probability (but we could allow membership in multiple clusters!). In other words, each Gaussian contributes to each data point with some (non-zero) probability (the density formula is given below)
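For reference, the density of a d-dimensional Gaussian cluster with mean $\mu$ and covariance matrix $\Sigma$ (standard formula, added here):

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d\, \lvert \Sigma \rvert}} \exp\!\left( -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$$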

35 / 50

Expectation Maximization (EM)

  • Similar to k-means, we can determine parameter values for k Gaussians iteratively

  • Initialize k means and covariance matrices, then iterate:

    • (Expectation Step) Calculate the current responsibilities/contributions for each data point from each Gaussian

    • (Maximization Step) Use these responsibilities to calculate new means (weighted averages of all data points) and covariance matrices

  • Repeat until the clusters don't change anymore
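A numpy/SciPy sketch of a single EM iteration for a mixture of Gaussians; the "responsibilities" are what the slide calls contributions (illustrative code, with explicit mixture weights added for completeness):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, means, covs, weights):
    """One EM iteration for a Gaussian mixture; X is (n, d)."""
    k = len(means)
    # E-step: responsibility of each Gaussian for each data point
    dens = np.array([weights[j] * multivariate_normal.pdf(X, means[j], covs[j])
                     for j in range(k)])                   # shape (k, n)
    resp = dens / dens.sum(axis=0, keepdims=True)          # normalize per data point

    # M-step: re-estimate parameters from the responsibilities
    nk = resp.sum(axis=1)                                  # "effective" cluster sizes
    new_means = (resp @ X) / nk[:, None]                   # weighted averages of the data
    new_covs = np.array([(resp[j][:, None] * (X - new_means[j])).T @ (X - new_means[j]) / nk[j]
                         for j in range(k)])
    new_weights = nk / len(X)
    return new_means, new_covs, new_weights
```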

36 / 50

Expectation Maximization

[Figure: EM fitting Gaussian clusters to example data]

37 / 50

Expectation Maximization

  • The general mathematical formulation of EM is actually more powerful

  • It works for general, parameterized models with latent (inferred) variables

  • The Expectation step computes the probabilities for these latent variables (which we called the "contribution" of a Gaussian to a data point)

  • The Maximization step finds new parameters using these probabilities (our parameters were the means and covariances) that maximize the likelihood of the data points

38 / 50

Density-Based Clustering

How do we cluster this?

No matter where we put our cluster centers, we can't cluster it into the inner and outer ring.

39 / 50

Density-Based Clustering

  • We can observe that clusters generally are "more dense" than the regions in between

  • Let's start with each data point in its own cluster

  • Single Linkage: In each step, we connect the two clusters that contain the closest pair of points (i.e. the minimum point-to-point distance over all cluster pairs)

  • Repeat until we have k clusters

  • Sometimes a few stray points form a bridge between two clusters, resulting in undesirable connections

  • Robust Linkage: Only connect two clusters if at least t points in each are close to the other cluster
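A SciPy sketch of single linkage on ring-shaped data like the example above (illustrative; the robust-linkage variant is not shown):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Example data: points on an inner and an outer ring
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([np.full(100, 1.0), np.full(100, 3.0)])
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

Z = linkage(X, method="single")                   # always merge the two closest clusters
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the hierarchy at k = 2 clusters
```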

40 / 50

How many clusters?

  • So far we have kind of ignored how many clusters there are, but how do we get k?

  • Define "some measure" of cluster quality, and then try k = 1, 2, 3, 4, ...

    • Statistical: Variance explained by clusters
    • Measurements of cluster density, span, etc.
    • Usefulness in application (!)
    • etc.
  • There are also some more advanced algorithms that don't need to be told k explicitly (e.g. DBSCAN)
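One concrete way to do the "try k = 1, 2, 3, ..." loop is to track the within-cluster sum of squares and look for an elbow; a scikit-learn sketch (the tooling choice is an assumption, not from the slides):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(500, 4)                        # placeholder data
inertia = {}
for k in range(1, 11):
    # inertia_ is the within-cluster sum of squared distances for this k
    inertia[k] = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
# Pick the k where the curve flattens out (the "elbow"),
# or whichever k turns out to be most useful in the application.
print(inertia)
```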

41 / 50

Frequent Pattern Mining

42 / 50

Frequent Pattern Mining

  • Let's say we collect information from many sources (e.g. people)

  • Now we want to see what is "common"

  • Which topics do many people like on social media, which actions are often performed in sequence, which cards are often played together in Hearthstone, which games are played by the same people, etc.

  • Applications: Group similar people/decks, find imbalance in game play, give recommendations, ...

43 / 50

Apriori Algorithm

  • Let's say we are given a set of sets, like a set of people, and each person has a set of games they play

  • We want to know which games are commonly played together

  • Define the support for a game as the number of people that play that game

  • Define a support threshold for frequent games

44 / 50

Apriori Algorithm

  • Identify all games for which the support is above the support threshold

  • Merge all such games into pairs

  • Discard all pairs for which the shared support is below the threshold

  • Continue merging until all item sets are below the threshold

  • Return the item sets from before the last merge
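A small, unoptimized Python sketch of these steps for the "games played together" setting (names and structure are illustrative):

```python
def apriori(baskets, threshold):
    """baskets: list of sets (e.g. the games each person plays)."""
    def support(itemset):
        # number of baskets that contain the whole item set
        return sum(1 for b in baskets if itemset <= b)

    # Level 1: single items whose support is above the threshold
    items = {x for b in baskets for x in b}
    levels = [{frozenset([x]) for x in items if support(frozenset([x])) >= threshold}]

    # Merge frequent sets into larger candidates until nothing survives the threshold
    while levels[-1]:
        size = len(next(iter(levels[-1]))) + 1
        candidates = {a | b for a in levels[-1] for b in levels[-1] if len(a | b) == size}
        levels.append({c for c in candidates if support(c) >= threshold})

    # The item sets from before the last (empty) level
    return levels[-2] if len(levels) >= 2 else set()
```

Run on the example from the next slide with a support threshold of 3, this would return {Quake, Tetris} and {Quake, Super Mario}.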

45 / 50

Apriori: An Example

  • A plays Call of Duty, Quake, Overwatch, Super Mario
  • B plays Call of Duty, Quake, Super Mario, Tetris
  • C plays Tetris, Quake, Super Mario
  • D plays Quake, Overwatch, Tetris

Support threshold: 3

  • Quake (4), Tetris (3), Super Mario (3) are all kept; Call of Duty (2), Overwatch (2) are discarded
  • {Quake, Tetris} (3), {Quake, Super Mario} (3) are kept; {Tetris, Super Mario} (2) is discarded
  • {Quake, Tetris, Super Mario} (2) is discarded
  • Return {Quake, Tetris} and {Quake, Super Mario}
46 / 50

What can we use that for?

  • Analytics: Look at why people prefer Quake over Call of Duty

  • Game Balancing: If everyone uses the same 3 spells or buys the same 3 items, determine if they are too strong or the others too weak

  • Recommendations: Say someone plays Tetris and Super Mario. Each shows up in one of our frequent game sets, both paired with Quake, so we should recommend Quake to them

47 / 50

Sequence Mining

  • For some data, such as actions, just the presence may not be as important as the actual ordering/sequencing

  • We can modify the Apriori algorithm into the Generalized Sequential Patterns algorithm by considering sequences instead of sets

    • Start with all common sequences of length 1
    • Merge sequences by concatenating them, and count occurrences in the data
    • Continue adding 1-sequences until all sequences are below the threshold
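A simplified sketch along these lines, counting contiguous action subsequences (an illustrative reading of the steps above, not the full GSP algorithm):

```python
def frequent_sequences(logs, threshold):
    """logs: list of action sequences (tuples). Returns frequent contiguous subsequences."""
    def support(seq):
        # number of logs containing seq as a contiguous subsequence
        return sum(any(tuple(log[i:i + len(seq)]) == seq
                       for i in range(len(log) - len(seq) + 1))
                   for log in logs)

    actions = {a for log in logs for a in log}
    levels = [{(a,) for a in actions if support((a,)) >= threshold}]
    singles = levels[0]

    # Extend each frequent sequence by one frequent action at a time
    while levels[-1]:
        candidates = {seq + s for seq in levels[-1] for s in singles}
        levels.append({c for c in candidates if support(c) >= threshold})
    return levels[-2] if len(levels) >= 2 else set()
```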
48 / 50

References

50 / 50
