A planning problem consists of three parts:
A definition of the current state of the world
A definition of a desired state of the world
A definition of the actions the agent can take
We can define a planning problem in PDDL domain and problem files:
The domain file defines which actions (operators) exist, and (optionally) the types of all objects
The problem file defines the initial state of the world (and any additional objects that may be required), and the goal condition
A planner then reads these two files and outputs a plan
Today we'll talk about that last part
Recall the pathfinding problem from the second lecture
We are given a graph, consisting of nodes and edges
In our implicit graph representation, we could ask one node to generate its neighbors
We would start at the start node, and generate neighbors "intelligently" until we reached a goal node
To use a pathfinding algorithm for planning, we need to formulate the planning problem as a graph. Let's start with the "obvious" choice:
Each state is a logical state
Each edge corresponds to an action
In any state, we can take any action whose preconditions are satisfied to generate a new state
When we start at the node corresponding to the start state and start expanding actions, we generate the so-called State-Space
We can then use a standard A* algorithm to do pathfinding
In fact, many planners do exactly that, with different heuristics
Parse PDDL files like discussed last week
Extract all types, objects, operators, the initial state and the goal condition
Ground actions
Plan
(:action up
  :parameters (?f1 - floor ?f2 - floor)
  :precondition (and (lift-at ?f1) (above ?f1 ?f2))
  :effect (and (lift-at ?f2) (not (lift-at ?f1))))
For each parameter, replace with all possible values.
Save all ground actions in a list.
You may end up with many ground actions.
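As a concrete illustration, here is a minimal grounding sketch in Python. The operator fields (parameters as name/type pairs, an instantiate method) are assumptions about what your parser produces, not a fixed API:

```python
import itertools

def ground_actions(operators, objects_by_type):
    """Replace every operator parameter with every type-compatible object."""
    ground = []
    for op in operators:
        # op.parameters is assumed to be a list of (name, type) pairs
        domains = [objects_by_type[typ] for (_, typ) in op.parameters]
        for combo in itertools.product(*domains):
            subst = {name: obj for (name, _), obj in zip(op.parameters, combo)}
            # op.instantiate is assumed to substitute the parameters into the
            # precondition and effect, returning one ground action
            ground.append(op.instantiate(subst))
    return ground
```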
You can create the world for the initial state by passing all atoms and the type dictionary to make_world. However, for pathfinding you will need a Node subclass, e.g. PlanningState.
Each PlanningState contains a reference to a logical world, as well as to the list of all actions
Note: You don't have to use make_world, and could instead store the state information directly in PlanningState
get_neighbors checks which actions can be applied in the logical world, i.e. which actions' preconditions are satisfied using models
Generates a new logical world for each of these actions by applying its effects using apply
Returns Edge objects containing new PlanningState objects for each generated logical world, with a cost of 1, and a string representing the action that was taken
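A minimal sketch of such a PlanningState is shown below. Node, Edge, models and apply are the framework names mentioned above; their exact signatures (and the argument order of Edge) are assumptions here:

```python
class PlanningState(Node):
    """Search node wrapping a logical world plus the list of ground actions."""
    def __init__(self, world, actions):
        self.world = world        # logical world describing this state
        self.actions = actions    # all ground actions (shared between states)

    def get_neighbors(self):
        neighbors = []
        for act in self.actions:
            # Only actions whose precondition holds in this world are applicable
            if models(self.world, act.precondition):
                new_world = apply(self.world, act.effect)
                successor = PlanningState(new_world, self.actions)
                # Edge(target, cost, label) is an assumed order; cost 1 per action
                neighbors.append(Edge(successor, 1, str(act)))
        return neighbors
```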
We can now use astar as our planning algorithm!
Pass it a PlanningState corresponding to the initial state (which can generate more states by applying actions)
For the goal condition, write an isgoal function that uses models on the state and the goal condition
Call pathfinding.astar(start, h, isgoal)
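Putting it together, a hedged sketch of the top-level call; make_world, models and pathfinding.astar are the names from these slides, while initial_atoms, type_dict, ground_acts, goal_condition and h stand in for your own variables:

```python
def isgoal(state):
    # The goal is reached when the goal condition holds in the state's world
    return models(state.world, goal_condition)

start = PlanningState(make_world(initial_atoms, type_dict), ground_acts)
plan = pathfinding.astar(start, h, isgoal)   # h: heuristic function on states
```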
1995: Graphplan (next lecture)
1997/1998: HSP (Heuristic Search Planner)
2000/2001: FastForward
2005/2006: FastDownward
We heard that heuristics can speed up A*
We used heuristics to tell the algorithm which nodes on the search frontier were "closer" to the goal
For search in Romania or on UCR campus we used the straight line distance as a "guess" of how close a node could be to the goal
We tried to have our heuristic underestimate the cost and be in the same "units" as the actual cost (e.g. km)
In planning, our nodes are (logical) states, and our edges are actions
When we expand a state by applying all actions, we compute a heuristic value for each of the successor states
This heuristic value should estimate how long a plan from that state to a goal state is
In other words, we want to estimate how "difficult" it is to satisfy the goal from each state in our search frontier
So how do we define a heuristic for such an abstract process?
Here is an idea: Solve a simplified/approximate (relaxed) version of the problem, and use the cost of the solution as the heuristic
How can we simplify the problem?
Idea: Ignore delete lists
Why is this simpler? Because the state always grows
With a larger state, more actions can be applied, and eventually we will have added all atoms that can possibly be added
Note: This does not work without modification for non-STRIPS domains that have negative preconditions.
Oops: This is still NP-hard (action ordering)...
Instead, use an estimation: give each atom a heuristic value of 0 if it already holds in the current state, and otherwise 1 plus the (summed) values of the preconditions of the cheapest action that adds it
The heuristic value of a state is then the sum of the heuristic values of all atoms in the goal
This is no longer admissible, since all goal atoms are treated as independent
Blocksworld Pickup:
Positive Preconditions: {free(),clear(X)} Negative Preconditions: None
Add List: {holds(X)}
Delete List: {free(),clear(X)}
Relaxed blocksworld Pickup:
{free(),clear(X)}⇒{holds(X)}
Put down:
{holds(X),clear(Y)}⇒{on(X,Y),clear(X),free()}
Current State:
{on(A,B),on(B,Table),on(C,Table),clear(A),clear(C),free()}
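Using relaxed actions like the ones above, here is a minimal sketch of the additive estimate just described (essentially HSP's h_add). It assumes ground actions expose precondition_atoms and add_atoms as sets and uses unit action costs:

```python
import math

def h_add(state_atoms, goal_atoms, actions):
    cost = {a: 0 for a in state_atoms}             # atoms already true cost 0
    changed = True
    while changed:                                 # fixed point over all ground actions
        changed = False
        for act in actions:
            if all(p in cost for p in act.precondition_atoms):
                c = 1 + sum(cost[p] for p in act.precondition_atoms)
                for q in act.add_atoms:
                    if c < cost.get(q, math.inf):
                        cost[q] = c
                        changed = True
    # Sum of atom costs; infinite if some goal atom is unreachable
    return sum(cost.get(g, math.inf) for g in goal_atoms)
```

Atoms already true in the current state contribute 0; each missing goal atom contributes the cost of the cheapest relaxed chain of actions that adds it.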
HSP was the first widely successful planner to use heuristic search on STRIPS problems
It won the 1998 planning competition
There are updated versions which work with similar strategies
FastForward was inspired by HSP
We said earlier that solving the relaxed problem is NP-Hard
But that depends on which relaxed problem we mean
Specifically, finding the optimal relaxed plan is NP-hard, but finding some relaxed plan is in P
Of course, this may overestimate
Start with a "layer" consisting of the set of all atoms in the current state
Apply all applicable relaxed actions, and add all their effects to the set of atoms generating the next layer
Continue until all atoms in the goal can be found in a layer
Then backtrack through the layers to build a relaxed plan
The length of this plan is the value of the heuristic
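A sketch of this relaxed-plan computation is below. The achiever bookkeeping and the simple backchaining are one possible way to extract a plan, not exactly FastForward's extraction procedure; actions are assumed to be hashable and to expose precondition_atoms and add_atoms as sets:

```python
def h_ff(state_atoms, goal_atoms, actions):
    goal_atoms = set(goal_atoms)
    reached = set(state_atoms)
    achiever = {}                      # atom -> relaxed action that first added it
    while not goal_atoms <= reached:
        added_something = False
        for act in actions:
            if set(act.precondition_atoms) <= reached:
                for q in set(act.add_atoms) - reached:
                    achiever[q] = act
                    reached.add(q)
                    added_something = True
        if not added_something:
            return float("inf")        # goal unreachable even in the relaxation
    # Backchain from the goal: collect achievers and their preconditions
    plan, queue = set(), list(goal_atoms)
    while queue:
        atom = queue.pop()
        act = achiever.get(atom)       # atoms true in the state have no achiever
        if act is not None and act not in plan:
            plan.add(act)
            queue.extend(act.precondition_atoms)
    return len(plan)
```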
FastForward also uses a modified search procedure
Since the heuristic may overestimate, it can be beneficial to ignore it for parts of the search
For any state, if all neighbors have a higher estimated cost, FastForward will expand all of those states' neighbors, until it finds a state with a lower heuristic value
In other words, as long as the heuristic values don't decrease, FastForward uses breadth-first search
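A sketch of this "enforced hill-climbing" idea, assuming hashable states and the edge attributes (target, name) from the earlier PlanningState sketch:

```python
from collections import deque

def enforced_hill_climbing(start, h, isgoal):
    current, plan = start, []
    while not isgoal(current):
        best_h = h(current)
        queue = deque([(current, [])])
        seen = {current}               # assumes states are hashable
        improved = None
        while queue and improved is None:
            state, path = queue.popleft()
            for edge in state.get_neighbors():
                if edge.target in seen:
                    continue
                seen.add(edge.target)
                new_path = path + [edge.name]
                if isgoal(edge.target) or h(edge.target) < best_h:
                    improved = (edge.target, new_path)
                    break
                queue.append((edge.target, new_path))
        if improved is None:
            return None                # breadth-first lookahead failed; fall back to full search
        current, steps = improved
        plan.extend(steps)
    return plan
```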
FastDownward is a "planning system" that implements several different search procedures and heuristics
Additionally, FastDownward compiles the planning problem to a different representation
Multi-Valued Planning Task: Instead of predicates that are only true or false, we have variables with value assignments
Predicates are often not a very natural way to represent real-world tasks
For example, it would be nice if we could easily say which block the gripper is holding in blocksworld, or which block is above another
Multi-Valued Planning Tasks (MPT) allow you to do just that
An MPT contains variables, each of which can be assigned one value from a finite domain
For example: "holds" may be a state variable with the domain ranging over all available blocks
Why?
MPTs are still PSPACE-hard (propositional planning is just a special case where all state variables have values "true" or "false")
However, MPTs allow the natural expression of mutually exclusive states, which may cut down the search space
For example, we know that on(A,B) and on(A,C) can never be true at the same time
A propositional planner first has to figure that out
An MPT planner knows that on_A can only have the value B or C, but never both
Using the state variables, we can analyze how they can change
For example, on_A = B may only change into on_A = undefined
We can also determine under which conditions such changes may occur (i.e. which actions cause them, and what preconditions they have)
This gives us some guidance in what to do
Of course, this analysis could have been done on the original problem, the MPT approach just reduces the space of possibilities
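For illustration, one way such a multi-valued state could be written down in Python. This encoding is an assumption made for this example; FastDownward's internal representation differs in detail:

```python
# Propositional view: on(A,B), on(A,C), on(A,Table), holding(A), ... are all
# separate true/false atoms, and their mutual exclusion is implicit.
# Multi-valued view: one variable per question, with exactly one value.
state = {
    "on_A": "B",           # exactly one of: "B", "C", "Table", "held"
    "on_B": "Table",
    "holding": "nothing",  # the gripper holds at most one block
}

# A tiny domain transition graph for on_A: which values can follow which,
# and (informally) which action causes the change.
transitions_on_A = {
    "B": ["held"],                  # pickup(A), requires clear(A) and holding == "nothing"
    "held": ["B", "C", "Table"],    # putting A down somewhere
}
```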
As mentioned, FastDownward actually implements several different approaches
The translation from a propositional planning problem (in PDDL) to an MPT is actually a separate component
The algorithms we are discussing for propositional planning usually still work on MPTs with some slight modifications
Relaxation
Abstraction
Critical Paths
Landmarks
Network Flow
So far all heuristics we have talked about have been relaxation heuristics
They relax the planning problem in some way
Removing negative effects is the most common way to do this
You could also remove other parts of actions, or the goal, or split up actions
"Estimate cost by projecting the state space to a smaller space (applying a graph homomorphism)" (Helmert and Röger)
Make the planning problem simpler, by removing options
For example, remove one of the blocks from actions and goals
For some domains you may also have a more abstract representation (e.g. for a strategy game: Have a library of high level strategic actions which can help solve the planning problem on low-level actions)
We can take part of the goal, and construct a "critical path" backwards
For example, to get B on the table, we have to put-on-table(B); for that, we first have to pickup(B), etc.
The number of actions on this critical path can serve as a heuristic
We may want to try different parts of the goal to get a better estimate
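A sketch of such a backward chain for a single goal atom; precondition_atoms and add_atoms are assumed action attributes, and without memoization this is illustrative rather than efficient:

```python
def critical_path_length(atom, state_atoms, actions, seen=frozenset()):
    """Length of the shortest chain of actions ending in one that adds `atom`."""
    if atom in state_atoms:
        return 0
    if atom in seen:
        return float("inf")            # cyclic requirement: give up on this branch
    best = float("inf")
    for act in actions:
        if atom in act.add_atoms:
            # The chain must first achieve the hardest precondition of this achiever
            need = max((critical_path_length(p, state_atoms, actions, seen | {atom})
                        for p in act.precondition_atoms), default=0)
            best = min(best, 1 + need)
    return best
```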
In many cases, we can find actions that have to happen (Landmarks)
For example: If we want B to be on the table, we have to have an action put-on-table(B)
Our heuristic value can be constructed from counting all of these landmark actions
The more accurate our count is, the more accurate our heuristic will be
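A deliberately crude sketch along these lines: count the goal atoms that are not yet true, since each of them needs at least one action that adds it. This treats every goal atom as its own landmark and ignores actions that achieve several atoms at once (add_atoms is an assumed action attribute):

```python
def h_goal_landmarks(state_atoms, goal_atoms, actions):
    remaining = [g for g in goal_atoms if g not in state_atoms]
    # If no action at all can add a missing goal atom, the state is a dead end
    for g in remaining:
        if not any(g in act.add_atoms for act in actions):
            return float("inf")
    return len(remaining)
```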
We can look at actions as producing and consuming facts
The number of productions and consumptions has to be the same
That means, for every time we pick up a block, we have to put one down as well
We may be able to use this fact to determine how many "switches" have to be made at least
Task 4 is the most "creative" part of the project
You are supposed to come up with your own heuristic
Think about the different types, and how you would implement them
The FastForward Heuristic may be a good starting point
Combining Multiple Heuristics
State-Space Pruning
Invariant Synthesis
Action preferences
Other search strategies
No single heuristic is good for all planning problems
Just calculate multiple different ones, and combine the results
Pick the highest, lowest, average, sum, etc.
Of course, this is just another heuristic and will not be "perfect" either
But it may overcome limitations of others
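For example, a simple combinator (a sketch; which combination works best depends on the domain and on whether you care about admissibility):

```python
def combined_h(state, heuristics):
    values = [h(state) for h in heuristics]
    # max preserves admissibility if every input heuristic is admissible;
    # sum or an average may guide the search better but can overestimate
    return max(values)
```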
Many problems contain states that are "almost" the same
For example, if we want to transport packages around (logistics domain), and have multiple identical trucks, we can remove states that only differ by which truck is where
It may also be that the order of two actions does not matter, so we don't have to consider both orderings
Some of these calculations are relatively simple, others may require external knowledge
In most planning problems the operators maintain certain invariants, i.e. facts that hold in all reachable states
For example, in blocksworld there can only ever be one block that is held by the gripper
This forms the basis of the translation to an MPT, but can also be used when solving the relaxed problem to strengthen the heuristic
Another form of invariants are predicates that can never change, which we can use to exclude certain actions (like putting a block on itself)
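A small sketch of that second use: drop ground actions that violate such an invariant, e.g. actions that mention the same block twice. This is only valid in domains where repeated arguments can never make sense, and act.parameters is an assumed attribute:

```python
def prune_ground_actions(ground_actions):
    kept = []
    for act in ground_actions:
        # e.g. skip put-on(A, A): the same object bound to two parameters
        if len(set(act.parameters)) < len(act.parameters):
            continue
        kept.append(act)
    return kept
```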
When we calculate a heuristic by solving a relaxed problem, we actually get a "plan"
This plan can be used to give us a clue what a "good" next action might be
Rather than considering all actions the same, we prefer these actions
Can be used to augment the heuristic, or to simply try these actions first and fall back to "full" search if things don't work out
Remember: Everything in AI is either representation or search
Searching through state space is just one possible approach to planning
Other planners may search through a different kind of space (different representation)
Yet others may use a very different search procedure when expanding the states
We will look at two such approaches in the next two weeks
Homework 5 has been posted on the class website
There are 5 problems using the planning domains from the repository of http://planning.domains
You will be calculating heuristic values and solving relaxed problems
When you do this, take note of how the different heuristics perform their estimations