Note: The best way to read these notes is to look at the slides concurrently. I did not bother to copy the examples given in the slides.
The book dismisses heuristic-based and logic-based planners, but current research shows that they can be effective. On the slides, HSP is a heuristic-based planner, BlackBox is a logic-based planner, and, as can be seen, both did well in the AIPS-98 Planning Competition.
Instead of Partial Order Planning, the problem can be solved as a standard A* search with a heuristic. Recall that in order for A* to guarantee an optimal solution, the heuristic must be admissible. An admissible heuristic is one that never overestimates the cost to reach the goal.
Given the problem in STRIPS notation as before, how can we come up with an admissible heuristic?
One common way to develop such a heuristic is to simplify the problem, and solve that simplified problem to get an estimate of how long it will take to solve the actual problem given the current state. Note that this is similar to (but not exactly the same as) the "problem relaxation" method we learned about in previous lectures on A*.
The cost of going from state s to state s' can be estimated in many ways. One possible method is to SUM the costs of achieving each predicate in s' and use that total as the heuristic. The problem with this is that it is not admissible: the predicates can interact (a single action may help achieve several of them), so the sum can overestimate the true cost. As a result, the MAXIMUM of the costs of achieving each predicate by itself is taken instead. We get the following:
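The formula itself appears only on the slides. A reconstruction from the description above, writing g_s(p) for the estimated cost of achieving a single predicate p starting from state s (this notation is assumed here, not taken from the slides), would presumably be:

```latex
\begin{align*}
h_{\text{sum}}(s, s') &= \sum_{p \in s'} g_s(p) && \text{(not admissible)} \\
h_{\max}(s, s') &= \max_{p \in s'} g_s(p) && \text{(admissible, the one used here)}
\end{align*}
```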
Now all that remains is to figure out how to find the cost of achieving a predicate P. This can be done using the following formula:
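The formula is not reproduced in these notes. Going by the verbal description of it (a predicate already in s costs nothing; otherwise take the cheapest operator that achieves it, at a cost of 1 plus the cost of that operator's preconditions), it can be reconstructed roughly as below. The symbols g_s, O(p) (the operators that achieve p), and Prec(o) (the preconditions of o) are notation assumed for this sketch.

```latex
g_s(p) =
\begin{cases}
0 & \text{if } p \in s \\[4pt]
\min\limits_{o \in O(p)} \bigl[\, 1 + g_s(\mathrm{Prec}(o)) \,\bigr] & \text{otherwise}
\end{cases}
\qquad \text{where } g_s(\mathrm{Prec}(o)) = \max_{q \in \mathrm{Prec}(o)} g_s(q)
```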
Notice that this equation is recursive, so in the end you get a system of recursive equations, which then has to be solved somehow (the exact method of doing this was not covered).
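Since the lecture did not cover how the equations are solved, the following is only one plausible approach, not necessarily what HSP does: repeat the update until no cost changes (fixed-point iteration). The operator set, the Buy(Milk) and Have(Milk) names, and the function names are all made up for illustration; each operator is given as (preconditions, predicates it achieves).

```python
import math

# Hypothetical grounded operators: name -> (preconditions, predicates achieved).
OPERATORS = {
    "Go(Home,SM)": ({"At(Home)"}, {"At(SM)"}),
    "Go(SM,Home)": ({"At(SM)"}, {"At(Home)"}),
    "Buy(Milk)": ({"At(SM)"}, {"Have(Milk)"}),
}

def predicate_costs(state, operators):
    """Estimate the cost of achieving each reachable predicate from `state`."""
    cost = {p: 0 for p in state}  # predicates already in s cost nothing
    changed = True
    while changed:  # stop once a full pass changes no cost
        changed = False
        for preconds, achieved in operators.values():
            # Cost of a precondition set = max over its members (inf if unknown).
            pre_cost = max((cost.get(q, math.inf) for q in preconds), default=0)
            if pre_cost == math.inf:
                continue  # operator not applicable yet
            for p in achieved:
                # 1 for performing the operator, plus the cost of its preconditions.
                if 1 + pre_cost < cost.get(p, math.inf):
                    cost[p] = 1 + pre_cost
                    changed = True
    return cost

def h_max(state, goal, operators):
    """Heuristic: the MAXIMUM cost over the individual goal predicates."""
    cost = predicate_costs(state, operators)
    return max((cost.get(p, math.inf) for p in goal), default=0)
```

For example, h_max({"At(Home)"}, {"Have(Milk)", "At(Home)"}, OPERATORS) gives 2, while the real plan (go, buy, go back) needs 3 steps, so the estimate does not overestimate.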
Problems and Solutions
This method doesn't scale well to larger problems, so a few tricks have to be used. In both of these tricks, the admissibility of the heuristic (and hence guaranteed optimality of the solution) is sacrificed.
One can also use logic to solve this problem. As stated in previous lectures, using full first-order logic and resolution
is too inefficient and broad for this problem. It deduces a whole bunch of unnecessary facts in the process. Also, it just comes up
with *a* plan, not necessarily a good plan.
Simplify
Once again, to get around these issues, we use propositional logic, and simplify the problem by getting rid of structure by instantiating operators. In the end, we get a whole bunch of
variables like At(Home, S0), NOT At(SM, S0), Go(Home, SM, S0) etc.
Notice that unlike STRIPS notation, we must explicitly define what is NOT true as well. There is no assumption
that if At(Home, S0) is not there, it is not true. Notice also that
there is no distinction between predicates (e.g. At(Home, S0)) and
operators (e.g. Go(Home, SM, S0)); they are all just variables. Each of
these variables has a truth value. Some are defined by the user based on the
starting state and goal state, and the rest are "filled in" by the planner.
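As a rough illustration of what "instantiating" means here, the sketch below generates one propositional variable per predicate or operator, argument combination, and time step. The location list, the number of steps, and the function name are assumptions made for this example only.

```python
from itertools import product

LOCATIONS = ["Home", "SM"]  # hypothetical objects in the problem
NUM_STEPS = 2               # time steps S0 and S1

def ground_variables(locations, num_steps):
    """Return one variable name per (predicate/operator, arguments, time step)."""
    variables = []
    for t in range(num_steps):
        # Predicate variables: At(loc, St)
        for loc in locations:
            variables.append(f"At({loc}, S{t})")
        # Operator variables: Go(src, dst, St), treated exactly like predicates
        for src, dst in product(locations, repeat=2):
            if src != dst:
                variables.append(f"Go({src}, {dst}, S{t})")
    return variables

variables = ground_variables(LOCATIONS, NUM_STEPS)
```

Every name in the resulting list is just an opaque boolean variable to the planner; the structure inside the parentheses is gone.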
The way this works is that you first give the planner the number of "time
steps" the solution should take. During each time step, the variables
have some truth values. During the first time step, the truth values of
the predicate variables are assigned according to the starting state.
Similarly, during the last time step, the predicate variables are assigned according
to the desired end state. The rest are calculated by the planner.
Of course, the planner cannot choose arbitrary truth values.
We must give it some constraints, such as: if it assigns Go(Home, SM) to true in one time step,
then it should assign At(SM) as true in the next one. There are a few
types of these constraints; see the slides for examples of many of them.
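To make the idea of a constraint concrete, here is a small sketch that writes two kinds of constraint as clauses: a Go needs you to be at the source in the same step, and puts you at the destination in the next step. The slides list more types than this. Representing a clause as a list of (variable, wanted truth value) pairs is an assumption of this sketch, as are the function and variable names.

```python
def go_clauses(locations, num_steps):
    """Clauses for: Go(a, b) at step t implies At(a) at t and At(b) at t+1."""
    clauses = []
    for t in range(num_steps - 1):
        for a in locations:
            for b in locations:
                if a == b:
                    continue
                go = f"Go({a}, {b}, S{t})"
                # An implication X -> Y is the clause (NOT X) OR Y.
                clauses.append([(go, False), (f"At({a}, S{t})", True)])      # precondition
                clauses.append([(go, False), (f"At({b}, S{t + 1})", True)])  # effect
    return clauses

clauses = go_clauses(["Home", "SM"], 2)
```

The starting state and goal state can be added in the same format as one-literal clauses, e.g. [("At(Home, S0)", True)] and [("At(SM, S0)", False)].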
After giving it the constraints, and after all the variables are
conjoined or disjoined together properly (see example in slides), it
must find truth values for the variables of one large propositional sentence such that the given constraints are met. This is
the SAT problem, which is NP-hard. One can use systematic or heuristic
search techniques to solve it, but one method that tends to work well is
random restart hill climbing.
First, assign random truth values to all variables. The "goodness" of the solution is measured by how many of the conjuncts (clauses) of the whole sentence are true.
Successors are calculated from the current state by changing the truth
value of one of the variables, and the successor is chosen randomly from
the best ones. Every now and then you restart the whole process just to
avoid getting stuck at a local optimum.
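The procedure just described can be sketched as follows. This is a minimal version written for these notes, not the code of BlackBox or any particular solver; the restart and flip limits, the seed parameter, and the clause format (a list of (variable, wanted truth value) pairs) are all assumptions.

```python
import random

def num_satisfied(clauses, assignment):
    """Count clauses with at least one literal matching the assignment."""
    return sum(any(assignment[v] == want for v, want in clause) for clause in clauses)

def hill_climb_sat(clauses, max_restarts=50, max_flips=200, seed=0):
    """Random restart hill climbing; returns a satisfying assignment or None."""
    rng = random.Random(seed)
    variables = sorted({v for clause in clauses for v, _ in clause})
    for _ in range(max_restarts):
        # Start from random truth values for all variables.
        assignment = {v: rng.choice([True, False]) for v in variables}
        for _ in range(max_flips):
            if num_satisfied(clauses, assignment) == len(clauses):
                return assignment
            # Score each successor (one variable flipped), then undo the flip.
            scores = {}
            for v in variables:
                assignment[v] = not assignment[v]
                scores[v] = num_satisfied(clauses, assignment)
                assignment[v] = not assignment[v]
            # Choose randomly among the best successors.
            best = max(scores.values())
            choice = rng.choice([v for v in variables if scores[v] == best])
            assignment[choice] = not assignment[choice]
        if num_satisfied(clauses, assignment) == len(clauses):
            return assignment  # solved on the very last flip
        # Otherwise give up on this run and restart from scratch.
    return None
```

Note that returning None only means no solution was found within the limits; unlike a systematic search, this method cannot prove that no plan exists.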
Summary
Somehow we have arrived at a SAT problem, starting from a planning
problem. This demonstrates the use of various AI techniques including
logic, search, and hill climbing all to solve one problem. Here are the
steps we have taken:
For the cost of achieving a predicate p from state s (the formula in the heuristic section): if state s already contains predicate p, it has already been achieved.
Otherwise, we find the least-cost operator o that can be performed to
achieve it. The cost of performing an operator is 1 (for actually performing it) plus the
cost of achieving the preconditions of the operator. The latter value is exactly what the heuristic we are deriving calculates, so we use it.
Planning as Satisfiability [Slides]