CS 8803B - Artificial Intelligence Lecture on 09/27/2002 Written by Zsolt Kira

Note: The best way to read these notes is to look at the slides concurrently. I did not copy the examples given in the slides.

Search-based Planning [Slides]

The book dismisses heuristic-based and logic-based planners, but current research shows that they can be effective. In the slides, HSP is a heuristic-based planner and BlackBox is a logic-based planner; as the slides show, both did well in the AIPS-98 Planning Competition.

Instead of Partial Order Planning, the problem can be solved as a standard A* search with a heuristic. Recall that for A* to guarantee an optimal solution, the heuristic must be admissible. An admissible heuristic is one that never overestimates the cost to reach the goal.
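As a reminder of what the search looks like, here is a minimal A* sketch. It is generic rather than taken from the lecture: the names a_star, successors, is_goal and h are illustrative assumptions, and in the planning setting a state would be a set of STRIPS predicates and each successor the result of applying one operator.

```python
import heapq
from itertools import count

def a_star(start, is_goal, successors, h):
    """Return (cost, path) for a cheapest path from start to a goal, or None.

    successors(state) yields (next_state, step_cost) pairs; h(state) is the
    heuristic estimate of the remaining cost.
    """
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(h(start), next(tie), 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost to reach each state
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue  # stale entry: a cheaper route to this state was found
        if is_goal(state):
            return g, path
        for nxt, step in successors(state):
            new_g = g + step
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + h(nxt), next(tie), new_g, nxt, path + [nxt]),
                )
    return None
```

With h returning 0 everywhere (trivially admissible) this behaves like uniform-cost search; a better admissible heuristic only reduces the number of states expanded.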

Given the problem in STRIPS notation as before, how can we come up with an admissible heuristic?

Coming up with a Good Heuristic

The cost of going from state s to state s' can be estimated in many ways. One possible method is to SUM the costs of achieving each predicate in s' and use that total as the heuristic. The problem is that the sum is not admissible: achieving the predicates can interact (for example, one operator may achieve several of them at once), so the sum can overestimate. As a result, the MAXIMUM of the costs of achieving each predicate by itself is taken. We get the following:
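The formula itself is on the slides and was not copied into these notes. Reconstructed from the description above, and assuming g_s(p) denotes the estimated cost of achieving a single predicate p starting from state s (this notation is an assumption), the max-based heuristic would read:

```latex
h(s, s') = \max_{p \in s'} g_s(p)
```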

Now all that remains is to figure out how to find the cost of achieving a predicate p. This can be done using the following formula:
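This formula is also only on the slides. A reconstruction from the explanation in these notes, assuming g_s(p) is the cost of achieving p from state s, Add(o) and Prec(o) are the add list and preconditions of operator o, and h is the heuristic being derived (all notation here is an assumption):

```latex
g_s(p) =
\begin{cases}
0 & \text{if } p \in s \\
\min_{o \,:\, p \in \mathrm{Add}(o)} \bigl[\, 1 + h(s, \mathrm{Prec}(o)) \,\bigr] & \text{otherwise}
\end{cases}
```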


If state s already contains predicate p, it has already been achieved and its cost is 0. Otherwise, we find the least-cost operator o that can be performed to achieve it. The cost of performing an operator is 1 (for actually performing it) plus the cost of achieving the preconditions of the operator. The latter value is exactly what the heuristic we are deriving calculates, so we use it.

Notice that this equation is recursive, so in the end you get a system of recursive equations that has to be solved somehow (the exact method of doing this was not covered).
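Since the lecture did not cover how to solve these equations, the following is only one plausible approach, not necessarily what HSP does: start with cost 0 for the predicates in s and keep lowering the estimates until nothing changes. The function names, the tuple layout for operators, and the Buy(Milk) / Have(Milk) predicates are assumptions made for illustration. The sketch ignores delete lists, since the cost equations described above never mention them.

```python
import math

def predicate_costs(state, operators):
    """Estimate the cost of achieving each predicate reachable from `state`.

    `operators` is a list of (name, preconditions, add_list) tuples, where
    the last two entries are sets of predicate strings.
    """
    g = {p: 0 for p in state}  # predicates already true cost nothing
    changed = True
    while changed:  # repeat until no estimate can be lowered any further
        changed = False
        for _name, pre, add in operators:
            if any(q not in g for q in pre):
                continue  # some precondition has no finite estimate yet
            # 1 for the operator itself plus the max-based precondition cost
            cost = 1 + max((g[q] for q in pre), default=0)
            for p in add:
                if cost < g.get(p, math.inf):
                    g[p] = cost
                    changed = True
    return g

def h_max(state, goal, operators):
    """Max-based heuristic: cost of the most expensive goal predicate."""
    g = predicate_costs(state, operators)
    return max((g.get(p, math.inf) for p in goal), default=0)

# Hypothetical shopping example: go to the supermarket, buy milk, come home.
OPS = [
    ("Go(Home,SM)", {"At(Home)"}, {"At(SM)"}),
    ("Buy(Milk)", {"At(SM)"}, {"Have(Milk)"}),
    ("Go(SM,Home)", {"At(SM)"}, {"At(Home)"}),
]
estimate = h_max({"At(Home)"}, {"Have(Milk)", "At(Home)"}, OPS)
```

Here the estimate is 2, while the real plan needs 3 steps (go, buy, go back), so the heuristic underestimates as an admissible heuristic should.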

Problems and Solutions

This method doesn't scale well to larger problems, so a few tricks have to be used. In both of these tricks, the admissibility of the heuristic (and hence guaranteed optimality of the solution) is sacrificed.

Planning as Satisfiability [Slides]

One can also use logic to solve this problem. As stated in previous lectures, using full first order logic and resolution is too inefficient and broad for this problem. It deduces a whole bunch of unnecessary facts in the process. Also, it just comes up with *a* plan, not necessarily a good plan.

Simplify

Once again, to get around these issues, we use propositional logic, and simplify the problem by getting rid of structure by instantiating operators. In the end, we get a whole bunch of variables like At(Home, S0), NOT At(SM, S0), Go(Home, SM, S0) etc. Notice that unlike STRIPS notation, we must explicitly define what is NOT true as well. There is no assumption that if At(Home, S0) is not there, it is not true. Notice also that there is no distinction between predicates (e.g. At(Home, S0)) and operators (e.g. Go(Home, SM, S0)); they are all just variables. Each of these variables has a truth value. Some of the truth values are defined by the user based on the starting state and goal state, and the rest are "filled in" by the planner.

The way this works is that you first give the planner the number of "time steps" the solution should take. During each time step, the variables have some truth values. During the first time step, the truth values of the predicate variables are assigned according to the starting state. Similarly, during the last time step, the predicate variables are defined according to the desired end state. The rest are calculated by the planner.

Of course, the planner cannot choose arbitrary truth values. We must give it some constraints: for example, if it assigns Go(Home, SM) to true in one time step, then it should assign At(SM) to true in the next one. There are a few types of these constraints; see the slides for examples of many of them.
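To make the idea concrete, here is a rough sketch of how grounded variables and a few constraint types could be written as clauses. Everything about the representation is an assumption for illustration (the var and encode helpers, the "name@t" variable names, clauses as lists of (variable, wanted_value) pairs), and only three constraint types are shown: initial state, goal, and "an action implies its preconditions and effects".

```python
def var(name, t):
    """Name of the propositional variable for `name` at time step t."""
    return f"{name}@{t}"

def encode(steps, init, goal, actions, fluents):
    """Build clauses; each clause is a list of (variable, wanted_value)."""
    clauses = []
    # Initial state: every predicate is stated explicitly, true or false.
    for f in fluents:
        clauses.append([(var(f, 0), f in init)])
    # Goal: the required predicates must hold at the last time step.
    for f in goal:
        clauses.append([(var(f, steps), True)])
    for t in range(steps):
        for name, pre, add, delete in actions:
            a = var(name, t)
            # "action implies X" is written as (NOT action OR X).
            for p in pre:
                clauses.append([(a, False), (var(p, t), True)])
            for p in add:
                clauses.append([(a, False), (var(p, t + 1), True)])
            for p in delete:
                clauses.append([(a, False), (var(p, t + 1), False)])
    return clauses

# One-step problem: start at home, end at the supermarket (SM).
GO = ("Go(Home,SM)", ["At(Home)"], ["At(SM)"], ["At(Home)"])
CLAUSES = encode(1, {"At(Home)"}, {"At(SM)"}, [GO], ["At(Home)", "At(SM)"])
```

This is not a complete encoding: constraints saying that predicates keep their value when no action changes them, and that conflicting actions cannot happen in the same step, are left out.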

After giving it the constraints, and after all the variables are properly combined with conjunctions and disjunctions (see example in slides), the planner must find truth values for the variables of a large propositional sentence such that the given constraints are met. This is the SAT problem, which is NP-hard. One can use systematic or heuristic search techniques to solve it, but one method that tends to work well is random restart hill climbing.

First, assign random truth values to all variables. The "goodness" of the solution is measured by how many of the conjuncts (clauses) of the whole sentence are true. Successors are calculated from the current state by changing the truth value of one of the variables, and the successor is chosen randomly from the best ones. Every now and then you restart the whole process just to avoid getting stuck at a local optimum.
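The procedure just described can be sketched as follows. This is a minimal illustration, not the planner's actual code: the function names, the restart and flip limits, and the clause representation (a list of clauses, each a list of (variable, wanted_value) pairs, satisfied when at least one pair matches) are all assumptions.

```python
import random

def num_satisfied(clauses, assignment):
    """Count clauses with at least one literal matching the assignment."""
    return sum(
        any(assignment[v] == want for v, want in clause) for clause in clauses
    )

def hill_climb_sat(clauses, max_restarts=50, max_flips=200, seed=0):
    """Random-restart hill climbing; returns an assignment or None."""
    if any(len(clause) == 0 for clause in clauses):
        return None  # an empty clause can never be satisfied
    rng = random.Random(seed)
    variables = sorted({v for clause in clauses for v, _ in clause})
    for _ in range(max_restarts):
        # Start each attempt from a fresh random assignment.
        assignment = {v: rng.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            if num_satisfied(clauses, assignment) == len(clauses):
                return assignment
            # Score every single-variable flip, then pick one of the best.
            scores = []
            for v in variables:
                assignment[v] = not assignment[v]
                scores.append((num_satisfied(clauses, assignment), v))
                assignment[v] = not assignment[v]
            best = max(score for score, _ in scores)
            choice = rng.choice([v for score, v in scores if score == best])
            assignment[choice] = not assignment[choice]
        if num_satisfied(clauses, assignment) == len(clauses):
            return assignment
    return None

# (a OR b) AND (NOT a) AND (b OR NOT c)
EXAMPLE = [[("a", True), ("b", True)], [("a", False)], [("b", True), ("c", False)]]
solution = hill_climb_sat(EXAMPLE)
```

Returning None only means that no solution was found within the limits; unlike a systematic search, this method cannot prove that the sentence is unsatisfiable.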

Summary

Somehow we have arrived at a SAT problem, starting from a planning problem. This demonstrates the use of various AI techniques including logic, search, and hill climbing all to solve one problem. Here are the steps we have taken: