Induction, Prediction, and Decision Trees
CS6660 Notes
Monday, 12-Nov-2001
Induction
- If S is true for a sample, infer that it is true for the whole population.
- Analogous to generalization and curve fitting.
- Truth-augmenting, not truth-preserving; results are hypothetical
(contrast with deduction, which is truth-preserving).
- Useful for making predictions given partial knowledge or observations
of the world.
- [Diagram] Generalizing S for a population.
- [Diagram] Curve fitting.
Prediction
- If S is true for a population, try to predict the outcome for
instances inside and outside of that population. (Analogous to
interpolation and extrapolation, respectively.)
- [Diagram] Predicting in and outside a population.
- [Diagram] Interpolating/extrapolating points.
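A minimal sketch of the interpolation/extrapolation analogy, assuming a straight line is an adequate hypothesis. The sample points and the names fit_line, predict, and is_interpolation are hypothetical and exist only for illustration.

```python
# Hypothetical sample: four observations (x, y) of some quantity.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (the induced hypothesis)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def predict(model, x):
    """Apply the induced line to a new input."""
    a, b = model
    return a * x + b

def is_interpolation(xs, x):
    """True if x lies inside the range covered by the sample."""
    return min(xs) <= x <= max(xs)

model = fit_line(xs, ys)
for query in (2.5, 10.0):
    kind = "interpolation" if is_interpolation(xs, query) else "extrapolation"
    print(f"x={query}: predicted y={predict(model, query):.2f} ({kind})")
```

The same model answers both queries; only the second lies outside the range the sample covers, so nothing in the data supports or contradicts it.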
Digressions
- Extrapolative prediction harder than interpolative prediction.
- A single sample point is not helpful for either prediction or
induction.
- Induction is complicated by the fact that several hypotheses may
explain the data. In the curve-fitting example, several curves
may be drawn to fit the data.
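To illustrate the last point, here is a small sketch with two hypotheses that fit the same sample exactly. The three sample points and the functions h_line and h_cubic are made up for this example.

```python
# Hypothetical sample of three observed points.
sample = [(0, 0), (1, 1), (2, 2)]

def h_line(x):
    """Hypothesis 1: a straight line."""
    return x

def h_cubic(x):
    """Hypothesis 2: the line plus a term that is zero at x = 0, 1, 2."""
    return x + x * (x - 1) * (x - 2)

def fits(h, data):
    """True if hypothesis h reproduces every observed point."""
    return all(h(x) == y for x, y in data)

print(fits(h_line, sample), fits(h_cubic, sample))
print(h_line(3), h_cubic(3))
```

Both hypotheses agree on every observed point but disagree at x = 3, which lies outside the sample. The data alone cannot choose between them.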
Batch Induction and Decision Trees
("Off-line Learning")
- Induce a decision tree from a set of examples.
- Restaurant Example
- Feature vector:
- Is there an alternative place to eat nearby?
- Is there a bar to wait at?
- Is it a Friday?
- Am I hungry?
- Is the place full, partially full, or empty? (_patrons_)
- What is the price?
- Is it raining?
- Do I have reservations?
- What type of cuisine?
- What is the estimated wait time?
- Outcome vector:
- Did I wait or go?
- We want to build a decision tree from 12 historical
examples. (See book figure 18.5, page 534.)
- Consider _type_ as the first discriminator in the tree:
- We learn nothing by testing _type_ first: within every branch,
an input still has a 50/50 chance of a positive outcome (wait).
- [Diagram] Decision tree with _type_.
- Consider _patrons_ as the first discriminator:
- Good choice: no further analysis necessary if
_patrons_ is "some" or "none". Half of all instances
will be decided with a single test.
- [Diagram] Decision tree with _patrons_
- Augment _patrons_ with _hungry_:
- Helps us further distinguish the "full" case. We can
expand in this manner until all instances are
classified.
- [Diagram] Decision tree with _patrons_ and _hungry_
- The order in which we consider features (as demonstrated above
with _type_ and _patrons_) dramatically affects the size of our
decision tree and the time needed to evaluate it. In this sense,
compact trees represent better hypotheses.
- Digressions:
- What if it is impossible to discriminate all instances?
- There may be a feature missing.
- Predictions will sometimes be incorrect.
- Are trees built in real time?
- They can be: see incremental induction, below.
- What makes a good feature to choose for induction?
- Whichever one best separates the examples into positive and
negative outcomes.
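One common way to quantify "separating" is information gain. The notes do not name a measure, so this is a swapped-in technique rather than necessarily the method used in lecture. The per-branch counts below are an assumption taken from the standard textbook version of the restaurant example; they are consistent with the notes (12 examples, _type_ leaves every branch 50/50, _patrons_ decides half of the examples outright) but are not stated in them.

```python
from math import log2

# (positive, negative) example counts per branch.
# ASSUMED counts from the textbook restaurant data set, not from these notes.
SPLITS = {
    "type": {"French": (1, 1), "Italian": (1, 1), "Thai": (2, 2), "Burger": (2, 2)},
    "patrons": {"none": (0, 2), "some": (4, 0), "full": (2, 4)},
}

def entropy(pos, neg):
    """Entropy in bits of a set with pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # skip empty classes; 0 * log(0) is taken as 0
            p = count / total
            result -= p * log2(p)
    return result

def information_gain(branches):
    """Entropy before the split minus the weighted entropy after it."""
    pos = sum(p for p, _ in branches.values())
    neg = sum(n for _, n in branches.values())
    total = pos + neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in branches.values())
    return entropy(pos, neg) - remainder

gains = {name: information_gain(branches) for name, branches in SPLITS.items()}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: gain = {gain:.3f} bits")
print("best first test:", best)
```

Under these assumed counts, _type_ has zero gain and _patrons_ has the larger gain, matching the informal argument above.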
Incremental Induction
- Examples provided one at a time.
- Agent builds a hypothesis and modifies it as new or conflicting
information arrives.
- Restaurant Example:
- Given x1, assume:
hungry -> yes #arbitrary
- Given x2, we notice a contradiction, so we specialize:
hungry && $$$ -> yes
- Given x3, we notice the hypothesis is too narrow, so we generalize:
(hungry && $$$) || (!hungry && $) -> yes
- etc.
- Generalize when a positive example is mispredicted as negative
(false negative).
- Specialize when a negative example is mispredicted as positive
(false positive).
- Simple method of changing hypothesis:
- Generalization: add disjunctions
- Specialization: add conjunctions
- Agents with the same learning methods can come to different conclusions.
- Result of learning depends on:
- Order of examples
- Focus of attention (how agents choose to prioritize features)
- [Diagram] Expansion and contraction of hypothesis.
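The restaurant trace above can be sketched roughly as follows. This is one simple way to implement "add disjunctions / add conjunctions", not necessarily the procedure used in lecture. The feature values for x1, x2, and x3 are hypothetical (the notes do not list them) and were chosen to reproduce the trace. The focus list stands in for the agent's focus of attention.

```python
# A hypothesis is a disjunction (list) of conjunctions (dict: feature -> value).
# Hypothetical examples chosen to reproduce the trace in the notes.
X1 = ({"hungry": True, "price": "$$$"}, True)
X2 = ({"hungry": True, "price": "$"}, False)
X3 = ({"hungry": False, "price": "$"}, True)
EXAMPLES = [X1, X2, X3]

def covers(conj, example):
    """True if the example satisfies every condition in the conjunction."""
    return all(example[f] == v for f, v in conj.items())

def predicts_yes(hypothesis, example):
    """The hypothesis says yes if any of its disjuncts covers the example."""
    return any(covers(conj, example) for conj in hypothesis)

def generalize(hypothesis, positive, focus):
    """Add a disjunct describing the missed positive example."""
    used = [f for f in focus if any(f in conj for conj in hypothesis)] or focus[:1]
    return hypothesis + [{f: positive[f] for f in used}]

def specialize(hypothesis, negative, positives, focus):
    """Add one condition to each disjunct that wrongly accepts the negative."""
    revised = []
    for conj in hypothesis:
        if covers(conj, negative):
            for f in focus:
                values = {p[f] for p in positives if covers(conj, p)}
                if f not in conj and len(values) == 1 and negative[f] not in values:
                    conj = {**conj, f: values.pop()}
                    break
            # If no feature separates the negative, the disjunct stays as-is.
        revised.append(conj)
    return revised

def learn(examples, focus):
    """Process examples one at a time, revising only after a misprediction."""
    hypothesis, positives = [], []
    for features, label in examples:
        guess = predicts_yes(hypothesis, features)
        if label:
            positives.append(features)
        if label and not guess:        # false negative: too narrow
            hypothesis = generalize(hypothesis, features, focus)
        elif guess and not label:      # false positive: too broad
            hypothesis = specialize(hypothesis, features, positives, focus)
    return hypothesis

print(learn(EXAMPLES, ["hungry", "price"]))
```

Presenting the same three examples in the order x3, x1, x2 leaves this sketch with a different final hypothesis, which illustrates the dependence on example order noted above.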
Stepping Back (a quick, unexplored big-picture view of inductive learning)
- Hypotheses are refined with experience...
- [Diagram] Learner/Performer/Teacher Diagram