Induction, Prediction, and Decision Trees
CS6660 Notes
Monday, 12-Nov-2001

Induction
    - If S is true for a sample, infer that it is true for the whole
      population.
    - Analogous to generalization and curve fitting.
    - Truth-augmenting, not truth-preserving; results are hypothetical
      (contrast with deduction, which is truth-preserving).
    - Useful for making predictions given partial knowledge or
      observations of the world.
    - [Diagram] Generalizing S for a population.
    - [Diagram] Curve fitting.

Prediction
    - If S is true for a population, try to predict the outcome for
      instances inside and outside of that population. (Analogous to
      interpolation and extrapolation, respectively.)
    - [Diagram] Predicting in and outside a population.
    - [Diagram] Interpolating/extrapolating points.

Digressions
    - Extrapolative prediction is harder than interpolative prediction.
    - A single sample point is not helpful for either prediction or
      induction.
    - Induction is complicated by the fact that several hypotheses may
      explain the data. In the curve-fitting example, several curves
      may be drawn to fit the data.
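
A minimal Python sketch of the last point. The sample points are made up
for illustration (they are not from the lecture), and the names line and
cubic are hypothetical. Two different curves pass through the same three
points, stay close between them, and diverge once we extrapolate.

```python
# Three sample points (invented for illustration); both hypotheses fit them exactly.
samples = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]


def line(x):
    """Hypothesis 1: the straight line y = x."""
    return x


def cubic(x):
    """Hypothesis 2: y = x + x(x-1)(x-2), which also passes through every sample."""
    return x + x * (x - 1.0) * (x - 2.0)


# Both hypotheses explain the observed data perfectly.
for x, y in samples:
    assert line(x) == y and cubic(x) == y

# Compare the two hypotheses inside the sampled range (x=1.5, interpolation)
# and outside it (x=3.0 and x=4.0, extrapolation).
for x in (1.5, 3.0, 4.0):
    gap = abs(line(x) - cubic(x))
    print(f"x={x}: line={line(x)}, cubic={cubic(x)}, gap={gap}")
```

The gap between the two hypotheses is small at x=1.5 and grows at x=3.0
and x=4.0, which is one sense in which extrapolation is harder than
interpolation: the data alone cannot tell us which curve to trust.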

Batch Induction and Decision Trees
("Off-line Learning")
    - Induce a decision tree from a set of examples.
    - Restaurant Example
          - Feature vector:
	      - Is there an alternative place to eat nearby?
	      - Is there a bar to wait at?
	      - Is it a Friday?
	      - Am I hungry?
	      - Is the place full, partially full, or empty? (_patrons_)
	      - What is the price?
	      - Is it raining?
	      - Do I have reservations?
	      - What type of cuisine?
	      - What is the estimated wait time?
	  - Outcome vector:
	      - Did I wait or go?
          - We want to build a decision tree from 12 historical
            examples. (See book figure 18.5, page 534.)
	  - Consider _type_ as the first discriminator in the tree:
                - We learn nothing from testing _type_ first: every
                  branch still has a 50/50 split between wait and go.
                - [Diagram] Decision tree with _type_.
          - Consider _patrons_ as the first discriminator:
	        - Good choice: no further analysis necessary if
                  _patrons_ is "some" or "none". Half of all instances
                  will be decided with a single test.
                - [Diagram] Decision tree with _patrons_
          - Augment _patrons_ with _hungry_:
                - Helps us further distinguish the "full" case. We can
                  expand in this manner until all instances are
                  classified.
                - [Diagram] Decision tree with _patrons_ and _hungry_
    - The order in which we consider features (as demonstrated above
      with _type_ and _patrons_) dramatically affects the size of the
      decision tree and the time needed to evaluate it. In this sense,
      compact trees represent better hypotheses.
    - Digressions:
          - What if it is impossible to discriminate all instances?
                - There may be a feature missing.
                - Predictions will sometimes be incorrect.
	  - Are trees built in real time?
	        - They can be: see incremental induction, below.
	  - What makes a good feature to choose for induction?
                - Whichever one does the most separating of positive
                  and negative examples.
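
One common way to quantify "does the most separating" is information
gain; the notes do not say which measure the lecture used, so treat this
as a sketch rather than the course's method. The per-branch counts below
are an assumption recalled from the textbook's 12-example restaurant set
(the notes only confirm the 50/50 splits for _type_ and that "none" and
"some" settle half the examples). The function names are hypothetical.

```python
from math import log2


def entropy(pos, neg):
    """Entropy in bits of a set with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # a zero count contributes nothing (0 * log 0 is taken as 0)
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(branches):
    """Gain from splitting on a feature.

    branches: one (positives, negatives) pair per value of the feature.
    """
    pos = sum(p for p, _ in branches)
    neg = sum(n for _, n in branches)
    total = pos + neg
    # Expected entropy remaining after the split, weighted by branch size.
    remainder = sum((p + n) / total * entropy(p, n) for p, n in branches)
    return entropy(pos, neg) - remainder


# Assumed (wait, go) counts per branch; see the caveat in the text above.
patrons = [(0, 2), (4, 0), (2, 4)]          # none, some, full
cuisine = [(1, 1), (1, 1), (2, 2), (2, 2)]  # one pair per value of _type_

print("gain(patrons) =", round(information_gain(patrons), 3))
print("gain(type)    =", round(information_gain(cuisine), 3))
```

Under these counts _patrons_ has a clearly positive gain and _type_ has a
gain of zero, matching the observation that splitting on _type_ teaches
us nothing.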

Incremental Induction
    - Examples provided one at a time.
    - Agent builds a hypothesis and modifies it with the introduction of
      new/conflicting information.
    - Restaurant Example:
          - Given x1, assume:
	      hungry -> yes   #arbitrary
          - Given x2, we notice a contradiction, so we specialize:
              hungry && $$$ -> yes
          - Given x3, we notice the hypothesis is too narrow, so we generalize:
	      (hungry && $$$) || (!hungry && $) -> yes
          - etc.
    - Generalize when a positive example is mispredicted as negative
      (false negative).
    - Specialize when a negative example is mispredicted as positive
      (false positive).
    - Simple method of changing hypothesis:
          - Generalization: add disjunctions
	  - Specialization: add conjunctions
    - Agents with the same learning methods can come to different conclusions.
    - Result of learning depends on:
          - Order of examples
          - Focus of attention (how agents choose to prioritize features)
    - [Diagram] Expansion and contraction of hypothesis.
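
The generalize/specialize loop above can be sketched in Python. Everything
here is an assumption made for illustration: the hypothesis is stored as a
list of AND-clauses that are OR-ed together, only the hungry and price
features are used, the price of x2 is guessed to be "$", and the function
names (predict, specialize, generalize, learn) are hypothetical. It is not
the lecture's or the textbook's algorithm, only a simple scheme that adds
conjuncts to specialize and disjuncts to generalize.

```python
def matches(conj, example):
    """True if the example satisfies every attribute=value test in conj."""
    return all(example[attr] == value for attr, value in conj.items())


def predict(hypothesis, example):
    """The hypothesis says 'wait' if any of its conjunctions matches."""
    return any(matches(conj, example) for conj in hypothesis)


def specialize(hypothesis, negative, positives, focus):
    """Add a conjunct to each disjunct that wrongly covers a negative example."""
    result = []
    for conj in hypothesis:
        if not matches(conj, negative):
            result.append(conj)
            continue
        covered = [p for p in positives if matches(conj, p)]
        for attr in focus:
            if attr in conj:
                continue
            values = {p[attr] for p in covered}
            # Pick an attribute on which all covered positives agree with
            # each other and disagree with the negative example.
            if len(values) == 1 and negative[attr] not in values:
                result.append({**conj, attr: values.pop()})
                break
        else:
            # No single attribute works: fall back to one fully specific
            # disjunct per covered positive (noisy data is not handled).
            result.extend(dict(p) for p in covered)
    return result


def generalize(hypothesis, positive, positives, negatives, focus):
    """Add a disjunct covering a missed positive without covering old negatives."""
    new = [{focus[0]: positive[focus[0]]}]  # arbitrary start: first focus attribute
    for neg in negatives:
        new = specialize(new, neg, positives, focus)
    return hypothesis + new


def learn(stream, focus):
    """Process (features, label) pairs one at a time, repairing mispredictions."""
    hypothesis, positives, negatives = [], [], []
    for features, label in stream:
        guess = predict(hypothesis, features)
        (positives if label else negatives).append(features)
        if label and not guess:      # false negative: hypothesis too narrow
            hypothesis = generalize(hypothesis, features, positives, negatives, focus)
        elif guess and not label:    # false positive: hypothesis too broad
            hypothesis = specialize(hypothesis, features, positives, focus)
    return hypothesis


# x1, x2, x3 restricted to two features; the price of x2 is an assumed value.
stream = [
    ({"hungry": True, "price": "$$$"}, True),
    ({"hungry": True, "price": "$"}, False),
    ({"hungry": False, "price": "$"}, True),
]

print(learn(stream, ["hungry", "price"]))
print(learn(stream, ["price", "hungry"]))
```

With hungry considered first, this sketch ends at
(hungry && $$$) || (!hungry), slightly more general than the hypothesis in
the notes but consistent with the same three examples. With price
considered first, it ends at ($$$) || ($ && !hungry). The same learning
method reaches different conclusions depending on focus of attention.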

Stepping Back (a quick, unexplored big-picture view of inductive learning)
    - Hypotheses are refined with experience...
    - [Diagram] Learner/Performer/Teacher Diagram