%---------------------------------------------------------------------------- Learning: The ability to change through experience to better one's behavior. 1. Acquisition of new knowledge that enhances reasoning capabilities by allowing it to do something it couldn't do before ("knowledge level learning"). 2. Reorganization of existing knowledge (e.g., compilation) that, although it doesn't enhance the reasoning capabilities in the above sense, makes the system able to do its task better or more efficiently ("speedup learning"). 3. Acquisition or modification of reasoning methods ("meta-learning"). %---------------------------------------------------------------------------- Learning task: A learning task can be characterized by the (input) SOURCE and the (output) TARGET structures. Possible target structures include whatever would aid a reasoner to do its job (better). For example, - a new association, a modified association - a new plan, a modified plan - a new criterion for recognizing an object - a new word definition - a new value for a slot in a schema, a new slot, a new schema - a new case, a modified case - a new index - a new model, a modified model --- --- --- %---------------------------------------------------------------------------- Basic phases in learning: 1. Tracking down what needs to be learned (credit and blame assignment). 2. Proposing the new knowledge using some learning algorithm or method. 3. Debugging the proposed knowledge and integrating it with the old knowledge. The "credit (or blame) assignment" problem: Identifying the faulty knowledge structures when an error is made. The "deciding what to learn" problem: Identifying what needs to be learned ("formulating learning goals"). The "deciding how to learn" problem: Identifying appropriate learning strategies (in a multistrategy learning system). Most current systems use only one learning strategy. The "indexing" problem: Indexing the new knowledge back into memory such that it is retrieved in appropriate situations in the figure. %---------------------------------------------------------------------------- Issues: 1. How much do you have to know already before you can learn a little more? 2. How does the system assign responsibility for successes and failures? (This is called "credit/blame assignment".) 3. Should the system learn new knowledge that it didn't know, or should it learn to use existing knowledge more efficiently and effectively? (This is called "knowledge level learning" vs. "speedup (or symbol level) learning".) Are the methods different? 4. Should the system learn from its failures, or from its successes? Are the methods different? 5. Should the system learn new knowledge from scratch, or should it learn by incrementally refining its existing knowledge? Are the methods different? 6. Which examples should we present to the learning system, and in which order? 7. What happens if the input is noisy or incorrect? 8. Should the system try to learn "correct" knowledge only? 9. Can the system get overloaded with too much knowledge? (This is called the "utility problem".) 10. Can the system decide for itself what it should learn? Can it formulate its own learning goals? (This is called the "deciding what to learn" problem. Systems that actively pursue learning goals are called "active learning" systems.) 10. How can the system figure out which features of the input are relevant? (This is called the "situation identification" problem or the "feature selection" problem.) 11. Can a system create new representational constructs on its own? (This is called the "new term" or the "constructive induction" problem.) 12. How do people learn, anyway? 13. Could a system use multiple different learning methods? (This is called "multistrategy learning".) %---------------------------------------------------------------------------- %---------------------------------------------------------------------------- Inductive learning: Learning from examples Similarity-based learning, Inductive learning, Induction, Concept learning, Category formation, Learning class descriptions from examples. Given: Observational statements (facts) A tentative inductive assertion Background knowledge Find: An inductive assertion (hypothesis) that satisfies the background knowledge and tautologically implies the observational statements. The set of facts generalize to the inductive hypothesis. The inductive hypothesis should be: - consistent - refutable - simple, elegant Two classes of inductive problems: Learning concepts from examples (successive refinement based on a series of examples presented one at a time) - Building characteristic descriptions (building long and complete descriptions that describe the concepts being learned) - Building discriminant descriptions (building short discriminatory descriptions that help discriminate concepts from each other) - Discovering sequence generating rules Descriptive generalization (generalization of sets of observations presented all at once rather than as single examples) - Theory formation - Pattern discrovery - Building conceptual hierarchies Learning may be viewed as a heuristic state space search: States = symbolic descriptions of concepts Operators = inference rules (rules for generalization, specialization and reformulation) Goal state = inductive assertion satisfying desired criteria Issues in choosing instances: Noise in data. Order of presentation. Represent more than one concept? %---------------------------------------------------------------------------- Winston's learning program (Winston 1972) Task: To derive descriptions of complex objects (e.g., house, arch) based on a series of examples presented by a teacher. Main features: - Learns a "target concept" (the concept that the teacher wants it to learn) by induction. - Proposes and then refines new knowledge (a series of "models" of the target concept). - Attached to a vision program that recognizes primitive objects (e.g., blocks, wedges, spheres). - Teacher presents a series of examples that can be: - SAMPLES: An instance of the target concept. - NEAR MISSES: An example that is not an instance of the target concept, but is "nearly so"; it differs only in one "feature" or "aspect". Example: /\ Sample: (BRICK A) / \ (WEDGE B) / B \ (SUPPORTS A B) ------ ------ Model: (BRICK ?x) | A | (WEDGE ?y) | | (SUPPORTS ?x ?y) ------ /\ ------ Near miss: (BRICK AA) / \ | | (WEDGE BB) / BB \ | AA | ------ ------ Refined model: (BRICK ?x) (WEDGE ?y) (MUST (SUPPORTS ?x ?y)) ------ Sample: (BRICK AAA) | | (BRICK BBB) | BBB| (SUPPORTS AAA BBB) ------ ------ Refined model: (BRICK ?x) | AAA| (PRISM ?y) | | (MUST (SUPPORTS ?x ?y)) ------ /\ Near miss: (WEDGE AAAA) / \ (WEDGE BBBB) /BBBB\ (SUPPORTS AAAA BBBB) ------ /\ Refined model: (MUST-BE-A (BRICK ?x)) / \ (PRISM ?y) /AAAA\ (MUST (SUPPORTS ?x ?y)) ------ Algorithm: - Start with initial model. Now, for each subsequent example: - Match example to model to find differences. - If example is a near miss, then specialize the model. Specialization heuristics: - REQUIRE-LINK: If the evolving model has a link in a place where a near miss does not, convert the model's link to a MUST form. - FORBID-LINK: If the near miss has a link in a place where the evolving model does not, add a MUST-NOT link to the model. - If example is a sample, then generalize the model. Generalization heuristics: - CLIMB-TREE: If something in the evolving model corresponds to something different in the sample, replace the model's clause with a MUST-BE-A link pointing to the most specific shared class connecting the things in the model and sample. - ENLARGE-SET: If something in the evolving model corresponds to a different thing in the sample, replace the model's clause with a MUST-BE-A link pointing to a new class composed of the union of the classes of the two things. - DROP-LINK: If the things that are different in the model and the sample form an exhaustive set, or when the model has a link that is not in the sample, delete the link from the model. - CLOSE-INTERVAL: When the number or interval in the model corresponds to a number in a sample, replace the number (or interval) by an interval spanning the model's number and the sample's number (or an enlarged interval that includes the sample's number, respectively). Underlying principles: NO-GUESSING PRINCIPLE: when in doubt about what to learn, learn nothing. FELICITY CONDITIONS: assume teacher presents reasonable examples. NO-ALTERING PRINCIPLE: if an example does not match the model, create a special case exception. MARTIN'S LAW: You can't learn anything until you almost know it already. Note: - The "scene" (example) must always be less general than the model. - The scene will always have particular objects in it, whereas the model may refer to abstract classes of objects. Contributions: - Showed incremental learning (learning in small steps). - Pointed out the importance of training sequences. - Pointed out the importance of good, consistent representations, without which the difference matching could not be done. Drawbacks: - Learning method requires one-to-one correspondence. - Requires that examples be presented in the correct order. - Disregards the fact that in the real world there are usually many features of objects and scenes. A FOCUS MECHANISM is necessary to distinguish features that are SALIENT. - Negative examples must be "near misses" (differ from the current model in only one way). %---------------------------------------------------------------------------- %---------------------------------------------------------------------------- Learning by experimentation and problem-solving Mitchell's version spaces/Candidate elimination algorithm: Generalize the ideas in Winston's arch program Least commitment - version space pruned as little as possible Exhaustive breadth-first search throgh the version space (Winston is depth-first) Example: LEX (Mitchell 1982) acquires and refines problem-solving heuristics in the domain of symbolic integration. A form of learning from examples, but less directed than Winston's program. Goal: To devise methods by which heuristic problem-solvers improve their expertise through experience, by: - generating selected problems in the domain - solving them using available heuristics - analyzing their solutions to learn better heuristics Given: - Initial state involving integral - Goal state without integrals - Operators and preconditions (about 40 operators) - Initially, LEX has no heuristics for applicability of operators, but it has a language for representing heuristics. Heuristics take the following form: IF current problem state S matches applicability condition P THEN apply operator O with variable binding B. Language for describing applicability conditions are based on a grammar for algebraic expressions. This grammar is fixed, which restricts the range of heuristics that can be learned. Heuristics for operators represent subclasses of problem states to which the operator SHOULD be applied. Compare these with preconditions which represent when the operator CAN be applied. Example: /-\ | \ x transc(x) dx --> OP2 (integration by parts) \ with u = x | and dv = transc(x) dx \-/ Modules in LEX: - PROBLEM SOLVER - Outputs a detailed trace of the search performed - CRITIC - Analyzes the trace and outputs a set of positive and negative training instances - POSITIVE INSTANCES: Desirable search steps - NEGATIVE INSTANCES: Undesirable search steps - GENERALIZER - Proposes and refines heuristics to improve performance - PROBLEM GENERATOR - Attempts to generate practice problems for the problem solver to work on that will be informative, i.e., that will lead to proposing or refining heuristics, but yet are solvable using current heuristics PROBLEM GENERATOR / ^ practice / \ partially learned problem / \ heuristics v \ PROBLEM SOLVER GENERALIZER \ ^ problem \ / training solving \ / instances trace v / CRITIC Example: Problem generator proposes int(3x cos(x) dx) as an interesting problem to solve. (This was done by hand in the actual implementation of LEX.) Problem solver solves this problem using search. Existing heuristics are used to constrain the search. int(3x cos(x) dx) / \ / \ OP2 with u=3x, dv=cos(x)dx / \ ... 3x sin(x) - int(3 sin(x) dx) / \ OP1 / \ / \ 3x sin(x) - 3 int(sin(x) dx) ... | | OP5 | 3x sin(x) - +3 cos(x) + C Problem solving trace (the above search tree) is passed on to the critic. Critic notices which decisions led to the correct result and which ones didn't pan out, and suggests possible training instances to try out. For example: IF int(3x cos(x) dx) THEN apply OP2 with u = 3x, dv = cos(x) dx Generalizer creates a proposed heuristic or refines an old one. Version spaces: Tentative heuristics are represented using VERSION SPACES (most general and most specific forms). For example: S: IF int(3x cos(x) dx) THEN apply OP2 with u = 3x G: IF int(f1(x) f2(x) dx) THEN apply OP2 with u = f1(x) These represent the most specific heuristic and the most general heuristic one might learn. The actual heuristic it should learn is probably in between somewhere (it lies in the "version space"). Refinement occurs by generalizing S or specializing G, until the heuristic converges (hopefully) to one that works well. Thus the VERSION SPACE represents all the alternative PLAUSIBLE DESCRIPTIONS of the heuristic. A plausible description is one that is applicable to all known positive examples and no known negative example. Algorithm: CANDIDATE ELIMINATION Given: - A representation language - A set of positive and negative examples expressed in that language Compute: A concept description that is consistent with all the positive examples and none of the negative examples. Method: - Initialize G to contain one element: the null description (all features are variables). - Initialize S to contain one element: the first positive example. - Accept a new training example. - If it is a positive example: - Remove from G any descriptions that do not cover the example. - Update the S set to contain the most specific set of descriptions in the version space that cover the example and the current elements of the S set (i.e., generalize the elements of S as little as possible so that they cover the new training example). - If it is a negative example: - Remove from S any descriptions that cover the example. - Update the G set to contain the most general set of descriptions in the version space that do not cover the example (i.e., specialize the elements of G as little as possible so that the negative example is no longer covered by any of the elements of G). - If S and G are both singleton sets, then: - if they are identical, output their value and halt. - if they are different, the training cases were inconsistent. Output this result and halt. - else continue accepting new training examples. Still a trial and error method. The program does not base its choice of examples, or its learned heuristics, on an ANALYSIS of what works or why it works, but rather on the simple assumption that what works will probably work again. Evaluation of LEX: 12 training problems 5 test problems tried after every second training problem 14 heuristics formed, covering 13 of its 32 operators 12 heuristics remained partially learned Performance (number of search steps) improved two orders of magnitude Exercise: Try this algorithm on Winston's house example, above. Example (from Rich & Knight): Learning the concept of "Japanese economy car" Features: Origin, Manufacturer, Color, Decade, Type POSITIVE EXAMPLE: (Japan, Honda, Blue, 1980, Economy) Initialize G to singleton set that includes everything Initialize S to singleton set that includes first positive example G = {(?, ?, ?, ?, ?)} S = {(Japan, Honda, Blue, 1980, Economy)} NEGATIVE EXAMPLE: (Japan, Toyota, Green, 1970, Sports) Specialize G to exclude negative example G = {(?, Honda, ?, ?, ?), (?, ?, Blue, ?, ?) (?, ?, ?, 1980, ?) (?, ?, ?, ?, Economy)} S = {(Japan, Honda, Blue, 1980, Economy)} POSITIVE EXAMPLE: (Japan, Toyota, Blue, 1990, Economy) Prune G to exclude descriptions inconsistent with positive example Generalize S to include positive example G = {(?, ?, Blue, ?, ?) (?, ?, ?, ?, Economy)} S = {(Japan, ?, Blue, ?, Economy)} NEGATIVE EXAMPLE: (USA, Chrysler, Red, 1980, Economy) Specialize G to exclude negative example (but staying within version space, i.e., staying consistent with S) G = {(Japan, ?, Blue, ?, ?) (Japan, ?, ?, ?, Economy)} S = {(Japan, ?, Blue, ?, Economy)} POSITIVE EXAMPLE: (Japan, Honda, White, 1980, Economy) Prune G to exclude descriptions inconsistent with positive example Generalize S to include positive example G = {(Japan, ?, ?, ?, Economy)} S = {(Japan, ?, ?, ?, Economy)} S = G, both singleton => done! Extensions: Continuously valued features Hierarchical knowledge Limitations: Exhaustive breadth-first search Inconsistent data (noise) may cause target concept to be pruned Solution: multiple version spaces (costly) Learning disjunctive concepts is hard %----------------------------------------------------------------------------