%---------------------------------------------------------------------------- NATURAL LANGUAGE PROCESSING (NLP) - LEXICON and SYNTAX %---------------------------------------------------------------------------- Task of Natural Language Understanding: Input: A natural language text Output: The ``meaning'' of the text What is the meaning of ``meaning''? %---------------------------------------------------------------------------- SYNTACTIC PROCESSING It uses a lexicon (a dictonary) and a syntax (a grammar) to generate a parse tree. %---------------------------------------------------------------------------- BOTTOM-UP PARSING Sentence: He ate a frog. Simple Grammar: R1: S ---> NP VP R2: NP ---> noun | pronoun R3: VP ---> verb [det] NP [] implies optional * implies recursion Lexicon: Word Syntactic Category he pronoun ate verb a det frog noun Start with first word: He Lexical analysis shows: he is a pronoun Look at right hand side of grammar rules for match R2 has pronoun, so he is a NP Look at right hand side of grammar rules for match R1 has NP followed by VP, so now we need to find a VP. Go to next word: ate Lexical analysis shows: ate is a verb Look at right hand side of grammar rules for match R3 has verb followed by [det] NP, so now we need to find [det] NP Go to next word: a Lexical analysis shows: a is a det, so now we need to find NP Go to next word: frog Lexical analysis shows: frog is a noun Look at right hand side of grammar rules for match R2 has noun, and sentence has ended. So we get the parse tree S NP VP pronoun verb [det] NP | | | noun he ate a frog %---------------------------------------------------------------------------- TOP-DOWN PARSING Sentence: He ate a frog. Simple Grammar: R1: S ---> NP VP R2: NP ---> noun | pronoun R3: VP ---> verb [det] NP Lexicon: Word Syntactic Category he pronoun ate verb a det frog noun Start with first word: He Look at left hand side of grammar rules R1 shows S ---> NP VP, so He could be a NP R2 shows NP ---> noun | pronoun, so he could be a noun or pronoun Lexical analysis shows: he is a pronoun So, he is a NP; now we need to find the VP Go to next word: ate Look at right hand side of grammar rules R3 shows VP ---> verb [det] NP, so ate could be verb Lexical analysis shows: ate is a verb; now we need to find [det] noun Go to next word: a Lexical analysis shows: a is a det, so now we need to find NP Go to next word: frog Lexical analysis shows: frog is a noun We have found the syntactic categories we needed and sentence has ended. So we get the parse tree S NP VP pronoun verb [det] NP | | | noun he ate a frog %---------------------------------------------------------------------------- BOTTOM-UP vs. TOP-DOWN: In general it is hard to decide between them (similar to bottom-up vs. top-down search, backward vs. forward chaining) If the lexical dictionary has only one syntactic category for each word, then bottom-up parsing can be quite efficient; in general, however, the lexical dictionaries are more complex. If the grammar rules have only a terminal on the right hand side (or a terminal followed by a nonterminal), the top-down parsing can be quite efficient; in general, however, grammars are more complex. %---------------------------------------------------------------------------- TRANSITION NETWORKS A collection of nodes (states) connected by labelled arcs (transitions). Certain states are designated as "terminal states". Transitions represent syntactic entities; we change states when we recognize the entity. Example: det noun verb ------ det noun ------ S1 --> S2 ---> S3 ---> | S4 | --> S7 ---> | S8 | / \ ------ / \ ------ ---> ^ \ ---> adj noun/ prep\ adj / v S6 <----- S5 / \ det ---> adj S4 and S8 are the final states. S1, S2, and S3 are start states. This transition network represents a finite state machine that parses/recognizes the grammar given below. S -> NP VP | VP NP -> [det] [adj]* noun VP -> verb [PP]* [NP] PP -> prep NP Example: The naughty monkey ate the ripe bananas %---------------------------------------------------------------------------- Augmented transition networks (ATNs): TNs can't do much apart from recognize the grammar. In ATNs, arcs can have procedures attached to them. The procedures are executed whenever the arc is traversed. ATN features: Arcs are labelled and have arbitrary tests attached. Actions are attached to arcs, and are executed when the arc is traversed. Registers store information (e.g., noun, verb, aux, det). ATNs can back track if a wrong arc is selected initially. %---------------------------------------------------------------------------- ATN Example: Sentence: The brown dog has gone ATN: ------> q4/F ------> q5/F NP / verb ^ NP | | ------> q1 | verb ---- / \ aux | PP S ------> q3 \ / NP ------> q2 aux adj PP ---- ---- | | | | ------> q6 ------> q7/F / det noun NP \pronoun ------> q8/F PP ------> q9 ------> q10/F prep NP The states with a /F are possible final states Lexicon: Word Syntactic Category has verb, aux the det dog noun brown adjective gone verb ... ... Registers: NOUN: VERB: ADJECTIVE: AUXILIARY: DETREMINER: ... 1. Start in State S; Start with first work: The 2. Select the NP arc (arbitratily); Push to NP 3. Select the det arc (arbitrarily); check lexicon so see if the is a det 4. Test succeeds; set determiner register to definite (the), go to q6 5. Go to next word: brown; check lexicon so see if brown is an adjective 6. Test succeeds; set adjective register to brown; stay in q6 7. Go to next word: dog; check lexicon so see if dog is an adjective 8. Test fails; check lexicon so see if dog is a noun 9. Test succeeds; set noun register to dog; go to q7; Push to PP 10. Go to next word: has; check lexion to see if has is a preposition; 11. Test fails; pop to q7; since sentence is not yet over, Pop to q1 12. Select verb arc (arbitrarily); check lexicon so see if has is a verb 13. Test succeeds; set verb register to has; go to q4; 14. Go to next word: gone; Push to NP 15. Select arc det (arbitrarily); check lexicon so see if gone is a det 16. Test fails; select arc pronoun; check lexicon so see if gone is a pronoun 17. Test fails; Pop to q4. 18. Backtrack to q1; set verb register to Nil; go to previous word: has 19. Select arc aux; check lexicon so see if has is an aux 20. Test succeeds; set register auxiliary to has, go to q3 21. Go to next word: gone: check lexicon so see if gone is a verb 22. Test succeeds; set register verb to gone; go to q4 23. Since q4 is a final state and the sentence is over: stop The content of the registers represent the parse. %---------------------------------------------------------------------------- Copyright (c) Ashok Goel, 1997 Associate Professor, College of Computing Georgia Institute of Tecnology, Atlanta, Gergia 30332 Email: goel@cc.gatech.edu %----------------------------------------------------------------------------