Phrase structure grammars

The grammars we've been looking at so far are called "phrase structure grammars"---they emphasize the tree-like structuring of phrases and sentences. They also make an assumption about the locality of grammar rules in that each rule deals with a single node (this is where the context-freeness aspect comes into play). But this locality assumption leads to a big problem: we want our grammar, and the sentences generated by the grammar, to reflect the generality in the language, and this is hard to do with phrase structure grammars when we encounter discontinuous constituents.

What's a discontinuous constituent? Here's an illuminating example---consider the sentence "A man wearing earrings walked by." Here's a parse tree for the sentence, constructed through the application of a phrase structure grammar (shown as labeled bracketing):

  (S (NP (ART a)
         (NOUN man)
         (COMP (VERB wearing) (NOUN earrings)))
     (VP (VERB walked)
         (PP (P by) (NP [LOC]))))

This works out the way we'd expect, with "wearing earrings" modifying the NP "a man". But now look at a variation of this same sentence: "A man walked by wearing earrings." We surely know that "wearing earrings" tells us about "a man", not about how or where the man walked. In other words, this sentence means the same thing as the one before, yet our phrase structure grammar is going to prevent us from showing that "wearing earrings" should modify or attach to the subject NP. We have a constituent that should be attached to the NP, but now it's going to be elsewhere---why, it's a discontinuous constituent! Here are a couple of examples of parse trees that we could come up with, but they fall short of what we need:

  (S (NP (ART a) (NOUN man))
     (VP (VERB walked)
         (PP (PREP by) (NP [COMP])))
     (?COMP? (VERB wearing) (NOUN earrings)))

  (S (NP (ART a) (NOUN man))
     (VP (VERB walked)
         (PP (PREP by) (NP [COMP]))
         (?COMP? (VERB wearing) (NOUN earrings))))

Sigh. How are we going to deal with this? Just when it seems like we're gonna get some closure on this, your sadistic instructor introduces more shortcomings and pitfalls. When does it ever end?
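Before we answer that, it may help to see the locality assumption stated as code. Here's a minimal sketch in Python (mine, not part of the course's ATN code) of a phrase-structure grammar for the first sentence above. The category names follow the tree; GERUND is my own label for "wearing", and the empty [LOC] NP is omitted for simplicity. The point to notice is that every rule rewrites exactly one symbol, with no way to mention that symbol's neighbors.

# A minimal sketch (not from the course materials) of a phrase-structure
# grammar. Each rule rewrites exactly ONE symbol into a sequence of
# children: the locality assumption under discussion.

RULES = {
    "S":      [["NP", "VP"]],
    "NP":     [["ART", "NOUN"], ["ART", "NOUN", "COMP"]],
    "COMP":   [["GERUND", "NOUN"]],
    "VP":     [["VERB", "PP"]],
    "PP":     [["P"]],
    "ART":    [["a"]],
    "NOUN":   [["man"], ["earrings"]],
    "GERUND": [["wearing"]],
    "VERB":   [["walked"]],
    "P":      [["by"]],
}

def derive(start, choices):
    """Top-down, leftmost derivation. `choices` says which alternative
    to pick at each rewrite; a rule never inspects its neighbors."""
    words = []
    def expand(symbol):
        if symbol not in RULES:          # a word: emit it
            words.append(symbol)
            return
        for child in RULES[symbol][choices.pop(0)]:
            expand(child)
    expand(start)
    return " ".join(words)

# S -> NP VP, NP -> ART NOUN COMP, ..., second NOUN -> earrings:
print(derive("S", [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]))
# -> a man wearing earrings walked by

Notice that a COMP can only hang wherever its parent NP happens to be. There is no way to write a rule that says "attach this COMP to an NP somewhere else in the string," which is exactly why the discontinuous version of the sentence is out of reach.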
The problem revisited

Here's a slightly different perspective on the problem. The whole point of syntax, at least from our point of view, is to extract generalities about a language based on word order. We can then use those generalities to constrain the computational effort we invest in deriving meaning. Naturally, given two different mechanisms for describing those generalities, we prefer the more economical one, all other things being equal. I mean, really, would you want the less economical one? Nah.

One of the generalities embedded in phrase-structure grammars is that sentences with different structures have different meanings. This holds true even if the sentences look the same. The PP-attachment issue is a classic example:

  I saw the bird with the telescope.
  I saw the bird with the crest.

These two sentences may look the same on the surface, but to get the obvious meaning of each, you need two entirely different structures. In fact, each of these sentences has a potential second meaning, depending on where you attach the prepositional phrase. On the other hand, there are sentences which seem, on the surface, to have entirely different structures, but have exactly the same meanings:

  John phoned up the woman.
  John phoned the woman up.

Compare this to:

  Sarah walked up the stairs.
  *Sarah walked the stairs up.

(The difference is that the verb "phone" optionally takes the particle "up", while "walk" does not take this particle.) If we used a phrase-structure grammar to analyze the structures of these two sentences, we'd necessarily have two very different parse trees, but we'd have the same meaning. Intuitively, this should cause you some concern if you buy into the generality about structure corresponding to meaning.

Chomsky!

It certainly caused one linguist some concern. His name is Noam Chomsky, and he's more or less the godfather of modern linguistics. He's also a self-styled political scientist of some repute, specializing in the United States' involvement in the political affairs of other nations. (He doesn't talk much about linguistics anymore, but if you wanted him to speak about how the U.S. is doing all the wrong things in Central and South America, he'd probably be there in an instant.)

Noting this contradiction resulting from his phrase-structure grammar idea (yup, that was his idea), Chomsky had the crucial insight that sentences have more than one level of structure. He proposed two levels, the deep structure and the surface structure. The deep structure reflects the meaning of the sentence (although it is not the same as the meaning of the sentence--a fact which too many people forget), while the surface structure reflects the phonological (what you hear) or orthographic (what you see) representation.

Within this new syntactic framework, the derivation of a sentence's structure becomes a two-step process. The first step is what we're already familiar with: a phrase-structure grammar gets us from the "S node" to the deep structure, and each of these phrase-structure rules replaces one constituent with one or more constituents. The next step requires a set of transformation rules to get us from the deep structure to the surface structure. These rules, sometimes called transforms for short, map multiple constituents onto multiple constituents. The whole thing---the phrase-structure rules plus the transformation rules---is called a transformational grammar.

For example, consider again the two sentences:

  John phoned up the woman.
  John phoned the woman up.

We could derive the structures of these sentences using only a phrase-structure grammar by including two rules:

  VP --> VERB PARTICLE NP
  VP --> VERB NP PARTICLE

Here, the two sentences will have two different structures. This approach does not reflect the similarity in meaning between the two sentences, which is a generality we want to maintain. However, we could assume that the two sentences have the same deep structure and that the constituents in the deep structure can be ordered in different ways to obtain different surface structures. We could construct a particle-movement transformation which would capture this assumption (a code sketch of this transform follows the examples below):

  VERB PARTICLE NP --> VERB NP PARTICLE

We can now apply the phrase-structure rules to derive the deep structure of the first sentence, which also turns out to be the surface structure of that same sentence (i.e., the surface structure you obtain by applying no transformations). We can also apply our transformation rule to that deep structure to derive the surface structure of the second sentence. By the way, the movement of the particle has nothing to do with the number of words it moves past, only with the nature of the constituent made up of those words:

  John phoned up the interesting woman.
  John phoned the interesting woman up.
  John phoned up the interesting woman with the curly hair.
  John phoned the interesting woman with the curly hair up.
  *John phoned the interesting woman up with the curly hair.
  John phoned the interesting woman up in the morning.
  *John phoned the interesting woman in the morning up.
  John phoned up the woman because he needed money.
  John phoned the woman up because he needed money.
  *John phoned the woman because he needed money up.
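Here's the promised sketch of the particle-movement transform as an operation on constituent lists. This is my own illustration, not the course's ATN code: constituents are (label, words) pairs, and the transform maps the deep-structure order VERB PARTICLE NP to the surface order VERB NP PARTICLE.

# A sketch of the particle-movement transform over constituent lists.
# Because the NP moves as a single constituent, its length in words is
# irrelevant: exactly the point of the examples above.

def particle_movement(constituents):
    """Deep: VERB PARTICLE NP  -->  Surface: VERB NP PARTICLE."""
    labels = [label for label, _ in constituents]
    for i in range(len(labels) - 2):
        if labels[i:i + 3] == ["VERB", "PARTICLE", "NP"]:
            out = list(constituents)
            out[i + 1], out[i + 2] = out[i + 2], out[i + 1]
            return out
    return None    # pattern not found: the transform doesn't apply

deep = [("NP", "John"), ("VERB", "phoned"), ("PARTICLE", "up"),
        ("NP", "the interesting woman with the curly hair")]

surface = particle_movement(deep)
print(" ".join(words for _, words in surface))
# -> John phoned the interesting woman with the curly hair up

Note that the transform pattern-matches on constituent labels, not on words: "the interesting woman with the curly hair" hops over the particle as one unit, no matter how many words it contains.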
This ability to move constituents is a pretty cool thing, because it gives a lot of power to our economical set of rules. Here's another example:

  Lyman played the tuba.
  The tuba was played by Lyman.

This second sentence is known as a passive structure. (Writers are told to avoid this structure, because it is usually a less direct phrasing.) We could derive the structures of these sentences using only phrase-structure rules, but again we'd lose that generality about structure reflecting meaning. So we come up with another transformation, which says essentially that we can derive the surface structure of the passive form from the deep structure of the active form by swapping the NPs and adding some stuff:

  NP1 VERB NP2 --> NP2 be VERB-en (or -ed) by NP1

Other examples:

  The committee has rejected your application.
  Your application has been rejected (by the committee).

  The writer puts the statement in passive voice.
  The statement is put in passive voice (by the writer).

A note on terminology--AUX and MODAL

We've mentioned modal verbs, and you may also have come across the term auxiliary verbs (or aux). Both terms are used not-quite-interchangeably, especially in this course. Don't worry about that. What you should know is that auxiliary verbs are the verbs that carry negation information in a sentence. These verbs include can, could, should, shall, will, would, and three special verbs: be, have, and do. If a sentence contains a negative, it will contain an auxiliary verb of some sort; also, yes/no questions contain an auxiliary verb (which makes sense, because there's the possibility of negation). Auxiliary verbs are also used to express emphasis, time reference (_she will win_), and aspect (how an action is being accomplished, in a technical, verb sense: _they are walking home_, for example).

The Nature of Transformations

Each transformation consists of one or more of three basic operations: movement of constituents, insertion of constituents, and deletion of constituents. Here's an example of the way in which a series of transformations can be applied to get a bunch of different surface structures:

  No major transform
    Active: Lyman played the tuba.
    (Notice: some transformations occurred, because past tense is in the correct place, and there's no AUX at the surface.)
  Passive transform
    Passive: The tuba was played by Lyman.
  Subj-aux inversion
    Passive yes/no question: Was the tuba played by Lyman?
  Negative insertion
    Negative passive yes/no question: Wasn't the tuba played by Lyman?
  Pronominalization (or pronoun/noun substitution)
    Negative passive yes/no question with pronoun substitution for object: Wasn't it played by Lyman?
  NP deletion
    Negative passive yes/no question with pronoun substitution for object and subject deleted: Wasn't it played?

Note that the order of application of transformations is important. For example, you can start with:

  John called the woman up.

and pronominalize it with a transformation:

  John called her up.

but you cannot now apply the particle-movement transform:

  *John called up her.

Or, you could again start with:

  John called the woman up.

and then apply the particle-movement transform:

  John called up the woman.

But now you can't use the pronominalization transform:

  *John called up her.
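Here's a toy sketch (again mine, not anything official) that makes the ordering constraint concrete. Both transforms are simple list rewrites, but a surface well-formedness condition of English (no particle immediately in front of a pronoun) blocks the second step in either order:

# A toy model of transformation ordering. Transforms rewrite lists of
# (label, words) constituents; a surface filter rejects any result
# with a particle right before a pronoun.

def well_formed(consts):
    """English rejects a particle immediately preceding a pronoun."""
    labels = [label for label, _ in consts]
    return ("PARTICLE", "PRO") not in list(zip(labels, labels[1:]))

def pronominalize_object(consts, pronoun="her"):
    """Replace the final (object) NP with a pronoun."""
    out = list(consts)
    for i in range(len(out) - 1, -1, -1):
        if out[i][0] == "NP":
            out[i] = ("PRO", pronoun)
            break
    return out

def move_particle_left(consts):
    """VERB NP PARTICLE --> VERB PARTICLE NP (the other direction)."""
    out = list(consts)
    for i in range(len(out) - 1):
        if out[i][0] in ("NP", "PRO") and out[i + 1][0] == "PARTICLE":
            out[i], out[i + 1] = out[i + 1], out[i]
            break
    return out

def apply_transform(transform, consts):
    """Apply a transform, rejecting ill-formed surface structures."""
    result = transform(consts)
    return result if well_formed(result) else None

start = [("NP", "John"), ("VERB", "called"),
         ("NP", "the woman"), ("PARTICLE", "up")]

step1 = apply_transform(pronominalize_object, start)  # John called her up
print(apply_transform(move_particle_left, step1))     # None: *John called up her
step2 = apply_transform(move_particle_left, start)    # John called up the woman
print(apply_transform(pronominalize_object, step2))   # None: *John called up her

In the first order, pronominalization succeeds and particle movement is then blocked; in the second, particle movement succeeds and pronominalization is then blocked. Either way, *"John called up her" is unreachable, which is the behavior the examples above demand.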
Also note that as far as the linguists are concerned, those six surface structures about Lyman and his tuba all reflect the same underlying semantics. Obviously, these sentences do not mean the same thing. That's why we say deep structure is _reflective_ of meaning, but it is not the same as meaning. It is not the case that sentences with the same deep structure mean the same thing. Similarly, when you substitute a synonym for some word, your new sentence doesn't mean quite the same thing. That difference in nuance is the whole reason you choose one synonym over another; it is also the whole reason you choose one surface structure over another.

Relative clauses

A relative clause is formed when one clause is embedded into an NP of another clause to produce structures like the following:

  I sent your book to my aunt. My aunt lives in Dublin.
  I sent your book to [my aunt ^who lives in Dublin^].

  I sent your book to a certain author, one of several that we have been talking about. The author is famously unkempt.
  I sent your book to [the author ^who is famously unkempt^].

Relative clauses in English are introduced with a relative pronoun, such as who, whom, whose, which, or that. In relative clauses, the pronoun can be deleted in some circumstances (this happens someplace between the deep structure and the surface structure). For example:

  This is the officer I talked to last night.

One of the interesting things about the relative clause construction is that, although deletion of the introducing relative pronoun is allowed, it can really mess up parsing, at least for humans. This certainly happens in some of the garden path sentence constructions we talked about in class.

Implementing this Stuff

With things like passive transformations, subject-aux inversion, and particle movement, the movement of the constituents is highly constrained. The particle can only be in one of two places, the subject-aux inversion can only go one way, and so on. This kind of movement has a name of its own, local movement, and it's relatively easy to modify your ATN parser to deal with these things by adding a few additional states, arcs, and augmentations. Yes, you are trampling on that generality we were trying to preserve, but let's face reality--you're trying to get this thing to work.

There are some transformations, however, that make our lives miserable. For example, consider the assertion:

  Kurt put a trick question in the midterm exam.

We can transform that into:

  What did Kurt put __ in the midterm exam?

The VP here at first looks like it will be "put in the midterm exam," but we know that's not an acceptable VP, because we know "put" needs a direct object. But the object NP has been replaced by the appropriate wh-form and moved to the front of the sentence to derive the wh-question. All wh-questions are derived by replacing some constituent with a wh-form and moving it to the front of the sentence, regardless of how many constituents are in the way; this type of movement is called unbounded movement.

Back to the example. The place where we expected the object NP, just after "put", is called a gap. The wh-form is the filler. What does our ATN parser need to be able to do with this question? Reading from left to right, the parser needs to recognize that it's looking at a wh-question ("What" is a dead giveaway), store the filler, continue parsing along, recognize where the gap is, shove the filler back in there, and then finish up so that the sentence that has been parsed, for all intents and purposes, is:

  Kurt put what in the midterm exam?

To do this, your ATN would need the ability to recognize fillers and gaps, and it would need a new data structure called a hold list, and a new action called the hold action. The gory details are in the book. You don't have to implement this in your parser.
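Just to make the idea concrete, here's a toy sketch of the hold list and hold action. This is my own simplification, not the book's ATN machinery, and the little word lists are invented for the example:

# A toy sketch of the hold-list idea. We store the wh-filler on a hold
# list, walk left to right, and discharge the filler at the gap: the
# spot where a transitive verb fails to be followed by an NP.

WH_WORDS = {"what", "who", "whom"}
TRANSITIVE = {"put", "phoned", "saw"}     # verbs that demand an object
NP_STARTERS = {"the", "a", "an"}          # crude test for "an NP follows"

def fill_gap(tokens):
    """Return the tokens with the wh-filler restored to its gap."""
    tokens = list(tokens)
    hold = []                              # the hold list
    if tokens and tokens[0].lower() in WH_WORDS:
        hold.append(tokens.pop(0).lower()) # the hold action
        if tokens and tokens[0] == "did":  # undo subject-aux inversion
            tokens.pop(0)
    out = []
    for i, token in enumerate(tokens):
        out.append(token)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if hold and token in TRANSITIVE and nxt not in NP_STARTERS:
            out.append(hold.pop())         # discharge filler into the gap
    return out

print(" ".join(fill_gap("What did Kurt put in the midterm exam ?".split())))
# -> Kurt put what in the midterm exam ?

Real unbounded movement is messier than this, of course, since the gap can sit arbitrarily deep inside embedded clauses, but the bookkeeping is the same: hold the filler, keep parsing, discharge at the gap.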
The Question of Context-Sensitivity

If you haven't noticed by now, those transformational rules are context sensitive. Doesn't this mean that English is necessarily not context free? No. Remember that all along we've been saying "we could do this with phrase-structure rules, but we choose not to..." So the fact that transformational grammar adds context sensitivity doesn't necessarily mean that English (or any other natural language) can't be context free--it only means that English is easier to describe with a context-sensitive grammar.

Universal Grammar Theory

Part and parcel of early transformational grammar theory is some possible insight into human language acquisition. Chomsky and his cohorts argued that any given language must be hard to learn, grammar-wise, especially since the "training data" is relatively small, inconsistent, and mostly devoid of negative feedback (in the form of "near misses"...remember that term from your AI class?). Transformational grammar, they argued, offers a simpler model of grammar to learn---that is, you'd only have to learn two small grammars, say of size N each (for a total of size 2 x N), instead of one big grammar on the order of size N x N.

But how do you still learn those two small grammars in your early years? Chomsky said that children must be born with a predisposition toward certain general linguistic principles---an innate universal grammar shared by all babies. As a child is exposed to the language used daily around him or her, some principles are reinforced while others are discarded. The idea of babies being predisposed toward learning language isn't controversial, but the notion of a universal grammar is most definitely controversial. Nobody has really found any significant evidence in support of the universal grammar idea, and Chomsky's never been really specific about what that grammar might be anyway.

Nevertheless, there's certainly some appeal to the notion of a universal grammar, and it does lead to some interesting speculations about where language comes from and what limits might exist. For example, Chomsky has argued that the universal grammar that might be encoded in the infant brain is a product of the biology---that is, the neural hardware, or wetware, of the human brain determines the universal grammar. Thus, Chomsky argues, the languages that we are able to acquire are constrained by the "wiring" in our brains. As a result, we should not be surprised if, given the opportunity to communicate with alien life forms (e.g., little green men (LGMs) or dolphins), we could never understand their languages. Why? Because our brains may not allow us to learn the grammars of the alien languages, which themselves are constrained by the (presumably) different neural organization of the alien brains. Fun to think about, huh?
The Derivational Theory of Complexity

Transformational grammar has also had some significant influence on the study of the psychology of language. Some of the earliest work on the psychology of language resulted from transformational grammar ideas. In the 1960s, many folks believed that TG was directly related to psychological performance, and that things like deep structure were useful in predicting performance. Linguists didn't necessarily push TG as "psychologically real," but maybe they didn't make themselves as clear as they could have. Or maybe they waffled. Or maybe psychologists didn't understand. In any case, the psychologists put TG to the test.

These psychologists (Fodor, Bever, and Garrett, in various author orders, from 1968 through the mid-1970s) pursued the possibility that purely linguistic concepts might serve as the basis of a psychological model. For example, deep structures could serve as a representational system and transforms as the processing system. One famous, now infamous, model that came out of this work is called the derivational theory of complexity, or DTC. The premise here was that a sentence's psychological complexity, and therefore its processing difficulty, was related to, and even predicted by, the number of transforms in the sentence's derivation.

Early studies showed that negative sentences such as

  The sun is not shining.

were more difficult to comprehend than the corresponding affirmative

  The sun is shining.

But because these sentences have different meanings as well as different transformational complexities (which may be related but are not necessarily the same thing), not to mention different word counts, this result isn't anything you can trade in for a Nobel prize.

Later studies contradicted the DTC idea. For example, although

  The boy was bitten.

is linguistically more complex than

  The boy was bitten by the man.

no experiment revealed a corresponding difference in processing difficulty. (Why is the former more complex than the latter? The former requires an additional deletion transformation to remove the "by" phrase.) As another example, this sentence needs a particle-movement transformation:

  John phoned the girl up.

but this one doesn't:

  John phoned up the girl.

yet experiments detected no difference in processing difficulty. It became evident that syntactic characteristics that had a big impact on differences in meaning, such as negation, had vastly different psychological effects than did semantically irrelevant rules like particle movement. In other words, the psychologists soon decided that semantics had a bigger impact on psychological models of language than did syntax.

Now, if you look at the linguistics work since about 1972, when DTC's limitations had been established, one thing you'll see is that there was a flurry of changes to theories of transformational grammar. As the new versions of transformational grammar were developed, DTC had to be tested again and again. But no experiment with any of these newer forms of transformational grammar has ever shown greater processing difficulty for more transformations.

So, in the late '60s and early '70s, linguistics and psycholinguistics divorced. What happened? Why did DTC go wrong? What happened is that linguists worry about matters of language competence---how to describe what people must know in the most economical way possible.
Psychologists (as well as most AI folks), on the other hand, worry about language performance---how people use what they know about language in its comprehension and production, and how to describe that in the most economical way possible. However, there's no reason to believe that a linguistic theory motivated by an economy of representation is necessarily relevant to a psychological theory motivated by an economy of processing. The theorists invested in an assumption that just wasn't justified. In other words, there's no reason, either now or then, to believe that transformational rules bear any relationship to mental operations. They might, but that would be amazing, not expected.

Since then, two additional things have happened to the study of language. One has to do with linguistics: some linguists parted with the notion of transformational grammar and tried to develop other grammars that were more psychologically realistic. The two most successful of these are Case Grammar, originally developed by Fillmore, and Lexical Functional Grammar (LFG), originally developed by Bresnan and Kaplan. Both of these grammars focus far more on developing a structure that marks aspects of the meaning more clearly. In transformational grammar, for example, you have a plain old noun phrase, and you have to use your knowledge of the surrounding structure to tell whether this noun phrase is the subject or the object or the indirect object, or whatever. In Case Grammar and LFG, though, such relationships are actually part of the grammar. Now, all three grammars have their problems, but in my opinion, the biggest reasons that TG is more successful than these others are that it had a head start and it had better PR.

The point for you to realize is that the grammar you've encoded in your ATNs is only one description of the structure of language, and it is limited in certain ways in which other grammars are less limited. As a result of adopting this grammar, which neglects certain generalities about meaning and structure, you will have to put in more work at later steps in connecting meaning to the structure.

The other thing that happened as a result of the split between linguistics and psychology is that the field of psycholinguistics flowered. The focus of psycholinguistics is the understanding of the performance issues to which we've referred at various times. This may seem like a subtle shift, but in fact it's very important. Transformational grammar theorists have a very strong belief that language is a special type of cognition that evolved pretty much on its own, with its own idiosyncrasies that have little connection to other aspects of cognition. Psycholinguistics, on the other hand, assumes that language is an instance of general rules of cognition, and that what is easy or hard to do in language is reflective of the strengths and weaknesses of the organization of cognition in general.

Now this is where we return to relative clauses---specifically, relative clauses with their relative pronouns deleted. I showed you a couple of sentences in which the relative pronoun was deleted and the sentence was still grammatical. And I have to admit, there are some circumstances in which it is ungrammatical to delete a relative pronoun introducing a relative clause. Now we come to my favorite sentence, the one that changed my life:

  The horse raced past the barn fell.

I assure you, this is a grammatical sentence! It should be read as if the phrase "which was" were (legally) deleted between "horse" and "raced".
(Think about a context in which we were describing two horses, one racing up past the house, and the other racing down by the barn.) How do I know this is grammatical? Consider the sentence:

  The car driven through the fence plummeted down the cliff.

The structure is exactly the same, but because "driven" can only be a participle here, not a main verb, the reduced relative reading is the only one available, so the construction itself must be grammatical.

Various theories of why the horse sentence is hard for humans to understand have been posited. Some are based on a formal theory of the human parsing mechanism: for example, a deterministic, no-backtracking parser would get in trouble with this sentence. But we know humans can backtrack: remember the "orange ducks" sentence? That one is not hard to understand, and yet it probably involves at least quasi-deterministic parsing. We'll go into some more detail on other parsing heuristics in the next lecture.

Some theories are based on an analysis of generalities of human cognition. For example, humans have a working memory of only about seven (plus or minus two) items. You can remember some connected items as a chunk (for example, you tend to remember phone numbers as three chunks: area code, prefix, and the four final digits). Phrases and clauses allow you to "chunk" information like this. But if you don't come to the end of the phrase, you can't create a chunk, so you have to remember all the items individually. This also runs into problems:

  The prime number few.

The correct parse tree of this sentence looks like this:

  (S (NP (ART the) (NOUN prime))
     (VP (VERB number)
         (ADVP (ADV few))))

The one that humans seem to attempt is this, with %#!% marking the places where the parse breaks down:

  (*S (NP (ART the)
          (ADJP (ADJ prime))
          (N number)
          (COMP [that]
                (S (NP (QUANTP (QUANTIFIER few))
                       (N %#!%))
                   (VP %#!%))))
      (VP %#!%))

The thing to notice is that the number of non-terminal nodes in the correct parse is far fewer than in the attempted parse. (Non-terminal nodes are any nodes that have additional nodes under them; in a parse tree, the terminal nodes are the words. In the correct parse we're looking at 8 non-terminal nodes, and in the attempted parse we're looking at 14.) If you were doing a nice depth-first, pre-order traversal, you'd be able to get to the terminal nodes, and thus close out your recursive calls, much earlier with the correct tree than with the attempted one.
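If you want to check that node arithmetic, here's a quick sketch. The nested-tuple encoding is my own devising; the tree shapes mirror the bracketed parses above.

# Verify the non-terminal counts for the two parses. A tree is
# (category, child, ...); bare strings are terminals (words, or the
# %#!% failure marker).

correct = \
    ("S",
     ("NP", ("ART", "the"), ("NOUN", "prime")),
     ("VP", ("VERB", "number"), ("ADVP", ("ADV", "few"))))

attempted = \
    ("*S",
     ("NP",
      ("ART", "the"),
      ("ADJP", ("ADJ", "prime")),
      ("N", "number"),
      ("COMP", "[that]",
       ("S",
        ("NP", ("QUANTP", ("QUANTIFIER", "few")), ("N", "%#!%")),
        ("VP", "%#!%")))),
     ("VP", "%#!%"))

def nonterminals(tree):
    """Count labeled nodes; anything that's a bare string is terminal."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(nonterminals(child) for child in tree[1:])

print(nonterminals(correct), nonterminals(attempted))   # -> 8 14

A depth-first, pre-order walk of `correct` closes out each recursive call after a word or two, while `attempted` forces the parser to keep many unfinished phrases open at once, which is the memory-load argument in a nutshell.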
Why, then, does the human sentence processing mechanism seem to assume the much more complicated structure as it encounters the elements of this sentence, and why can't it recover easily (as it does with the "orange ducks" sentence)? You see, there isn't a simple, memory-management-driven heuristic that says: "Just pick the tree with the fewest non-terminal nodes." Previously, we looked at several specific heuristics for deterministic processing, and we found that they all have problems that are as fatal as this one.

The moral of the story is that there are many ways to describe the language: some focus on the generalities of the grammar, some on the generalities of meaning, and some on the generalities of human cognition. The task for Artificial Intelligence folks is to develop a system that captures the generalities of processing from a computational perspective, while keeping in mind that it also has to capture the generalities of processing from a human cognitive perspective.

Copyright (c) 2004 by Kurt Eiselt and Jennifer Holbrook. All rights reserved, except as previously noted.
Last revised: March 30, 2004