Not the real language course

Transformational Grammar


Phrase structure grammars
 

The grammars we've been looking at so far are called "phrase
structure grammars"---they emphasize the tree-like structuring of
phrases and sentences. They also make an assumption about the
locality of grammar rules in that each rule deals with a single
node (this is where the context-freeness aspect comes into play).
But this locality assumption leads to a big problem: we want our
grammar, and the sentences generated by the grammar, to reflect
the generality in the language, and this is hard to do with
phrase structure grammars when we encounter discontinuous
constituents. What's a discontinuous constituent? Here's an
illuminating example---consider the sentence "A man wearing
earrings walked by." Here's a parse tree for the sentence,
constructed through the application of a phrase structure
grammar:
 
                    S
                  /   \
                /       \
              /           \
            /               \
          NP                 VP
         /|\                 / \
       /  |  \             /     \
     /    |    \         /         \
   ART  NOUN  COMP     VERB        PP
    |     |     /\      |          /\
    a    man   /  \  walked       /  \
             VERB NOUN            P NP
               |   |              |  |
            wearing earrings     by  [LOC]
 
This works out the way we'd expect, with "wearing earrings"
modifying the NP "a man". But now look at a variation of this
same sentence: "A man walked by wearing earrings." We surely know
that "wearing earrings" tells us about "a man", not about how or
where the man walked. In other words, this sentence means the
same thing as the one before, yet our phrase structure grammar is
going to prevent us from showing that "wearing earrings" should
modify or attach to the subject NP. We have a constituent that
should be attached to the NP, but now it's going to be
elsewhere---why, it's a discontinuous constituent! Here are a couple of
examples of parse trees that we could come up with, but they fall
short of what we need:
 
                          S
                        / | \
                     /    |    \
                  /       |       \
               /          |          \
             NP          VP          ?COMP?
            /\           /\            /\
           /  \         /  \          /  \
          /    \       /    \        /    \
        ART   NOUN  VERB    PP     VERB  NOUN
         |     |      |     /  \     |     |
         a    man  walked PREP NP  wearing earrings
                             |   |
                           by [COMP]
 
 
 
 
                          S
                        /   \
                     /         \
                  /               \
               /                     \
             NP                      VP
            /\                      /| \
           /  \                   /  |   \
          /    \                /    |      \
        ART   NOUN            VERB   PP     ?COMP?
         |     |               |     / \      /\
         a    man           walked PREP NP   /  \
                                        /   /    \
                                       /   VERB   NOUN
                                    [COMP]   |      |
                                          wearing earrings
 
Sigh. How are we going to deal with this? Just when it seems like
we're gonna get some closure on this, your sadistic instructor
introduces more shortcomings and pitfalls. When does it ever end?
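 
(If it helps to see this as something executable, here's a
minimal sketch, in Python, of the sort of phrase-structure rules
behind the first tree above, together with a naive top-down
recognizer.  The rules, the category names, and the tiny lexicon
are illustrative assumptions on my part, not the official grammar
from your assignment.)
 
   # A minimal sketch: phrase-structure rules as data, plus a naive
   # top-down recognizer.  Rules and lexicon are illustrative only.

   RULES = {
       'S':    [['NP', 'VP']],
       'NP':   [['ART', 'NOUN', 'COMP'], ['ART', 'NOUN']],
       'COMP': [['VERB', 'NOUN']],
       'VP':   [['VERB', 'PP']],
       'PP':   [['P', 'NP'], ['P']],   # allow "by" with an implicit [LOC]
   }

   LEXICON = {
       'a': 'ART', 'man': 'NOUN', 'earrings': 'NOUN',
       'wearing': 'VERB', 'walked': 'VERB', 'by': 'P',
   }

   def parse(cat, words, i):
       """Return every position reachable after recognizing `cat`
       starting at position i of the word list."""
       if cat not in RULES:                 # a preterminal: match one word
           if i < len(words) and LEXICON.get(words[i]) == cat:
               return {i + 1}
           return set()
       positions = set()                    # a phrase: try each rule
       for rhs in RULES[cat]:
           frontier = {i}
           for child in rhs:                # expand constituents left to right
               frontier = {k for j in frontier for k in parse(child, words, j)}
           positions |= frontier
       return positions

   sentence = 'a man wearing earrings walked by'.split()
   print(len(sentence) in parse('S', sentence, 0))   # True: grammar accepts it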
 
 
The problem revisited
 
Here's a slightly different perspective on the problem. The whole
point about syntax, at least from our point of view, is to
extract generalities about a language based on word order. We can
then use those generalities to constrain the computational effort
we invest in deriving meaning. Naturally, given two different
mechanisms for describing those generalities, we prefer the more
economical one, all other things being equal. I mean, really,
would you want the less economical one? Nah.
 
One of the generalities embedded in phrase-structure grammars is
that sentences with different structures have different meanings.
This holds true even if the sentences look the same. The PP-
attachment issue is a classic example:
 
     I saw the bird with the telescope.
     I saw the bird with the crest.
 
These two sentences may look the same on the surface, but to get
the obvious meaning of each, you need two entirely different
structures. In fact, each of these sentences has a potential
second meaning, depending on where you attach the prepositional
phrase.
 
On the other hand, there are sentences which seem, on the
surface, to have entirely different structures, but have exactly
the same meanings:
 
     John phoned up the woman.
 
     John phoned the woman up.
 
Compare this to:
 
     Sarah walked up the stairs.
 
     *Sarah walked the stairs up.
 
(The difference is that the verb "phone" optionally takes the
particle "up", while "walk" does not take this particle.)
If we used a phrase-structure grammar to analyze the structures
of these two sentences, we'd necessarily have two very different
parse trees, but we'd have the same meaning. Intuitively, this
should cause you some concern if you buy into the generality
about structure corresponding to meaning.
 
 
Chomsky!
 
It certainly caused one linguist some concern. His name is Noam
Chomsky, and he's more or less the godfather of modern
linguistics. He's also a self-styled political scientist of some
repute, specializing in the United States' involvement in the
political affairs of other nations. (He doesn't talk much about
linguistics anymore, but if you wanted him to speak about how the
U.S. is doing all the wrong things in Central and South America,
he'd probably be there in an instant.)
 
Noting this contradiction resulting from his phrase-structure
grammar idea (yup, that was his idea), Chomsky had the crucial
insight that sentences have more than one level of structure. He
proposed two levels, the deep structure and the surface
structure. The deep structure reflects the meaning of the
sentence (although it is not the same as the meaning of the
sentence--a fact which too many people forget), while the surface
structure reflects the phonological (what you hear) or
orthographic (what you see) representation.
 
Within this new syntactic framework, the derivation of a
sentence's structure becomes a two-step process. The first step
is what we're already familiar with: a phrase-structure grammar
gets us from the "S node" to the deep structure, and each of
these phrase-structure rules replaces one constituent with one or
more constituents. The next step requires a set of transformation
rules to get us from the deep structure to the surface structure.
These rules, sometimes called transforms for short, map multiple
constituents onto multiple constituents. The whole thing---the
phrase-structure rules plus the transformation rules---is called
a transformational grammar. For example, consider again the two
sentences:
 
     John phoned up the woman.
 
     John phoned the woman up.
 
We could derive the structures of these sentences using only a
phrase-structure grammar by including two rules:
 
     VP <-- VERB PARTICLE NP
 
     VP <-- VERB NP PARTICLE
 
Here, the two sentences will have two different structures. This
approach does not reflect the similarity in meaning for the two
sentences, which is a generality we want to maintain.
However, we could assume that the two sentences have the same
deep structure and that the constituents in the deep structure
can be ordered in different ways to obtain different surface
structures. We could construct a particle-movement transformation
which would capture this assumption:
 
     VERB PARTICLE NP <-- VERB NP PARTICLE
 
We can now apply the phrase-structure rules to derive the deep
structure of the first sentence, which also turns out to be the
surface structure of that same sentence (i.e., the surface
structure you obtain by applying no transformations). We can also
apply our transformation rule to that deep structure to derive
the surface structure of the second sentence.
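 
To make the two-step picture a little more concrete, here's a toy
Python sketch in which the "deep structure" is just a flat list
of labeled constituents and the particle-movement transform is a
rewrite of adjacent constituents.  The flat-list representation
is a deliberate oversimplification of mine, not how a real
transformational derivation is represented.
 
   # A toy sketch: a "deep structure" as a flat list of (category, words)
   # pairs, and particle movement as a rewrite of adjacent constituents.

   deep = [('NP', 'John'), ('VERB', 'phoned'), ('PARTICLE', 'up'),
           ('NP', 'the woman')]

   def particle_movement(constituents):
       """VERB PARTICLE NP  ==>  VERB NP PARTICLE  (one application)."""
       cats = [c for c, _ in constituents]
       for i in range(len(cats) - 2):
           if cats[i:i + 3] == ['VERB', 'PARTICLE', 'NP']:
               out = list(constituents)
               out[i + 1], out[i + 2] = out[i + 2], out[i + 1]
               return out
       return constituents         # transform doesn't apply; leave it alone

   def words(constituents):
       return ' '.join(w for _, w in constituents)

   print(words(deep))                       # John phoned up the woman
   print(words(particle_movement(deep)))    # John phoned the woman up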
 
By the way, the movement of the particle has nothing to do with
the number of words it moves past, only with the nature of the
constituent made up of those words:
 
     John phoned up the interesting woman.
     John phoned the interesting woman up.
     John phoned up the interesting woman with the curly hair.
     John phoned the interesting woman with the curly hair up.
    *John phoned the interesting woman up with the curly hair.
     John phoned the interesting woman up in the morning.
    *John phoned the interesting woman in the morning up.
     John phoned up the woman because he needed money.
     John phoned the woman up because he needed money.
    *John phoned the woman because he needed money up.
 
This ability to move constituents is a pretty cool thing, because
it gives a lot of power to our economical set of rules. Here's
another example:
 
     Lyman played the tuba.
 
     The tuba was played by Lyman.
 
This second sentence is known as a passive structure. (Writers
are told to avoid this structure, because it is usually a less
direct phrasing.)  We could derive the structures of these
sentences using only phrase-structure rules, but again we'd lose
that generality about structure reflecting meaning.  So we come up
with another transformation, which says essentially that we can
derive the surface structure of the passive form from the deep
structure of the active form by swapping NPs and adding some
stuff:
 
     NP1 VERB NP2 <-- NP2 be VERB -en (or -ed) by NP1
 
Other examples:
 
      The committee has rejected your application.
 
      Your application has been rejected (by the committee).
 
      The writer puts the statement in passive voice.
 
      The statement is put in passive voice (by the writer).
 
 
A note on terminology--AUX and MODAL
 
We've mentioned modal verbs, and you may also have come across
the term auxiliary verbs (or aux).  Both terms are used not-quite-
interchangeably, especially in this course.  Don't worry about
that.  What you should know is that auxiliary verbs are the verbs
that carry negation information in a sentence.  These verbs
include can, could, should, shall, will, would, and three special
verbs: be, have, and do.  If a sentence contains a negative, it
will contain an auxiliary verb of some sort; also, yes/no
questions contain an auxiliary verb (which makes sense, because
there's the possibility of negation).  Auxiliary verbs are also
used to express emphasis, time reference (_she will win_) and
aspect (how an action is being accomplished, in a technical, verb
sense:  they are walking home, for example.)
 

The Nature of Transformations
 
Each transformation consists of one or more of three basic
operations: movement of constituents, insertion of constituents,
and deletion of constituents.  Here's an example of the way in
which a series of transformations can be applied to get a bunch
of different surface structures:
 
         No major transform
 
  Active:                    Lyman played the tuba.
 
(Notice: some transformations occurred, because past tense is in
the correct place, and there's no AUX at the surface.)
 
         Passive transform
 
  Passive:                   The tuba was played by Lyman.
 
         Subj-aux inversion
 
  Passive yes/no question:   Was the tuba played by Lyman?
 
         Negative insertion
 
  Negative passive yes/no question:
 
                             Wasn't the tuba played by Lyman?
 
         Pronominalization or Pronoun/noun substitution
 
  Negative passive yes/no question with pronoun substitution for
  object:
                             Wasn't it played by Lyman?
 
         NP deletion
 
  Negative passive yes/no question with pronoun substitution for
  object and subject deleted:
 
                          Wasn't it played?
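 
Here's a toy, string-level sketch of that chain.  Real
transformations operate on structured deep structures; each step
below is a crude token rewrite hard-wired by me to this one
example, just to show the movement/insertion/deletion flavor of
the operations.
 
   # Toy versions of the transformations in the chain above, applied in order.

   def passive(tokens):
       # NP1 VERB NP2  ==>  NP2 be VERB-en by NP1   (movement + insertion)
       subj, verb, *obj = tokens
       return obj + ['was', verb, 'by', subj]

   def subj_aux_inversion(tokens):
       # Move the auxiliary in front of the subject NP.   (movement)
       i = tokens.index('was')
       return [tokens[i]] + tokens[:i] + tokens[i + 1:]

   def negative_insertion(tokens):
       # Attach the negative to the auxiliary.            (insertion)
       return ["wasn't" if t == 'was' else t for t in tokens]

   def pronominalize(tokens, np, pronoun):
       # Substitute a pronoun for a noun phrase.
       return ' '.join(tokens).replace(' '.join(np), pronoun).split()

   def agent_deletion(tokens):
       # Delete the "by NP" agent phrase.                 (deletion)
       return tokens[:tokens.index('by')]

   s = ['Lyman', 'played', 'the', 'tuba']
   s = passive(s);            print(' '.join(s))  # the tuba was played by Lyman
   s = subj_aux_inversion(s); print(' '.join(s))  # was the tuba played by Lyman
   s = negative_insertion(s); print(' '.join(s))  # wasn't the tuba played by Lyman
   s = pronominalize(s, ['the', 'tuba'], 'it')
   print(' '.join(s))                             # wasn't it played by Lyman
   s = agent_deletion(s);     print(' '.join(s))  # wasn't it played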
 
Note that order of application of transformations is important.
For example, you can start with:
 
     John called the woman up.
 
and pronominalize it with a transformation:
 
     John called her up.
 
but you cannot now apply the particle-movement transform:
 
    *John called up her.
 
Or, you could again start with:
 
     John called the woman up.
 
and then apply the particle-movement transform:
 
     John called up the woman.
 
But now you can't use the pronominalization transform:
 
    *John called up her.
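 
One simple-minded way to encode that restriction
computationally---an assumption of mine, not the linguists'
actual mechanism---is to throw out any surface order in which the
particle ends up immediately in front of a pronoun object, no
matter which order the transforms were applied in:
 
   # A toy filter: reject VERB PARTICLE PRONOUN orders like "*called up her".

   PRONOUNS = {'her', 'him', 'it', 'them'}

   def allowed(tokens, particle='up'):
       for this_word, next_word in zip(tokens, tokens[1:]):
           if this_word == particle and next_word in PRONOUNS:
               return False
       return True

   print(allowed('John called her up'.split()))        # True
   print(allowed('John called up the woman'.split()))  # True
   print(allowed('John called up her'.split()))        # False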
 
Also note that as far as the linguists are concerned, those six
surface structures about Lyman and his tuba all reflect the same
underlying semantics. Obviously, these sentences do not mean the
same thing. That's why we say deep structure is _reflective_ of
meaning, but it is not the same as meaning.  It is not the case
that sentences with the same deep structure mean the same thing.
Similarly, when you substitute a synonym for some word, your new
sentence doesn't mean quite the same thing. That slight difference
in meaning is the whole reason you choose one synonym over
another; it is also the whole reason you choose one surface
structure over another.
 
Relative clauses:  A relative clause is formed when one clause is
embedded into an NP of another clause to produce structures like
the following:
 
  I sent your book to my aunt.  My aunt lives in Dublin.
 
  I sent your book to [my aunt ^who lives in Dublin.^]
 
  I sent your book to a certain author, one of several that we have
  been talking about.  The author is famously unkempt.
 
  I sent your book to [the author ^who is famously unkempt.^]
 
Relative clauses in English are introduced with a relative
pronoun, such as who, whom, whose, which, or that.
In relative clauses, the pronoun can be deleted in some
circumstances (this happens someplace between the deep structure
and the surface structure).  For example:
 
           This is the officer I talked to last night.
 
One of the interesting things about the relative clause
construction is that, although deletion of the introducing
relative pronoun is allowed, it can really mess up parsing, at
least for humans.  This certainly happens in some of the garden
path sentence constructions we talked about in class.
 
 
Implementing this Stuff
 
With things like passive transformations, subject-aux inversion,
and particle movement, the movement of the constituents is highly
constrained. The particle can only be in one of two places, the
subject-aux inversion can only go one way, and so on. This kind
of movement has a name of its own, local movement, and it's
relatively easy to modify your ATN parser to deal with these
things by adding a few additional states, arcs, and
augmentations. Yes, you are trampling on that generality we were
trying to preserve, but let's face reality--you're trying to get
this thing to work.
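 
Here's a cartoon of the subject-aux inversion case---emphatically
not your ATN and not the book's notation, just a sketch with a
made-up mini-lexicon---showing how one extra arc that accepts an
AUX up front, plus a register recording that fact, handles this
kind of local movement:
 
   # A cartoon of local movement in a transition network:  one extra
   # arc accepts an AUX before the subject NP and sets a register.

   LEXICON = {'was': 'AUX', 'the': 'ART', 'tuba': 'NOUN',
              'played': 'VERB', 'by': 'P', 'Lyman': 'NOUN'}

   def parse_np(tokens, i):
       if i < len(tokens) and LEXICON.get(tokens[i]) == 'ART':
           i += 1
       if i < len(tokens) and LEXICON.get(tokens[i]) == 'NOUN':
           i += 1
       return i

   def parse_s(tokens):
       registers = {'question': False}
       i = 0
       if LEXICON.get(tokens[i]) == 'AUX':   # the extra, inversion arc
           registers['question'] = True
           i += 1
       i = parse_np(tokens, i)               # the ordinary subject-NP arc
       registers['rest_of_vp'] = tokens[i:]  # the rest is waved at here
       return registers

   print(parse_s('the tuba was played by Lyman'.split()))
   # {'question': False, 'rest_of_vp': ['was', 'played', 'by', 'Lyman']}
   print(parse_s('was the tuba played by Lyman'.split()))
   # {'question': True, 'rest_of_vp': ['played', 'by', 'Lyman']}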
 
There are some transformations, however, that make our lives
miserable. For example, consider the assertion:
 
     Kurt put a trick question in the midterm exam.
 
We can transform that into
 
     What did Kurt put __ in the midterm exam?
 
The VP here at first looks like it will be "put in the midterm
exam," but we know that's not an acceptable VP, because we know
put needs a direct object. But the object NP has been replaced by
the appropriate wh-form and moved to the front of the sentence to
derive the wh-question. All wh-questions are derived by replacing
some constituent with a wh-form and moving it to the front of the
sentence regardless of how many constituents are in the way; this
type of movement is called unbounded movement.
 
Back to the example. The place where we expected the object NP,
just after "put", is called a gap. The wh-form is the filler.
What does our ATN parser need to be able to do with this
question? Reading from left to right, the parser needs to
recognize that it's looking at a wh-question ("What" is a dead
giveaway), store the filler, continue parsing along, recognize
where the gap is, shove the filler back in there, and then finish
up so that the sentence that has been parsed, for all intents and
purposes, is:
 
    Kurt put what in the midterm exam?
 
To do this, your ATN would need the ability to recognize fillers
and gaps, and it would need a new data structure called a hold
list, and a new action called the hold action. The gory details
are in the book. You don't have to implement this in your parser.
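 
If it helps to see the flavor of it anyway, here's a toy sketch
of the hold-list idea.  Everything in it---the wh-word list, the
fact that "put" demands an object, the crude gap test---is hard-
wired by me for this one example; the real mechanism is the one
in the book.
 
   # A toy hold list:  store the fronted wh-filler, find the gap, fill it.

   WH_WORDS = {'what', 'who', 'whom'}
   NEEDS_OBJECT = {'put'}             # verbs that demand a direct object
   DETERMINERS = {'a', 'an', 'the'}

   def undo_wh_movement(tokens):
       hold_list = []
       if tokens and tokens[0].lower() in WH_WORDS:
           hold_list.append(tokens[0])            # the HOLD action
           tokens = tokens[1:]
       if tokens and tokens[0].lower() == 'did':  # strip the dummy auxiliary
           tokens = tokens[1:]
       out = []
       for i, word in enumerate(tokens):
           out.append(word)
           # Crude gap test: an object-taking verb not followed by an NP.
           if (word in NEEDS_OBJECT and hold_list and
                   (i + 1 == len(tokens) or tokens[i + 1] not in DETERMINERS)):
               out.append(hold_list.pop())        # retrieve the filler
       return ' '.join(out)

   question = 'What did Kurt put in the midterm exam ?'.split()
   print(undo_wh_movement(question))
   # Kurt put What in the midterm exam ?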
 
 
The Question of Context-Sensitivity
 
If you haven't noticed by now, those transformational rules are
context sensitive.  Doesn't this mean that English is necessarily
not context free?  No.  Remember that all along we've been saying
"we could do this with phrase-structure rules, but we choose not
to..."  So the fact that transformational grammar adds context
sensitivity doesn't necessarily mean that English (or any other
natural language) can't be context free--it only means that
English is easier to describe as a context-sensitive grammar.
 
 
Universal Grammar Theory
 
Part and parcel of early transformational grammar theory is some
possible insight into human language acquisition.  Chomsky and his
cohorts argued that any given language must be hard to learn,
grammar-wise, especially since the "training data" is relatively
small, inconsistent, and mostly devoid of negative feedback (in
the form of "near misses"...remember that term from your AI
class?).  Transformational grammar, they argued, offers a simpler
model of grammar to learn---that is, you'd only have to learn two
small grammars, say of size N (for a total of size 2 x N),
instead of one big grammar on the order of size N x N.
 
But how do you still learn those two small grammars in your early
years? Chomsky said that children must be born with a
predisposition toward certain general linguistic principles---an
innate universal grammar shared by all babies. As a child is
exposed to the language used daily around him or her, some
principles are reinforced while others are discarded.
 
The idea of babies being predisposed toward learning language
isn't controversial, but the notion of a universal grammar is
most definitely controversial. Nobody has really found any
significant evidence in support of the universal grammar idea,
and Chomsky's never been really specific on what that grammar
might be anyway.
 
Nevertheless, there's certainly some appeal to the notion of a
universal grammar, and it does lead to some interesting
speculations about where language comes from and what limits
might exist. For example, Chomsky has argued that the universal
grammar that might be encoded in the infant brain is a product of
the biology---that is, the neural hardware, or wetware, of the
human brain determines the universal grammar. Thus, Chomsky
argues, the languages that we are able to acquire are constrained
by the "wiring" in our brains. As a result, we should not be
surprised if, given the opportunity to communicate with alien
life forms (e.g., little green men (LGMs) or dolphins), we could
never understand their languages.  Why?  Because our brains may not
allow us to learn the grammars of the alien languages, which
themselves are constrained by the (presumably) different neural
organization of the alien brains. Fun to think about, huh?
 
 
The Derivational Theory of Complexity
 
Transformational grammar has also had some significant influence
on the study of the psychology of language. Some of the earliest
work on the psychology of language resulted from transformational
grammar ideas. In the 1960s, many folks believed that TG was
directly related to psychological performance, and that things
like deep structure were useful in predicting performance.
Linguists didn't necessarily push TG as "psychologically real,"
but maybe they didn't make themselves as clear as they could
have. Or maybe they waffled. Or maybe psychologists didn't
understand. In any case, the psychologists put TG to the test.
These psychologists (Fodor, Bever, and Garrett, in various author
orders, from 1968 through the mid-1970's) pursued the possibility
that purely linguistic concepts might serve as the basis of a
psychological model. For example, deep structures could serve as
a representational system and transforms as the processing
system.
 
One famous, now infamous, model that came out of this work is
called the derivational theory of complexity, or DTC. The premise
here was that a sentence's psychological complexity, and
therefore its processing difficulty, was related to, and even
predicted by, the number of transforms in a sentence's
derivation.
 
Early studies showed that negative sentences such as
 
     The sun is not shining.
 
were more difficult to comprehend than the corresponding
affirmative
 
     The sun is shining.
 
But because these sentences have different meanings as well as
different transformational complexities (which may be related but
are not necessarily the same thing), not to mention different
word counts, this result isn't anything you can trade in for a
Nobel prize.
 
Later studies contradicted the DTC idea. For example, although
 
     The boy was bitten.
 
is linguistically more complex than
 
     The boy was bitten by the man.
 
no experiment revealed a relationship to processing difficulty.
(Why is the former more complex than the latter? Deriving the
former requires an additional deletion transformation.)
 
As another example, this sentence needs a particle-movement
transformation:
 
     John phoned the girl up.
 
but this one doesn't:
 
     John phoned up the girl.
 
but experiments detected no difference in processing difficulty.
 
It became evident that syntactic characteristics that had a big
impact on differences in meaning, such as negation, had vastly
different psychological effects than did semantically irrelevant
rules like particle movement. In other words, the psychologists
soon decided that semantics had a bigger impact on psychological
models of language than did syntax. Now, if you look at the
linguistics work since about 1972, when DTC's limitations had
been established, one thing you'll see is that there was a flurry
of changes to theories of transformational grammar. As the new
versions of transformational grammar were developed, DTC had to
be tested again and again. But none of these newer forms of
transformational grammar ever showed greater processing
difficulty for sentences requiring more transformations. So, in
the late '60s and early '70s, linguistics and psycholinguistics
divorced.
 
 
What happened? Why did DTC go wrong?
 
What happened is that linguists worry about matters of language
competence---how to describe what people must know in the most
economical way possible. Psychologists (as well as most AI
folks), on the other hand, worry about language performance---how
do people use what they know about language in its comprehension
and production, and how to describe that in the most economical
way possible. However, there's no reason to believe that a
linguistic theory motivated by an economy of representation is
necessarily relevant to a psychological theory motivated by an
economy of processing. The theorists invested in an assumption
that just wasn't justified. In other words, there's no reason,
either now or then, to believe that transformational rules bear
any relationship to mental operations. They might, but that would
be amazing, not expected.
 
Now since then, two additional things have happened to the study
of language. One has to do with linguistics: some linguists
parted with the notion of transformational grammar and tried to
develop other grammars that were more psychologically realistic.
The two most successful of these are Case Grammar, originally
developed by Fillmore, and Lexical Functional Grammar (LFG),
originally developed by Bresnan and Kaplan. Both of these
grammars focus far more on developing a structure that marks
aspects of the meaning more clearly. In Transformational Grammar,
for example, you have a plain old Noun Phrase, and have to use
your knowledge of the surrounding structure to tell you whether
this noun phrase is the subject or the object or the indirect
object, or whatever. In Case Grammar and LFG, though, such
relationships are actually part of the grammar. Now, all three
grammars have their problems, but in my opinion, the biggest
reasons that TG is more successful than these others are that it
had a head start and that it had better PR. The point for you to
realize is that the grammar that you've encoded in your ATNs is
only one description of the structure of language, and it is
limited in certain ways in which other grammars are less limited.
As a result of adopting this grammar, which neglects certain
generalities about meaning and structure, you will have to put in
more work at later steps in connecting meaning to the structure.
 
The other thing that happened as a result of the split between
linguistics and psychology is that the field of psycholinguistics
flowered. The focus of psycholinguistics is the understanding of
the performance issues to which we've referred at various times.
This may seem like a subtle shift, but in fact it's very
important. Transformational grammar theorists have a very strong
belief that language is a special type of cognition that evolved
pretty much on its own, with its own idiosyncrasies that have
little connection to other aspects of cognition.

Psycholinguistics, on the other hand, assumes that language is an
instance of general rules of cognition, and that what is easy to
do in language, or what is hard to do in language, is reflective
of the strengths and weaknesses of the organization of cognition
in general.
 
Now this is where we return to relative clauses with the relative
pronoun deleted.  I showed you a couple of sentences in which
the relative pronoun was deleted, and the sentence was still
grammatical.  And I have to admit, there are some circumstances
in which it is ungrammatical to delete a relative pronoun
introducing a relative clause.  Now we come to my favorite
sentence, the one that changed my life:
 
The horse raced past the barn fell.
 
I assure you, this is a grammatical sentence!  It should be read
as if the phrase "which was" were (legally) deleted between horse
and raced.  (Think about a context in which we were describing
two horses, one racing up past the house, and the other racing
down by the barn.)
 
How do I know this is grammatical?  Consider the sentence:
 
     The car driven through the fence plummeted down the cliff.
 
Various theories of why the horse sentence is hard for humans to
understand have been posited.  Some are based on a formal theory
of the human parsing mechanism:  for example, a deterministic,
no-backtracking parser would get in trouble with the horse
sentence.  But we know humans can backtrack:  remember the
"orange ducks" sentence?  That is not hard to understand, and yet
it probably involves at least quasi-deterministic parsing.  We'll
go into some more detail on other parsing heuristics in the next
lecture.
Some theories are based on an analysis of generalities of human
cognition.  For example, humans have a working memory of only
about seven (plus or minus two) items.  You can remember some
connected items as a chunk (for example, you tend to remember a
phone number as three chunks:  area code, prefix, and the four
final digits).
Phrases and clauses allow you to "chunk" information like this.
But if you don't come to the end of the phrase, you can't create
a chunk, so you have to remember all the items individually.
 
This also runs into problems:
 
  The prime number few.
 
The correct parse tree of this sentence looks like this:
 
               S
             /   \
           NP     VP
          /  \    /   \
       ART  NOUN VERB  ADVP
        |    |     |     |
       the prime number ADV
                         |
                        few
 
The one that humans seem to attempt is this:
 
                    *S
                  /    \
                NP       VP
        /   /   |   \      \
     ART  ADJP  N    COMP   %#!%
      |    |    |    /   \
     the  ADJ   |  [that]  S
           |    |         /  \
        prime   |       NP     VP
             number    /  \      \
                  QUANTP   N    %#!%
                     |     |
                QUANTIFIER %#!%
                     |
                    few
 
 
The thing to notice is that the number of non-terminal nodes in
the correct parse is far fewer than the number of non-terminal
nodes in the attempted, incorrect parse.  (Non-terminal nodes are
any nodes that have additional nodes under them; in a parse tree,
the terminal nodes are the words, so in the correct parse tree
we're looking at 8 non-terminal nodes, and in the attempted parse
tree we are looking at 14 non-terminal nodes.)  If you were doing
a nice depth-first, pre-order traversal, you'd be able to get to
the terminal nodes, and thus close out your recursive calls, much
earlier with the correct tree than with the attempted tree.  Why,
then, does the human sentence processing mechanism seem to assume
the much more complicated structure as it encounters the elements
of this sentence, and why can't it recover easily (as it does
with "the orange ducks" sentence)? You see, there isn't a simple,
memory-management-driven heuristic that says: "Just pick the tree
with the fewest non-terminal nodes."
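 
If you want to check those node counts yourself, here's a minimal
sketch that represents a parse tree as a nested tuple of the form
(LABEL, child, child, ...), with bare strings as the words, and
counts the non-terminals.  The encoding is my own convenience,
not anything from the course software.
 
   # Count non-terminal nodes in a parse tree built from nested tuples.

   def count_nonterminals(tree):
       if isinstance(tree, str):       # a terminal node (a word)
           return 0
       label, *children = tree         # a non-terminal and its children
       return 1 + sum(count_nonterminals(c) for c in children)

   # The correct parse of "The prime number few."
   correct = ('S',
              ('NP', ('ART', 'the'), ('NOUN', 'prime')),
              ('VP', ('VERB', 'number'), ('ADVP', ('ADV', 'few'))))

   # The parse humans seem to attempt (crash points marked with '%#!%').
   attempted = ('*S',
                ('NP',
                 ('ART', 'the'),
                 ('ADJP', ('ADJ', 'prime')),
                 ('N', 'number'),
                 ('COMP', '[that]',
                  ('S',
                   ('NP', ('QUANTP', ('QUANTIFIER', 'few')), ('N', '%#!%')),
                   ('VP', '%#!%')))),
                ('VP', '%#!%'))

   print(count_nonterminals(correct))     # 8
   print(count_nonterminals(attempted))   # 14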
 
Previously, we looked at several specific
heuristics for deterministic processing, and we found that they
all have problems that are as fatal as this one.  
The moral of the story is that there are many ways to describe
the language, some of which focus on the generalities of the
grammar, some of which focus on the generalities of meaning, some
of which focus on the generalities of human cognition.  The task
of Artificial Intelligence folks is to develop a system that
captures some generalities of processing from a computational
perspective, while keeping in mind that it also has to capture
the generalities of processing from a human cognitive perspective.
 
 
 
Copyright (c) 2004 by Kurt Eiselt and Jennifer Holbrook. All rights 
reserved, except as previously noted.

Last revised: March 30, 2004