Verbs can be grouped into categories of primitive actions to help
with lexical analysis.
Example: Eve ate the apple.
| Eve | noun | human, proper name |
| ate | verb | INGEST |
| the | determiner | |
| apple | noun | fruit, juicy, food, edible |
The parser first does a small amount of syntactic analysis from left to right to find the first verb.
Now that we have the verb, processing begins again at the front of the sentence. "Eve" is put into the concept list, and we try to satisfy requests of the verb as soon as possible. INGEST's requests are:
| actor | slot 0 |
| object | slot 1 |
| to body part (part of slot 0) | |
| from | slot 2 |
Only req0 (slot 0, the actor) has been satisfied (by "Eve") so far, so we begin to process the rest of the sentence. "Apple" is found to be able to fill req1 (the object), but req2 (the "from" slot) remains unsatisfied since there's nothing left in the sentence.
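The two passes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LEXICON` and `REQUESTS` tables, the feature tests attached to each request (e.g. that the object of INGEST must be "edible"), and the function name `parse` are all hypothetical and not part of the theory as presented. The "to body part" request is left out because it is filled from the actor rather than from the sentence.

```python
# Hypothetical lexicon: word -> category plus primitive or semantic features.
LEXICON = {
    "eve":   {"category": "noun", "features": {"human", "proper name"}},
    "ate":   {"category": "verb", "primitive": "INGEST"},
    "the":   {"category": "determiner", "features": set()},
    "apple": {"category": "noun", "features": {"fruit", "juicy", "food", "edible"}},
}

# Requests per primitive, in order: (slot name, feature the filler must have).
# The feature tests are assumptions made for this sketch.
REQUESTS = {
    "INGEST": [("actor", "human"), ("object", "edible"), ("from", "location")],
}

def parse(sentence):
    words = [w.strip(".").lower() for w in sentence.split()]
    # Pass 1: scan left to right for the first verb.
    verb = next(w for w in words if LEXICON[w]["category"] == "verb")
    primitive = LEXICON[verb]["primitive"]
    frame = {slot: None for slot, _ in REQUESTS[primitive]}
    # Pass 2: restart at the front and satisfy requests as early as possible.
    for word in words:
        entry = LEXICON[word]
        if entry["category"] != "noun":
            continue
        for slot, needed in REQUESTS[primitive]:
            if frame[slot] is None and needed in entry["features"]:
                frame[slot] = word
                break
    return primitive, frame

print(parse("Eve ate the apple."))
```

The "from" slot stays `None`, mirroring the unsatisfied req2 in the walk-through.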
Basically, verbs are used as keys to extract models from memory so top-down processing can occur.
There's a great amount of disagreement as to how many primitives there are, but one theory suggests the following:
The theory says that this is a set of language-independent primitives in which everyone thinks. To put it another way, this is the grammar of the language of thought.
Some words can fall into multiple categories. How do we resolve the ambiguity?
The difference here is when more processing is done: before or after the primitive is picked.
What if there are two verbs (e.g., "I imagined he ran.")? Answer: break up the sentence, so that each verb gets its own primitive.
| MBUILD |
| PTRANS |
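One way to picture "breaking up the sentence" is to build one frame per verb and nest the inner frame inside the outer one. The sketch below is an assumption-laden illustration: the `PRIMITIVES` table, the rule that each verb is immediately preceded by a one-word subject, and the choice to store the embedded frame in the outer frame's `object` slot are all hypothetical.

```python
# Hypothetical verb-to-primitive table covering only this example sentence.
PRIMITIVES = {"imagined": "MBUILD", "ran": "PTRANS"}

def split_clauses(words):
    """Start a new clause at the subject of each verb after the first.

    Assumes every verb is immediately preceded by its one-word subject.
    """
    verb_positions = [i for i, w in enumerate(words) if w in PRIMITIVES]
    starts = [0] + [i - 1 for i in verb_positions[1:]]
    ends = starts[1:] + [len(words)]
    return [words[s:e] for s, e in zip(starts, ends)]

def build_nested(sentence):
    words = sentence.strip(".").lower().split()
    frame = None
    # Build from the innermost (last) clause outward.
    for clause in reversed(split_clauses(words)):
        actor, verb = clause[0], clause[1]
        frame = {"primitive": PRIMITIVES[verb], "actor": actor, "object": frame}
    return frame

print(build_nested("I imagined he ran."))
```

The result is an MBUILD frame whose object is a PTRANS frame.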
In the context of a general theory of frames, the number of slots isn't fixed, but in the context of a theory of natural language processing, the number is fixed.
| actor | from |
| object | time |
| recipient/beneficiary | location |
| to body part | instrument |
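Because the slot set is fixed in the language-processing setting, a frame can be sketched as a record with exactly these eight slots. The class name `PrimitiveFrame`, the field spellings (`object_` and `from_` carry a trailing underscore to avoid clashing with Python names), and the `unfilled` helper are hypothetical choices for illustration only.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class PrimitiveFrame:
    primitive: str
    # The eight fixed slots, read column by column from the table above.
    actor: Optional[str] = None
    object_: Optional[str] = None
    recipient: Optional[str] = None      # recipient/beneficiary
    to_body_part: Optional[str] = None
    from_: Optional[str] = None
    time: Optional[str] = None
    location: Optional[str] = None
    instrument: Optional[str] = None

    def unfilled(self):
        """Names of slots that still have no filler."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

frame = PrimitiveFrame("INGEST", actor="Eve", object_="apple")
print(frame.unfilled())
```

The unfilled slots are the places where the parser still holds expectations about what could appear in the sentence.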
This theory appears attractive because it says a lot about top-down processing, but it's unattractive because the question of learning is left open.
Models for Vision
After labeling of an image occurs, the labels can be used to get models from long-term memory. The models must allow for two things:
One model used that meets these requirements is the model of generalized
cylinders, where a human would be represented by something like the
following:

(Lewd drawing by Asok omitted.)
So processing on an image progresses from the raw image to lines to labeled surfaces. From there, the labels are used to retrieve models from long-term memory to a short-term memory so a full 3D representation of the image can be constructed.
Retrieval of a model is based on partial pattern recognition. If the correct
model is retrieved from memory, expectations can be drawn even if the entire
object isn't visible. Expectations and zooming can be helpful for
disambiguation.
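Retrieval by partial pattern recognition, and the expectations that follow from it, can be sketched as below. Everything specific here is an assumption for illustration: the contents of `LONG_TERM_MEMORY`, the representation of a model as a flat set of part labels, and the use of set overlap (Jaccard similarity) as the matching score.

```python
# Hypothetical long-term memory: model name -> set of part labels.
LONG_TERM_MEMORY = {
    "human": {"head", "torso", "arm", "leg"},
    "table": {"top", "leg"},
    "lamp":  {"base", "pole", "shade"},
}

def retrieve(observed_labels):
    """Return (model name, expected-but-unseen parts) for the best partial match."""
    def score(name):
        parts = LONG_TERM_MEMORY[name]
        # Overlap between model parts and observed labels (Jaccard similarity).
        return len(parts & observed_labels) / len(parts | observed_labels)
    best = max(LONG_TERM_MEMORY, key=score)
    expectations = LONG_TERM_MEMORY[best] - observed_labels
    return best, expectations

print(retrieve({"head", "arm"}))
print(retrieve({"leg"}))
```

With only a "leg" label visible, this sketch retrieves the table model; once a "torso" label is also found, the human model wins, which is the kind of disambiguation that looking more closely at the image can provide.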