Thad Starner   

Professor | Contextual Computing Group | College of Computing | Georgia Institute of Technology

  Mobile Text Entry


    Twiddler One-Handed Keyboard

    Mini-QWERTY Keyboards


    Over 1 trillion wireless messages are typed each year, and in some countries more e-mail is sent via mobile phone than home PC. Yet current typing methods on mobile phone keypads such as multi-tap and T9 are slow, averaging 8 to 20 words per minute (wpm) for experts. We are investigating text entry methods that enable typing at 50-130wpm, equivalent to highly skilled desktop typing rates. We have performed longitudinal studies on the Twiddler one-handed chording keyboard, the multi-tap method implemented on the Twiddler (for comparison), and two mini-QWERTY (thumb) keyboards. In addition to typing speeds, error rates, and learnability, our research investigates viable methods of text entry when the user has limited visual feedback, such as when walking or engaged in a face-to-face conversation.

    The Twiddler is a one-handed chording mobile keyboard that employs a 3x4 button layout, similar to that of a standard mobile telephone. Despite its seeming applicability to the mobile market and its use by the wearable computing community, very little data has been published on the Twiddler's performance and learnability. In our longitudinal study comparing novice users' learning rates on the Twiddler versus multi-tap, we found that multi-tap users' maximum speed averaged 20wpm while Twiddler users averaged 47wpm; one user averaged 67wpm. We analyze the effects of learning on various aspects of chording and provide evidence that lack of visual feedback does not hinder expert typing speed. Such "blind typing" situations are common during face-to-face conversations, classroom lectures, or business meetings. We examine the potential of multi-character chords (e.g. pressing the g and h keys to produce "ing ") to increase text entry speed; Thad has reached bursts of up to 130wpm on certain phrases while testing the experimental software. Finally, we explore improving novice typing rates on the Twiddler through the use of a chording tutorial, and we create a prototype design of a mobile phone that could use the Twiddler's typing method.
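
    To make the chording idea concrete, the sketch below shows how a chord (the set of keys pressed simultaneously) might be looked up to produce text, including a multi-character chord. The key names and chord assignments are invented for illustration; they are not the Twiddler's actual layout.

        # Minimal chord lookup sketch; key labels and chord assignments below are
        # hypothetical, not the Twiddler's actual layout.
        CHORD_MAP = {
            frozenset(["R1C1"]): "a",             # one-key chord -> one letter
            frozenset(["R1C1", "R2C1"]): "b",     # two-key chord -> one letter
            frozenset(["R3C2", "R3C3"]): "ing ",  # multi-character chord (MCC)
        }

        def chord_to_text(pressed_keys):
            """Return the text produced by a simultaneous key press ('' if unmapped)."""
            return CHORD_MAP.get(frozenset(pressed_keys), "")

        print(chord_to_text({"R3C2", "R3C3"}))    # -> "ing "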

    In our longitudinal study of mini-QWERTY keyboards, beginning users who are already expert at desktop keyboards type at approximately 30wpm. With practice, these typists average 60wpm. However, in the blind condition, our subjects peaked at 45wpm with much higher error rates than in the normal mini-QWERTY condition or in any of the blind Twiddler typing experiments. We analyze the types of errors made by mini-QWERTY typists, suggest methods of improving accuracy, and use our experimental results to update the current theoretical model of the maximum expected typing rates for a mini-QWERTY keyboard.
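
    For readers unfamiliar with the measures above, the sketch below computes words per minute (counting five characters as one word) and an uncorrected error rate based on edit distance between presented and transcribed phrases. It illustrates the standard text-entry metrics rather than the exact analysis used in our studies.

        # Standard text-entry metrics: wpm (5 characters = 1 word) and an
        # uncorrected error rate from edit distance. Illustrative only.
        def wpm(transcribed, seconds):
            """Words per minute, counting 5 characters (including spaces) as one word."""
            return (len(transcribed) / 5.0) / (seconds / 60.0)

        def edit_distance(a, b):
            """Levenshtein distance between the presented and transcribed strings."""
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                cur = [i]
                for j, cb in enumerate(b, 1):
                    cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
                prev = cur
            return prev[-1]

        def error_rate(presented, transcribed):
            """Fraction of the longer string that would need editing to match."""
            return edit_distance(presented, transcribed) / max(len(presented), len(transcribed))

        print(wpm("the quick brown fox jumps over the lazy dog", 12.0))  # ~43 wpm
        print(error_rate("the quick brown fox", "the quick brwn fox"))   # ~0.05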

    We conclude that desktop typing rates are possible on small mobile devices. This empirical result suggests that desktop-style computing services may be supportable on current mobile phone form factors. In selecting a typing method for the design of a new device, we can offer the following suggestions. The mini-QWERTY keyboard should be considered if the user is expected to already be expert at desktop QWERTY typing and is expected to be able to use both hands and visually concentrate on the keyboard while typing. If the user is learning to type for the first time, if the device does not have physical space for more than a 12-key keypad, or if the user is expected to use the device in "hands-limited" scenarios, then the Twiddler style of chording should be considered. We also suspect, but have not shown, that the Twiddler would enable typing and error rates superior to those of the mini-QWERTY keyboard while the user is walking or otherwise mobile. If the Twiddler style of text entry is chosen for the design of a new device, we suggest the inclusion of a built-in tutor, such as our Twidor Java software, to encourage novice typists by demonstrating that they can achieve fast typing rates quickly.



    Dual-Purpose Speech


    Calendar Navigator Agent (CNA)

    CNA Interface


    Speech is often considered the "ultimate" interface for mobile computing. However, speech interfaces are socially inappropriate in many situations. For example, if a user accessed his calendar during a meeting by saying "Computer: please display my schedule for next week," the utterance would interrupt the flow of the conversation and seem awkward. We are investigating a different approach: we are designing speech interfaces that "listen in" and allow the user to control the interface directly with key phrases that are appropriate in the context of the conversation (e.g. "Can we meet sometime next week?"). We call this approach "Dual-Purpose Speech."

    Our first example of Dual-Purpose Speech, the Calendar Navigator Agent (CNA), is motivated by our CHI 2004 study on mobile calendar device usage.

    This study suggests that low access time is a major contributor to the usability of mobile scheduling devices. Therefore, the CNA was designed to use the scheduling conversation itself to navigate the user's calendar instead of requiring the user to take the time to retrieve his PDA or day planner and manually enter the data.

    When a CNA user says to a colleague, "Can we meet sometime next week?", the CNA shows the user's calendar on the user's head-up display and navigates the calendar to next week. As the conversation continues (e.g. "How about next Tuesday?", "Is 1pm OK?", etc.), the system continues to navigate the user's calendar and enters an appointment when appropriate. The user limits his vocabulary and grammar when negotiating the appointment with his conversational partner so that the CNA can understand the context and specifics being discussed. To protect privacy and obey state laws, only the wearer's side of the conversation is used. Even so, the CNA is not designed to hide the user's interaction from the user's colleague. Instead, it is designed to facilitate the conversation by reducing the time and distraction necessary to retrieve a scheduling device and physically manipulate it.
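
    As a rough illustration of this "listen-in" navigation, the sketch below maps a few recognized scheduling phrases to calendar commands. The phrase patterns and command names are hypothetical; the actual CNA uses a constrained grammar and a speech recognizer rather than simple keyword matching.

        # Hypothetical mapping from recognized scheduling phrases to calendar
        # navigation commands; illustrative only, not the CNA's actual grammar.
        import re
        from datetime import date, timedelta

        WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday"]

        def interpret(utterance, today):
            """Map one recognized utterance to a calendar navigation command."""
            text = utterance.lower()
            if "next week" in text:
                monday = today + timedelta(days=7 - today.weekday())
                return {"action": "show_week", "start": monday}
            m = re.search(r"next (monday|tuesday|wednesday|thursday|friday)", text)
            if m:
                days_ahead = (WEEKDAYS.index(m.group(1)) - today.weekday()) % 7 or 7
                return {"action": "show_day", "date": today + timedelta(days=days_ahead)}
            m = re.search(r"is (\d{1,2})\s*(am|pm)? ok", text)
            if m:
                hour = int(m.group(1)) + (12 if m.group(2) == "pm" and int(m.group(1)) < 12 else 0)
                return {"action": "propose_time", "hour": hour}
            return {"action": "none"}

        print(interpret("Can we meet sometime next week?", date(2005, 3, 4)))
        print(interpret("Is 1pm OK?", date(2005, 3, 4)))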

    In addition to the CNA, we have made two other Dual-Purpose Speech prototypes: Speech Courier and Dialog Tabs. Both of these prototypes were guided by observations made during the mobile calendar device usage study mentioned above. Often, users would temporarily store appointments on scrap paper or in their memory until they had more time to access their PDA, day planner, or home PC. Dialog Tabs assists this "buffering" of information by capturing sections of conversation for later access. Whenever the user triggers a dialog tab, the next few seconds of speech are stored and a small icon appears on the right-hand side of the user's screen. Speech recognition is attempted using the user's voice model, and the resulting transcription is stored with the audio. Later, when the user clicks on the dialog tab's icon, both the audio file (represented by a standard audio player interface) and the transcription appear. The user can click words in the transcription that appear incorrect to hear the audio at that point. By providing this audio-linked transcription, the user can remind himself of the contents of the dialog tab even if the speech recognition was error-prone.
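
    The audio-linked transcription can be thought of as a list of recognized words, each carrying its time offset into the buffered clip, so that clicking a word seeks playback to that point. The sketch below shows such a structure with hypothetical field names; it is not the prototype's actual implementation.

        # Hypothetical data structure for a dialog tab's audio-linked transcription.
        from dataclasses import dataclass, field

        @dataclass
        class Word:
            text: str
            start_sec: float          # offset of this word within the buffered audio

        @dataclass
        class DialogTab:
            audio_path: str           # clip captured when the tab was triggered
            words: list = field(default_factory=list)

            def seek_offset(self, word_index):
                """Audio offset to play from when a transcript word is clicked."""
                return self.words[word_index].start_sec

        tab = DialogTab("clip_0142.wav",
                        [Word("send", 0.2), Word("the", 0.5),
                         Word("forms", 0.7), Word("today", 1.1)])
        print(tab.seek_offset(2))     # clicking "forms" plays from 0.7 seconds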

    If the computer interprets the contents of a dialog tab as an implicit command, it performs preliminary processing on the utterance and associates the result with the icon as well. For example, if the user clicks on a dialog tab that contains a past conversation about scheduling, the calendar interface from the CNA may appear showing the time and date understood from parsing the speech transcription.

    Speech Courier, our third prototype, can be considered a special case of a dialog tab. In studying busy managers, we found that they would sometimes issue instructions to their assistants through a conversation with a colleague. For example, one of our managers had a phone conversation in his office which ended with "Yes Mr. Smith, my assistant Susan will send you those 108SC forms today." The assistant Susan, who was in the manager's office at the time, took this utterance to be an implicit instruction to retrieve the forms and mail them to Mr. Smith. Speech Courier allows a similar interaction when the manager is separated from his assistant in space or time. If Speech Courier were used in the above case, the manager would trigger a dialog tab and store the appropriate utterance. The key phrase "my assistant Susan" would indicate that this dialog tab should be directed to the Speech Courier interface, which would encode both the audio file and the attempted transcription in an e-mail and address the e-mail to Susan. The manager could confirm the correctness of the resulting e-mail on his head-up display and send it with a button press. Susan could then respond to the implicit request even though she was at a different location when the utterance was spoken.
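
    The routing step can be sketched as follows: scan the dialog tab's transcription for a trigger phrase that names an assistant, then package the audio and the attempted transcription into an e-mail draft for the wearer to confirm. The address book, trigger-phrase handling, and function names below are hypothetical.

        # Hypothetical sketch of Speech Courier routing a dialog tab into an e-mail.
        from email.message import EmailMessage

        ADDRESS_BOOK = {"susan": "susan@example.com"}     # hypothetical directory

        def to_courier_email(transcription, audio_bytes, sender):
            """Build an e-mail draft if the transcription names a known assistant."""
            for name, addr in ADDRESS_BOOK.items():
                if "my assistant " + name in transcription.lower():
                    msg = EmailMessage()
                    msg["From"], msg["To"] = sender, addr
                    msg["Subject"] = "Request captured in conversation"
                    msg.set_content("Attempted transcription:\n" + transcription)
                    msg.add_attachment(audio_bytes, maintype="audio", subtype="wav",
                                       filename="utterance.wav")
                    return msg        # shown on the head-up display for confirmation
            return None               # no trigger phrase; remains an ordinary dialog tab

        draft = to_courier_email(
            "Yes Mr. Smith, my assistant Susan will send you those forms today.",
            b"RIFF...", "manager@example.com")
        print(draft["To"] if draft else "no recipient found")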



    Augmenting Conversation between the Deaf and Hearing Community


    Telesign ASL Translator

    CopyCat ASL Game


    Telesign attempts to create a mobile American Sign Language (ASL) to English translator that can aid face-to-face communication between a hearing and a deaf person when an interpreter is not feasible (for example, after a car accident, transferring between gates at an airport, or searching for an apartment). The project is inspired by phrase books designed for travelers visiting countries where they do not know the local language. The traveler searches the phrase book for a given scenario (for example, asking for the nearest restroom) and speaks the appropriate phrase using a phonetic transcription. The phrase is designed to elicit non-verbal gestures from the traveler's conversation partner (e.g. "Can you point in the direction of the nearest restroom?") so that the traveler can understand the response. Telesign provides support for similar scenarios but automates the process of finding the correct English phrase. The user signs the phrase she wants translated, and the system tracks the signer's hands and recognizes the ASL. The system determines which pre-programmed English phrases the sign most closely matches and presents a list to the user in a head-up display. The signer selects the English phrase she desires, and the system speaks the phrase aloud.
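
    The selection step can be pictured as follows: the recognizer scores each pre-programmed English phrase, the best matches are listed on the head-up display, and the signer picks one to be spoken. The scores and phrases below are invented for illustration; they are not Telesign's actual models or output.

        # Hypothetical ranking of pre-programmed English phrases by recognizer score.
        def rank_candidates(scores, top_n=3):
            """Sort candidate phrases by score (higher is a better match), best first."""
            return sorted(scores, key=scores.get, reverse=True)[:top_n]

        recognizer_scores = {                     # invented log-likelihood-style scores
            "Can you point toward the nearest restroom?": -41.2,
            "Which gate is my connecting flight?": -57.8,
            "Please call an interpreter.": -63.1,
        }
        for i, phrase in enumerate(rank_candidates(recognizer_scores), 1):
            print(i, phrase)                      # the signer selects one; it is spoken aloud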

    Telesign uses a combination of accelerometers embedded in wrist bracelets and a camera in a hat to track the signer's hands. With the current system, the signer uses the jog wheel from a small computer mouse, attached to one of the accelerometer bracelets, to control the interface. When the user wants to sign a phrase, she presses the jog wheel button, and the system begins capturing data from the camera and the accelerometers. When the phrase is finished, the signer presses the jog wheel again, and the recognizer processes the captured data. We use our hidden Markov model-based Georgia Tech Gesture Toolkit (GT2K) to recognize the signed phrase. We have recently demonstrated a phrase-level continuous recognizer for the system that uses a vocabulary of 78 words across the phrases it recognizes. Once the sign is recognized, the system presents the user with candidate translations in English. In the future we may use sign icons, depending on the signer's level of familiarity with written English.
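
    The jog-wheel-delimited capture amounts to buffering both sensor streams between two button presses and then handing the segment to the recognizer. The sketch below shows that control flow with placeholder sensor objects and a stand-in recognize_phrase function; GT2K's actual interface is not shown.

        # Sketch of jog-wheel-delimited capture; sensor objects and recognize_phrase
        # are placeholders, not GT2K's actual API.
        def capture_and_recognize(jog_wheel, camera, accelerometers, recognize_phrase):
            """Buffer camera and accelerometer data between two presses, then recognize."""
            jog_wheel.wait_for_press()              # signer starts the phrase
            frames, accel = [], []
            while not jog_wheel.pressed():          # second press ends the phrase
                frames.append(camera.read_frame())
                accel.append(accelerometers.read())
            return recognize_phrase(frames, accel)  # e.g. HMM decoding over the segment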

    American Sign Language (ASL) is a distinct language from English, involving a significantly different grammar and lexicon. ASL is also the preferred language of the deaf community, as it is as fast as spoken language and conveys many nuances that are critical in communication. Yet few hearing people sign, and often the deaf must communicate with hearing people through slow, hand-written notes in English, which is a second language for them. We are beginning studies to compare the effectiveness of communicating between a hearing and a deaf person using a phrase level translator, handwritten notes, and typing on a small PDA.

    Ninety percent of deaf children are born to hearing parents who do not know sign language or have low levels of proficiency. Unlike hearing children of English-speaking parents or deaf children of signing parents, these children often lack the serendipitous access to language at home that is necessary for developing linguistic skills during the "critical period" of language development. Often these children's only exposure to language is from signing at school. CopyCat is a game that uses our sign language recognition system to augment early classroom teaching and develop American Sign Language (ASL) skills in young deaf children.

    CopyCat is designed both as a platform to collect gesture data for our ASL recognition system and as a practical application that helps deaf children acquire language skills while they play the game. The system uses a video camera and wrist-mounted accelerometers as the primary sensors. In CopyCat, the user and the character of the game, Iris the cat, communicate in ASL. With the help of ASL linguists and educators, the game is designed with a limited, age-appropriate phrase set. For example, the child will sign to Iris, "you go play balloon" (glossed from ASL). If the child signs poorly, Iris looks puzzled, and the child is encouraged to attempt the phrase again. If the child signs clearly, Iris frolics and plays with a red balloon. If the child cannot remember the correct phrase to direct Iris, she can click on a button bearing the picture of the object with which she would like Iris to play. The system then shows a short video of a teacher demonstrating the correct ASL phrase, and the child can mimic the teacher to communicate with Iris.
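
    The interaction loop sketched below captures this flow: the child signs a phrase, Iris reacts, and a help button plays a teacher video when needed. The function names are placeholders, and in the studies described below a human "wizard" judges the signing rather than a live recognizer.

        # Sketch of CopyCat's interaction loop; the callbacks are placeholders.
        def play_level(phrases, sign_is_correct, wants_help, show_help_video, animate_iris):
            """The child must sign every phrase in the level correctly before advancing."""
            for phrase in phrases:                    # e.g. 8 phrases per level
                while not sign_is_correct(phrase):    # recognizer (or wizard) judgment
                    animate_iris("puzzled")           # encourage another attempt
                    if wants_help(phrase):            # button with the object's picture
                        show_help_video(phrase)       # teacher demonstrates the ASL phrase
                animate_iris("play")                  # Iris frolics with the requested object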

    Gesture-based interaction expands the possibilities for deaf educational technology by allowing children to interact with the computer in their native language. An initial goal of the system, suggested by our partners at the Atlanta Area School for the Deaf, is to elicit phrases which involve three and four signs from children who normally sign in phrases with one or two signs. This task encourages more complex sign construction and helps develop short term memory. In the current game there are 8 phrases per level, and the child must correctly sign each phrase before moving on to the next level.

    To date, CopyCat has used a "Wizard of Oz" approach where an interpreter simulates the computer recognizer. This method allows research into the development of an appropriate game interface as well as data collection to train our hidden Markov model (HMM) based ASL recognition system. Preliminary off-line tests of the recognition system have shown promising results for user-independent recognition of data from our ten-year-old subjects, and we hope to perform experiments with a live recognition system soon. In addition, our pilot studies have allowed us to create a compelling game for the students, who often ask to continue playing even after they have completed all levels of the game.

