A Workspace for Programming using Novel Input Modalities

Jennifer Mankoff

Introduction

Compared to communication between humans or between computers, human-to-computer communication, or input, is quite slow. Most approaches to speeding it up put the burden on the human (e.g., touch typing). This project investigates an alternative: using the computer's power to ease the user's burden.

Current input systems are often slow and error-prone (e.g., handwriting or speech recognition systems) or cumbersome and unportable (e.g., any reasonably ergonomic keyboard). In addition, they can injure their users, and they are generally effective only for users with no physical disabilities affecting their arms. As long as computer use is a fact of daily life for many professionals, these problems will limit computer accessibility.

This project will focus on a constrained input task: programming. Programming requires the production of machine-readable documents (usually ASCII text) with an extremely constrained syntax. These constraints will make the input problem more manageable by reducing both the set of words the user may input and the types of input tasks the user will engage in. Anecdotal evidence suggests that programmers engage in three main types of input tasks: text entry, modification, and navigation.

Background

My research into computing in the home (Mankoff97a, Mankoff98a, Mankoff98b) highlighted the inadequacy of computer interfaces for supporting natural input. Although traditional interfaces have been somewhat successful in the area this project addresses, they are certainly not any more natural there. Most existing systems assume that using a keyboard and chording (pressing multiple keys at once) poses no problem for the user, and they provide little or no support for new modalities. One exception is the small body of programs that allow coding using speech recognition (such as DragonDictate in combination with a2x and VoiceCode).

The User

Although the "Empirical Studies of Programmers" workshop has been convening for seven years now \cite{ESP}, there is very little literature that focuses on the types of input tasks facing the programmer. Instead, researchers focus on cognitive processes, such as how people learn to program and what mental structures they use to represent programs \cite{Solo84,Broo80,Petr97}, or on the workflow processes and software engineering practices used in the programming task. In general, the literature assumes that input is not a major problem for programmers.

The Task

The programming task is extremely complex from the programmer's perspective. However, when we consider only its input aspects, it breaks down into three main subtasks: text entry, text modification, and navigation. Text entry is the phase in which the programmer enters "words" or "sentences" of source code into the computer. Text modification involves editing individual, previously existing words or lines of code. Navigation involves finding the right place in the right document to enter or modify code, and moving there.

Text Entry

Entering text quickly is an issue for most computer users. This is especially true for non-speaking people, who rely on text entry to communicate and many of whom have additional disabilities that make it impossible for them to type. Since conversational speech occurs at a rate of 100-150 words per minute, slow text entry is a significant handicap in conversation. Methods for speeding up text entry include iconic languages \cite{Stron95}, word prediction \cite{Gree95,Gara94}, and sentence completion \cite{McCo94}. Although these systems are targeted at English, many of the techniques developed could be transferred to the programming task, and might even work better there because of its constraints.
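To make the transfer concrete, here is a minimal Python sketch of frequency-ranked word prediction over a programming vocabulary. It assumes the source has already been split into tokens (here by naive whitespace splitting of pre-spaced code) and that the model is simply token counts from code the user has already written; `build_model` and `predict` are hypothetical names, not part of any cited system.

```python
from collections import Counter


def build_model(tokens):
    """Count how often each token occurs in previously written code.

    Assumption: tokens come from a naive whitespace split, not a real lexer.
    """
    return Counter(tokens)


def predict(model, prefix, k=3):
    """Return up to k known tokens starting with `prefix`, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    candidates = [t for t in model if t.startswith(prefix)]
    candidates.sort(key=lambda t: (-model[t], t))
    return candidates[:k]


corpus = "while ( i < n ) { i ++ ; } while ( j < n ) { j ++ ; } int width ;".split()
model = build_model(corpus)
print(predict(model, "w"))  # ['while', 'width']
```

Because a program's vocabulary is small and repetitive, even a crude model like this may rank keywords and the user's own identifiers highly after only a one- or two-letter prefix.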

Recognition-based modalities such as speech and handwriting can also be used for text entry. However, most of these systems are not built to handle the syntax of programming languages, which requires punctuation characters in specific locations interspersed with non-English words. They often infer spacing and punctuation automatically, usually incorrectly for a programming setting. One solution is to give words new semantics (e.g., "while" might translate to "while() {}").
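One way to realize such word semantics is a table of templates keyed by the recognized word. The Python sketch below assumes C-style templates and uses a hypothetical `|` marker to record where the cursor should land after expansion; the table contents and the `expand` function are illustrative only.

```python
# Hypothetical mapping from recognized words to C-style code templates.
# "|" marks where the cursor should be placed after expansion (an assumption).
TEMPLATES = {
    "while": "while(|) {}",
    "if": "if(|) {}",
    "for": "for(|;;) {}",
}
CURSOR = "|"


def expand(word):
    """Expand a recognized word into (text, cursor_offset).

    Words without a template are inserted verbatim, with the cursor after them.
    """
    template = TEMPLATES.get(word)
    if template is None:
        return word, len(word)
    offset = template.index(CURSOR)
    return template.replace(CURSOR, "", 1), offset


print(expand("while"))  # ('while() {}', 6): cursor lands inside the parentheses
```

Placing the cursor inside the condition lets the user's next utterance fill in the loop test directly, rather than requiring a separate navigation step.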

Another problem with recognition technologies is that the relative frequencies of characters and words change in programming. This is difficult for end users to fix, although it can be overcome in some cases given access to the back end of the recognition system.
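Where the back end exposes a ranked list of hypotheses with scores, one common remedy is to re-rank them with a language model built from code rather than English. The Python sketch below assumes the recognizer returns `(word, log_probability)` pairs and adds a weighted, add-one-smoothed unigram score computed from the user's own code; `rescore`, `weight`, and this scoring scheme are illustrative assumptions, not features of any particular recognizer.

```python
import math
from collections import Counter


def unigram_logprob(counts, word, alpha=1.0):
    """Add-alpha smoothed log-probability of `word` under a unigram model.

    One extra vocabulary slot reserves probability mass for unseen words.
    """
    total = sum(counts.values())
    vocab = len(counts) + 1
    return math.log((counts[word] + alpha) / (total + alpha * vocab))


def rescore(nbest, counts, weight=1.0):
    """Re-rank (word, recognizer_logprob) hypotheses with a code-specific prior.

    Assumption: the recognizer exposes its hypotheses and their log scores.
    """
    return sorted(
        nbest,
        key=lambda h: h[1] + weight * unigram_logprob(counts, h[0]),
        reverse=True,
    )


counts = Counter({"int": 20, "char": 10})  # counts taken from the user's code
nbest = [("in", -1.0), ("int", -1.2)]      # the recognizer slightly prefers "in"
print(rescore(nbest, counts)[0][0])        # int
```

Setting `weight` to 0 recovers the recognizer's original ranking, so the strength of the programming-specific prior can be tuned independently.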

Text Modification

Users rarely enter stand-alone words during text modification. Instead, they may change punctuation, or replace one word with a different one. This causes problems for word-based recognition and for prediction systems. However, extrapolation and "action" prediction such as that described by Masui \cite{Masu94} can better support this phase. For example, the system might predict that since you added a ");" at the end of the previous line, you plan to do the same thing on the next line. This is often called "programming by demonstration" \cite{Fuji98,Cyph91,Masu94}.
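A very small form of action prediction can be sketched in Python: if the last edit appended some text to the end of a line, offer to append the same text to the next line. This single-step heuristic is only an illustration in the spirit of the work cited above, not Masui's algorithm, and `appended_suffix` and `predict_next_edit` are hypothetical names.

```python
def appended_suffix(before, after):
    """Return the text appended to the end of a line, or None if the edit
    was not a pure append."""
    if after.startswith(before) and len(after) > len(before):
        return after[len(before):]
    return None


def predict_next_edit(lines, edited_index, before):
    """Given that lines[edited_index] was just changed from `before`,
    propose repeating the same append on the following line.

    Returns (line_index, proposed_text), or None if there is no prediction.
    """
    suffix = appended_suffix(before, lines[edited_index])
    nxt = edited_index + 1
    if suffix is None or nxt >= len(lines):
        return None
    if lines[nxt].endswith(suffix):
        return None  # the next line already ends with the appended text
    return nxt, lines[nxt] + suffix


lines = ["foo(a);", "bar(b"]                  # the user just added ");" to line 0
print(predict_next_edit(lines, 0, "foo(a"))   # (1, 'bar(b);')
```

The system would present the proposal for confirmation rather than applying it silently, so a wrong guess costs the user only a rejection.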

Navigation

Navigation and text modification are closely tied together. Modification usually involves making small, isolated changes in an iterative fashion, and navigation is needed to move to the place where each change will be made. Currently, navigation is done either by scrolling and opening specific files by name, or by searching within or among files for a keyword. Some environments, such as Emacs (through tags) or Visual C++, provide alternative indices into a program based on syntactic or semantic information such as function names or an object hierarchy.
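A tags-style index can be sketched as a map from function names to the lines that define them. The Python version below is a rough illustration under stated assumptions: it uses a crude regular expression that recognizes C function definitions beginning in column 0 and skips lines ending in `;` (prototypes and statements). Real tag generators parse source far more carefully, and `build_index` is a hypothetical name.

```python
import re

# Crude pattern (an assumption, not a real parser): one or more type words,
# optional '*'s, then the function name followed by '(' at the start of a line.
DEF_PATTERN = re.compile(r"^(?:[A-Za-z_][\w*]*\s+\**)+([A-Za-z_]\w*)\s*\(")


def build_index(source):
    """Map each function name to the 1-based line number of its definition."""
    index = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.rstrip().endswith(";"):
            continue  # skip prototypes and ordinary statements
        match = DEF_PATTERN.match(line)
        if match:
            index[match.group(1)] = lineno
    return index


source = """int square(int x);

int square(int x) {
    return x * x;
}

int main(void) {
    return square(3);
}
"""
print(build_index(source))  # {'square': 3, 'main': 7}
```

With such an index, a spoken or gestured function name can move the cursor directly to its definition, replacing scrolling or keyword search.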

Approach

The first two sections of this document have identified some approaches to user input for programming based on past work and anecdotal evidence. Because a repetitive strain injury has made it difficult for me to use a keyboard, I have created an exploratory programming environment that allows experimentation with novel input modalities. This environment is in real use on a daily basis, and I am using it to bootstrap its own development.

Computer Environment

In order to explore input possibilities, both a hardware and a software infrastructure must be implemented. I am currently in the process of designing and building this infrastructure.

Hardware

In order to handle user input, the computer must be equipped with hardware that can sense user actions. From the user's perspective, this hardware is an important part of the user interface. Ideally, it should be ergonomic, and impose logical constraints on the user's actions. It needs to be an integrated part of the user's work environment.

For a programmer, the work environment is usually a desk, monitor, keyboard, and mouse. I have replaced the keyboard with a pen tablet, the monitor and the desk with a drafting table on which a display is projected, and the mouse with a foot pedal. I plan to embed a touch screen in the drafting table. In addition, I will use vision to recognize simple gestures and track physical objects.

Software

With such a large variety of input sources, a coherent plan for handling the data they produce is important. For recognition-based technologies such as vision and pen gestures, it is especially important to use input data from all sources, since combining them may help reduce recognition error rates. In addition, the variety of hardware in use makes it possible to experiment with many different approaches to input, and the software supports this by allowing rapid prototyping of new input devices.
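One simple way to use data from all sources is late fusion: each recognizer assigns a probability to each candidate, and the system multiplies them. The Python sketch below assumes the sources make independent errors and report probabilities over a shared set of candidates; `fuse` and the small floor value for candidates a source did not report are illustrative choices, not a description of the actual infrastructure.

```python
def fuse(*distributions):
    """Combine per-source probability distributions over candidates.

    Assumes independent sources: scores are multiplied, then renormalized.
    A candidate missing from a source gets a small floor probability.
    """
    floor = 1e-6
    candidates = set().union(*distributions)
    scores = {}
    for c in candidates:
        p = 1.0
        for dist in distributions:
            p *= dist.get(c, floor)
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


pen = {"(": 0.45, "c": 0.55}      # the pen recognizer slightly prefers "c"
context = {"(": 0.9, "c": 0.1}    # but the editor context expects punctuation
fused = fuse(pen, context)
print(max(fused, key=fused.get))  # (
```

Here neither source alone is decisive, but together they favor the interpretation that fits both the stroke and the program context.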

Figure 1: The word "finished" written on a circular, word-level soft keyboard.

I have used this infrastructure to quickly create and test a group of pen-based input devices, including a software mouse and a novel, circular soft keyboard (we plan to file a patent on this keyboard and are currently working on an article to submit for publication). The keyboard is shown in Figure 1, where the user has written the word "finished". The system only checks when the pen enters and leaves a letter, but we hope to show that users will begin to associate gestures with words or parts of words. I use both the mouse and the keyboard daily to write email and other text with little punctuation. I plan to integrate gesture recognition with the system in order to support a full character set.
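The enter-and-leave rule can be sketched in a few lines of Python. The keyboard's actual letter order and dimensions are not given here, so this version assumes a hypothetical layout: 26 letters in alphabetical order around a ring, with an empty central area. A letter is emitted each time the pen moves into a letter's sector of the ring; returning to the center between letters lets the same letter be entered twice in a row. `letter_at`, `decode`, and the radii are illustrative assumptions.

```python
import math
import string

# Hypothetical layout (an assumption): letters in alphabetical order around a
# ring, counter-clockwise from the +x axis, between two radii in tablet units.
LETTERS = string.ascii_lowercase
R_INNER, R_OUTER = 40.0, 100.0


def letter_at(x, y):
    """Return the letter under (x, y), measured from the keyboard center,
    or None if the point is in the central area or outside the ring."""
    r = math.hypot(x, y)
    if not (R_INNER <= r <= R_OUTER):
        return None
    angle = math.atan2(y, x) % (2 * math.pi)  # normalized to [0, 2*pi)
    sector = int(angle / (2 * math.pi) * len(LETTERS)) % len(LETTERS)
    return LETTERS[sector]


def decode(stroke):
    """Turn a pen stroke (a list of (x, y) samples) into text by emitting a
    letter each time the pen enters a letter's region."""
    out = []
    current = None
    for x, y in stroke:
        here = letter_at(x, y)
        if here is not None and here != current:
            out.append(here)  # the pen just entered a new letter
        current = here
    return "".join(out)
```

A stroke that alternates between the center and the middle of each letter's sector decodes to the word traced; only the sampled points matter, since the sketch does not interpolate between samples.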

Although the infrastructure as described so far will work with any application running in UNIX under the X Window System, we can provide much more powerful and less error-prone interaction by building or customizing a program editor to work more closely with our input infrastructure.

In addition to communicating important context information to the input software, the editor will support raw data from the new input modalities being used. Although the final goal of the editing process is an ASCII program, there is a definite need for the system to handle other types of data: the system should not require the user to translate anything except the program itself into ASCII. Examples of data the user may wish to leave in non-ASCII formats include pen and voice annotations, comments, and figures or diagrams.
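A minimal Python sketch of such a document model, using hypothetical names: ASCII program lines plus raw annotations (ink, voice, figures) attached to line numbers, with only the program text exported. Anchoring annotations by line index is a simplifying assumption; a real editor would need anchors that survive insertions and deletions.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Non-ASCII data attached to a line, kept in its native format."""
    kind: str       # e.g. "ink", "voice", "figure" (hypothetical kinds)
    payload: bytes  # raw data, never converted to ASCII


@dataclass
class ProgramDocument:
    lines: list[str] = field(default_factory=list)
    # Simplification: annotations are keyed by 0-based line index.
    annotations: dict[int, list[Annotation]] = field(default_factory=dict)

    def annotate(self, line, note):
        """Attach an annotation to a line without touching the program text."""
        self.annotations.setdefault(line, []).append(note)

    def to_source(self):
        """Export only the program itself, which must be ASCII."""
        text = "\n".join(self.lines) + "\n"
        text.encode("ascii")  # raises UnicodeEncodeError if the code is not ASCII
        return text


doc = ProgramDocument(["int main(void) {", "    return 0;", "}"])
doc.annotate(1, Annotation("voice", b"\x00\x01"))
print(doc.to_source())
```

Keeping annotations outside the exported text means the compiler sees an ordinary ASCII program, while the editor retains the user's notes in whatever form they were made.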

Conclusion

There is a communications bottleneck between humans and computers. Although touch typists in non-mobile environments can communicate at acceptable speeds (about 1/2 the speed of speech), changes in modality or environment usually cause a rapid degradation in speed.

This research investigates new input interaction techniques that may improve communication speed. Although we have focused specifically on programming, the techniques I have begun to develop may have wide-ranging applications. First, the tasks of entry, modification, and navigation apply to most document creation processes, not just programming. Second, although I am focusing on a programmer in a non-mobile environment who is unable to use a keyboard (myself), the techniques we are developing might be useful in many other settings. Examples include pen-input machines; mobile environments; users with more severe physical disabilities (e.g., non-speaking people); and users with writing-related disabilities (e.g., people who have trouble spelling or who have dyslexia).


jmankoff@cc.gatech.edu