Virtual Environment Sound Design Primer

Jarrell Pair

Virtual Environments Group


Graphics, Visualization and Usability Center

College of Computing

Georgia Institute of Technology



What is Sound?

Sound can be formally defined as a series of compressions and rarefactions propagating through the air. This pressure wave is collected by the outer ear, or pinna, and enters the ear canal. Via the eardrum, the wave is transmitted to the three small bones of the middle ear, often referred to as the hammer, anvil, and stirrup. These bones mechanically vary the fluid pressure within the cochlea. The pressure changes bend the cilia of the hair cells inside the cochlea. The bending of the cilia stimulates neurons, completing the transduction of the physical sound wave into an electrical signal. Finally, the brain interprets this electrical signal as a specific sound through the neurological process of aural perception.

How do we Localize Sound in Three Dimensional Space?

The seminal work conducted by Lord Rayleigh provides the duplex theory of audio localization. It explains audio localization as a function of differences in intensity and arrival time between sounds reaching the two ears. Interaural intensity difference (IID) refers to the difference in intensity of a sound detected at each ear. This cue is generally considered ineffective for frequencies below 1500 Hz. At these low frequencies, sound waves wrap around the head, minimizing intensity differences. At frequencies above 3000 Hz, intensity differences are significant enough to act as a cue to determine the source sound’s position. The difference in phase or arrival time of sound waves reaching each ear is designated as the interaural time difference (ITD). Unlike the IID, this cue is effective for low-frequency signals. Because identical IID and ITD cues can be generated for multiple points in space, individuals must rotate their heads to accurately localize a sound.
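As a rough illustration of the ITD cue, the sketch below estimates the arrival-time difference for a distant source using Woodworth's spherical-head approximation. This formula is not part of the text above; it is a common simplification. The head radius, the speed of sound, and the function name are all assumed values chosen for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius in meters
SPEED_OF_SOUND = 343.0   # assumed speed of sound in air, m/s (about 20 degrees C)


def itd_seconds(azimuth_deg):
    """Woodworth spherical-head estimate of the interaural time difference.

    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    The approximation is intended for azimuths between 0 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


for azimuth in (0, 30, 60, 90):
    print(f"{azimuth:3d} deg -> {itd_seconds(azimuth) * 1e6:6.1f} microseconds")
```

Under these assumptions the largest ITD, for a source directly to one side, is well under one millisecond.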

The outer ear, or pinna, also plays a key role in localizing sound. When sound comes in contact with the pinna, its frequency characteristics are modified. These modifications vary depending on the position of the sound source, thus providing an important directional cue. This cue helps compensate for the limitations of the ITD and IID cues.

The pinna localization cues can be represented mathematically by the head related transfer function (HRTF). The HRTF represents the spectral, or frequency component, filtering applied to a sound as it travels from outside the head into the ear canal. 3D sound computer systems apply HRTF sets to digital audio files to generate the illusion of a sound originating from a point in 3D space.
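Applying an HRTF to a digital audio file can be sketched as filtering one mono signal with a pair of impulse responses, one per ear (the time-domain form of an HRTF is usually called a head related impulse response, or HRIR). The example below is a minimal sketch: the impulse responses are toy values invented for illustration, not measured data, and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left/right impulse response pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses for a source on the listener's left (NOT measured data):
# the right ear receives the sound two samples later and at a lower level.
HRIR_LEFT = [1.0, 0.3]
HRIR_RIGHT = [0.0, 0.0, 0.5, 0.2]

left, right = spatialize([1.0, 0.5, 0.25], HRIR_LEFT, HRIR_RIGHT)
print("left ear: ", left)
print("right ear:", right)
```

A real system would select a measured impulse response pair for each source direction and would typically perform the filtering with a fast convolution method rather than the nested loop shown here.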

The brain perceives distance as a function of the relative amount of reverberant or reflected sound to the amount of direct or unreflected sound received by the ear. This quantity is expressed as the reverberant/direct (R/D) ratio.
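To see why the R/D ratio can act as a distance cue, consider a simplified room model. The sketch below assumes that direct sound energy follows the inverse-square law while the reverberant energy is the same everywhere in the room. Both are idealizations, and the reverberant energy constant and function name are hypothetical.

```python
import math


def rd_ratio_db(distance_m, reverb_energy=0.05):
    """Reverberant/direct energy ratio in dB at a given source distance.

    Assumes the direct energy is 1.0 (arbitrary units) at 1 m and falls off
    as 1/distance^2, and that the reverberant energy is constant throughout
    the room. reverb_energy is a made-up value in the same arbitrary units.
    """
    direct_energy = 1.0 / distance_m ** 2
    return 10.0 * math.log10(reverb_energy / direct_energy)


for distance in (1, 2, 4, 8):
    print(f"{distance} m -> R/D = {rd_ratio_db(distance):6.2f} dB")
```

In this model the ratio rises by about 6 dB each time the distance doubles, so a larger share of reverberant sound corresponds to a more distant source.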

Another important aspect of human audio perception is exhibited when a sound is being played simultaneously from two speakers. The sound is perceived as coming from a location in between the two speakers. However, if the sound from the right speaker is delayed by 70 milliseconds or less, the sound will be perceived as emanating from the left speaker. This psychoacoustic phenomenon is referred to as the Haas or precedence effect.
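A test signal for experimenting with this effect can be built by sending the same sound to both channels and delaying one of them. The sketch below is a minimal example; the function name, the 44.1 kHz default sampling rate, and the 10 millisecond delay are assumptions chosen for illustration.

```python
def delay_right_channel(mono, delay_ms, sample_rate=44100):
    """Return (left, right) sample lists with the right channel delayed."""
    delay_samples = round(delay_ms * sample_rate / 1000)
    left = list(mono) + [0.0] * delay_samples    # pad so both channels match in length
    right = [0.0] * delay_samples + list(mono)   # silence first, then the sound
    return left, right


left, right = delay_right_channel([1.0, 0.5, 0.25], delay_ms=10)
print(len(left), len(right))
```

Playing the two lists as the left and right channels of a stereo file lets a listener compare different delay values by ear.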

Audio Playback

Stereo, or two-channel, systems play back sound through speakers positioned to the left and right of the listener. Moving sound effects can be approximated by panning a sound from the left speaker to the right speaker or vice versa.
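Panning can be sketched as scaling one mono signal by a pair of gains that change over time. The example below uses a constant-power pan law, which is one common choice and is an assumption here rather than something prescribed by this primer. The function names are hypothetical.

```python
import math


def pan_gains(pan):
    """Constant-power gains for pan in [-1.0 (hard left), +1.0 (hard right)]."""
    angle = (pan + 1.0) * math.pi / 4.0   # maps pan to the range 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def pan_sweep(mono):
    """Pan a mono signal from hard left to hard right over its duration."""
    last = max(len(mono) - 1, 1)
    stereo = []
    for i, sample in enumerate(mono):
        left_gain, right_gain = pan_gains(-1.0 + 2.0 * i / last)
        stereo.append((sample * left_gain, sample * right_gain))
    return stereo


for left, right in pan_sweep([1.0, 1.0, 1.0]):
    print(f"left={left:.3f} right={right:.3f}")
```

With this law the sum of the squared gains stays at 1, so the sound does not appear to get quieter as it passes through the center.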

Surround sound systems extend spatialization capabilities by adding speakers. In addition to the stereo left and right speaker channels, center and rear channels are added.

Neither stereo nor surround sound systems can accurately present spatialized sound sources. Over loudspeakers, it is difficult to transmit HRTF filtered sound independently to each ear. Consequently, 3D sound is optimally reproduced when an individual is wearing headphones.

Digital Audio Files

A sound wave is represented digitally as a series of discrete samples taken at regular intervals over time. These samples represent the changing voltage levels of an analog sound source. The accuracy of each sample is determined by the bit resolution. For example, 8-bit resolution allows the representation of 2^8 or 256 voltage levels, whereas 16-bit resolution can represent 2^16 or 65,536 different voltage levels. Clearly, 16-bit resolution facilitates a more accurate reproduction of the original sound source.

The number of samples taken per second, or sampling rate, determines how precisely the digital sound file reproduces the frequency components of the sound source. The Nyquist sampling theorem states that the sampling rate must be at least twice the highest frequency to be captured. If this condition is not met, the sound cannot be accurately reproduced. Therefore, because the human range of hearing is generally considered to extend from 20 Hz to 20 kHz, the 44.1 kHz sampling rate used for compact discs is capable of accurately reproducing any sound the human ear can detect.

The use of sampling rates and resolutions lower than 44.1 kHz 16-bit can result in a loss of quality. However, file sizes are significantly reduced. One minute of stereo sound recorded at 44.1 kHz 16-bit requires approximately 10 megabytes of disk space. One minute of mono 22.05 kHz 8-bit sound requires about 1.3 megabytes, an eight-fold reduction in storage requirements. Nevertheless, the loss in quality can often outweigh the benefits of disk space conservation.
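The arithmetic behind bit resolution, sampling rate, and storage can be checked with a few lines of code. The sketch below assumes uncompressed audio and a duration of one minute; the function names are hypothetical.

```python
def quantization_levels(bits):
    """Number of distinct voltage levels at a given bit resolution."""
    return 2 ** bits


def highest_reproducible_frequency(sample_rate_hz):
    """Nyquist limit: half the sampling rate."""
    return sample_rate_hz / 2


def bytes_per_minute(sample_rate_hz, bits, channels):
    """Storage for one minute of uncompressed audio."""
    return sample_rate_hz * (bits // 8) * channels * 60


print(quantization_levels(8), quantization_levels(16))
print(highest_reproducible_frequency(44100), "Hz")

cd_quality = bytes_per_minute(44100, 16, channels=2)
reduced = bytes_per_minute(22050, 8, channels=1)
print(f"stereo 44.1 kHz 16-bit: {cd_quality / 1e6:.2f} MB per minute")
print(f"mono 22.05 kHz 8-bit:   {reduced / 1e6:.2f} MB per minute")
print(f"size ratio: {cd_quality / reduced:.0f}x")
```

Halving the sampling rate, the bit resolution, or the channel count each halves the file size, so the three reductions multiply.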

Why Use Audio?

Designers of virtual environments seek to create a feeling of presence, or immersion, for users. Primarily, this goal has been pursued by creating worlds with convincing 3D interactive graphics. This sense of presence is achieved chiefly by engaging the human sense of sight. Unlike sight, the sense of hearing is often neglected in the implementation of a virtual world. Despite considerable evidence of its immersive potential, audio is often banished as the poor stepchild of virtual reality. This trend is in part due to the technical resource limitations of earlier computer systems, which forced designers to sacrifice audio quality for graphics performance. However, these restrictions no longer exist. It is now possible to implement high fidelity, immersive audio in graphically intensive virtual environments.

For several decades, filmmakers have used sound as a key element of cinematic storytelling. Sound can be used in film and virtual environments to compensate for the lack of other sensory modalities. For example, in a car chase scene, exaggerating the sound of the engine and the squealing tires lets the audience experience the intensity of being inside the car itself. Perceptually, we can almost feel the bumps in the road and the impact of a crash. By raising the low frequency or bass components of the audio, the sound designer can make the audience feel tactile sensations in a theater equipped with subwoofer speakers.

Sound Design Principles

For virtual environment sound designers, a great deal can be learned by studying techniques used in the film industry. However, there is a fundamental difference between sound in film and sound in virtual worlds. Film sound is linear: it does not change from one viewing to the next. In a virtual world, characteristics of the environment can vary in real time. A dynamic soundscape must be created that can respond appropriately to any change caused by the user or the environment itself.

When planning the soundscape for a virtual environment, it is necessary to categorize the sounds as either foley or ambient effects. Foley effects include both predictable and user triggered sounds such as doors opening and footsteps. Ambient sounds are looped background sounds that are used to create a sense of atmosphere. The sounds of birds chirping and wind blowing are examples of ambience that could be used in a forest environment. Effects tied to moving objects should be spatialized, depending on the capabilities of the playback platform.
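One way to record this categorization during planning is a small table of sound cues. The sketch below is only an illustration: the class names, field names, and cue names are hypothetical and do not correspond to any particular audio engine.

```python
from dataclasses import dataclass
from enum import Enum


class SoundKind(Enum):
    FOLEY = "foley"      # triggered by the user or a scripted event
    AMBIENT = "ambient"  # looped background atmosphere


@dataclass
class SoundCue:
    name: str
    kind: SoundKind
    on_moving_object: bool = False

    @property
    def looped(self) -> bool:
        """Ambient cues loop; foley cues play once per trigger."""
        return self.kind is SoundKind.AMBIENT

    def should_spatialize(self, platform_supports_3d: bool) -> bool:
        """Spatialize only moving-object cues, and only if the platform can."""
        return self.on_moving_object and platform_supports_3d


SOUNDSCAPE = [
    SoundCue("birds_chirping", SoundKind.AMBIENT),
    SoundCue("wind_blowing", SoundKind.AMBIENT),
    SoundCue("door_opening", SoundKind.FOLEY),
    SoundCue("footsteps", SoundKind.FOLEY, on_moving_object=True),
]


def looped_cue_names(cues):
    """Names of the cues that should be started once and left looping."""
    return [cue.name for cue in cues if cue.looped]


print(looped_cue_names(SOUNDSCAPE))
```

Keeping the playback decision in `should_spatialize` lets the same cue list be reused on platforms with and without 3D audio support.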

Experienced sound designers for film and television are often asked by directors to use high quality sound content to compensate for low quality visuals. Therefore, do not downsample or reduce audio quality unless it is absolutely necessary to maintain the environment’s performance. If there is a choice between slightly reducing graphic quality or reducing audio fidelity, downgrade the graphics. Reductions in audio fidelity are more easily noticed by users than reductions in graphic quality. Furthermore, it is always advisable to begin with source material recorded and sampled at the highest fidelity levels possible.

Highly realistic sound effects are often inadequate. For example, when creating the sound of an explosion, film sound designers will exaggerate the intensity and duration of the actual sound to increase its emotional impact upon the audience. Though sound cannot be heard in space, few would argue that the space battle scenes in Star Wars would be effective without sound effects. In other words, sound effects must exaggerate reality to create an immersive experience.

It is important to remember that the ear is more sensitive to higher frequencies (roughly 1000 Hz to 5000 Hz) and less sensitive to low frequencies. At a constant global volume level, a high frequency sound effect will be perceived as louder than an equally important low frequency effect. Consequently, for bass sounds to be apparent, their volume must be maximized.
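The frequency dependence of hearing sensitivity can be approximated with the standard A-weighting curve used in sound level meters. This curve is not mentioned in the text above and is only a rough model of perceived loudness at moderate listening levels; the function name is hypothetical.

```python
import math


def a_weighting_db(freq_hz):
    """A-weighting gain in dB (approximately 0 dB at 1000 Hz)."""
    f2 = freq_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00


for freq in (50, 100, 1000, 3000):
    print(f"{freq:5d} Hz -> {a_weighting_db(freq):6.1f} dB")
```

The strongly negative values at 50 Hz and 100 Hz give a sense of how much more level a bass effect needs to sound as loud as an effect near 1000 Hz.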


Virtual environments are created to immerse users so that they may experience a suspension of disbelief. Though it is not currently feasible, designers would ideally have the ability to provide realistic inputs to the senses of sight, smell, touch, taste, and hearing. Regardless, by effectively exploiting the use of sound, 3D graphical worlds can be imbued with a degree of vitality and intensity unachievable with visuals alone.

Further Reading

Begault, Durand R. 3-D Sound for Virtual Reality and Multimedia. New York: AP Professional, 1994.

Cohen, M., and Wenzel, E. M. The Design of Multidimensional Sound Interfaces. In W. Barfield and T. Furness III, editors, Virtual Environments and Advanced Interface Design. New York: Oxford University Press, 1995.

Moore, Brian C.J. An Introduction to the Psychology of Hearing. New York: AP Academic Press, 1997.