Acoustical Awareness


Intelligent Robotic Action

By Eric Martinson

Overview | Robotic Discovery of the Auditory Scene | Auditory Evidence Grids | Source Directivity Estimation | Noise Contour Maps | Information Kiosk




Audition on mobile robots has long been passed over in favor of vision, the argument being that if we could only decipher an image, then vision would provide all of the data necessary for highly successful navigation.  But proponents of audition have been reversing this trend in recent years by arguing that there is a wealth of information available to the robot outside the narrow confines of a camera’s field of view.  If nothing else, the omni-directionality of incoming acoustic information can be used to direct more data-rich directional sensors toward intriguing or suspicious locations.  Beyond that, researchers are also adding microphones to augment human-robot interfaces, enhance security, localize the robot, and support a variety of other applications.


The problems with microphones on robots, however, are significant.  Traditional fixed microphone mountings are already subject to environmental interference ranging from ambient noise to high- and low-frequency echoes and overlapping sound sources, and robots add their own set of problems to the list.  Many of the techniques developed to counter these problems, including filters, do work on mobile robots, but they are less successful when the platform: (1) moves around the environment, changing its proximity to different sources; (2) generates its own wheel and motor noise, which varies with the executed action; and (3) has limited computational and power resources, yet needs to process the data in real time.  These inherent problems of mobile robotics, combined with the general problems of using microphones, present daunting obstacles to developers of acoustic applications for these platforms.


Mobile robotics, though, has unique advantages of its own that have not been exploited in the traditional signal processing community.  The key advantage is that robots can move.  They are not limited to a single location, as traditional microphone mountings are, nor are they powerless over where they will be moved, as wearable computers are.  If there is a better location for performing their task, they can navigate to that location under their own power.  Furthermore, we are not limited to a single robot.  Robot teams add extra dimensions of control by allowing fully dynamic microphone arrays that are neither limited by a rigid internal structure nor stuck in randomly distributed locations.  The potential that mobility alone adds to acoustic applications is enormous, but we first need to figure out how best to exploit that potential.


In this work, it is our supposition that acoustical awareness is the key to successful development of mobile robotic applications involving sound.  Acoustical awareness is defined here as the coupling of action with knowledge about the acoustic environment, where said knowledge could be in the form of maps, rules, measurements, predictions, or anything that indicates how sound flows or will flow through the environment.  The underlying premise is that the more acoustical knowledge the robot uses, the better its global performance will be on an acoustic application.



Robotic Discovery of the Auditory Scene


The goal of this work is to build a robot that can autonomously explore the soundscape, and discover knowledge that will allow it to enhance other auditory applications.  This work currently consists of three parts: (1) discovering source locations with the auditory evidence grid; (2) building models of source directivity; and (3) estimating the volume of noise due to these sound sources found throughout the environment.


This work was performed in part at the Navy Center for Applied Research in Artificial Intelligence, in cooperation with Alan Schultz.


Auditory Evidence Grids

Microphone arrays mounted on mobile robots typically cannot localize sources in two dimensions.  The close spacing of the microphones limits them to bearing-only (angular) measurements.


By moving the robot and recording its position over time, we can combine multiple angular estimates to triangulate the positions of active sources in the environment.  Based on the evidence grid representation, auditory evidence grids can localize one or more sources in the environment using as few as two microphones.
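The triangulation idea can be pictured with a minimal 2-D sketch: each grid cell accumulates evidence that a source occupies it, and every bearing-only measurement from a known robot pose raises the evidence of cells near the measured bearing line.  This is a hypothetical simplification of the auditory evidence grid, with illustrative grid size, cell size, and evidence increments.

```python
import math

GRID = 20          # 20 x 20 cells
CELL = 0.5         # meters per cell
HIT, MISS = 0.9, -0.1   # log-odds-style increments (assumed tuning values)

def update(grid, rx, ry, bearing, half_width=math.radians(5)):
    """Raise evidence for cells whose direction from the robot pose
    (rx, ry) lies within half_width of the measured bearing; lower it
    slightly everywhere else."""
    for i in range(GRID):
        for j in range(GRID):
            cx, cy = (i + 0.5) * CELL, (j + 0.5) * CELL
            ang = math.atan2(cy - ry, cx - rx)
            # smallest signed angular difference, wrapped to [-pi, pi)
            diff = abs((ang - bearing + math.pi) % (2 * math.pi) - math.pi)
            grid[i][j] += HIT if diff < half_width else MISS

grid = [[0.0] * GRID for _ in range(GRID)]
# Two bearings taken from different robot positions cross near (5, 5),
# so the evidence peak lands where the cones intersect.
update(grid, 0.0, 0.0, math.atan2(5.0, 5.0))
update(grid, 10.0, 0.0, math.atan2(5.0, -5.0))
val, bi, bj = max((grid[i][j], i, j)
                  for i in range(GRID) for j in range(GRID))
print("peak cell center:", (bi + 0.5) * CELL, (bj + 0.5) * CELL)
```

With only two bearings the peak is a small intersection region rather than a point; additional measurements from other poses sharpen it further.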


Estimating Source Directivity

Once a source position has been identified, the robot can sample from a wide range of positions around the source, both to improve its localization of the source and to identify the directivity of the source with respect to angle.
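One way to estimate directivity from such samples is to bin the measured levels by angle around the known source position, after removing the distance-dependent spreading loss so that sectors sampled at different ranges stay comparable.  The function below is an illustrative sketch, not the authors' implementation; the bin count and the 1 m reference are assumptions.

```python
import math

def directivity(samples, src_x, src_y, bins=8):
    """samples: (x, y, level_db) measurements taken around the source.
    Returns the mean level per angular sector, referred back to 1 m by
    adding back the 20*log10(r) spherical-spreading loss; sectors with
    no samples come back as None."""
    sums, counts = [0.0] * bins, [0] * bins
    width = 2 * math.pi / bins
    for x, y, level in samples:
        r = math.hypot(x - src_x, y - src_y)
        ang = math.atan2(y - src_y, x - src_x) % (2 * math.pi)
        b = int(ang / width) % bins
        sums[b] += level + 20 * math.log10(r)
        counts[b] += 1
    return [sums[b] / counts[b] if counts[b] else None
            for b in range(bins)]

# e.g. a loudspeaker facing +x: louder in front than to the side or back
pattern = directivity(
    [(2, 0, 64.0), (0, 2, 54.0), (-2, 0, 52.0), (0, -2, 54.0)],
    src_x=0.0, src_y=0.0)
```

The returned pattern directly exposes the front/back level difference; empty sectors (None) tell the robot where it still needs to sample.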


Noise Contour Maps

The robot can now identify three pieces of information about the auditory scene: source location, volume, and directivity.  Using the idea of spherical spreading from the source, we can build noise contour maps, predicting how loud different parts of the environment should be to the robot.  These can then be used to guide a robot to quieter areas, improving its signal-to-noise ratio.
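Under spherical spreading, each source's level falls off by 20·log10(r) with distance, and the contributions of multiple sources combine as an incoherent power sum.  A minimal sketch of such a prediction, with illustrative source parameters and a 1 m reference distance:

```python
import math

def level_at(x, y, sources):
    """Predicted level (dB) at (x, y) from point sources, each given as
    (sx, sy, level_db_at_1m).  Levels fall off by 20*log10(r) and
    combine as an incoherent power sum."""
    power = 0.0
    for sx, sy, l1m in sources:
        r = max(math.hypot(x - sx, y - sy), 0.1)  # avoid log(0) at the source
        power += 10 ** ((l1m - 20 * math.log10(r)) / 10)
    return 10 * math.log10(power)

sources = [(0.0, 0.0, 70.0), (8.0, 0.0, 64.0)]
# pick the quieter of two candidate robot positions
candidates = [(2.0, 0.0), (4.0, 6.0)]
quietest = min(candidates, key=lambda p: level_at(p[0], p[1], sources))
print("quieter position:", quietest)
```

Evaluating `level_at` over a grid of positions yields the noise contour map; minimizing it over reachable positions is what steers the robot toward quieter areas.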




Eric Martinson and Alan Schultz, Robotic Discovery of the Auditory Scene, to be published in the Proceedings of the International Conference on Robotics and Automation (ICRA), Rome, Italy, April 10-14, 2007


Eric Martinson and Alan Schultz, Auditory Evidence Grids, Proceedings of the International Conference on Intelligent Robots and Systems (IROS), Beijing, China, Oct 9-15, 2006


Eric Martinson and Ronald C. Arkin, Noise Maps for Acoustically Sensitive Navigation, Proceedings of SPIE, vol. 5609, Oct. 2004



Information Kiosk


Effective communication with a mobile robot using speech is a difficult problem even when you can control the auditory scene.  Robot ego-noise, echoes, and human interference are all common sources of decreased intelligibility.  In real-world environments, however, these common problems are supplemented with many different types of background noise sources.  For instance, military scenarios might be punctuated by high-decibel aircraft noise and bursts from weaponry that mask parts of the speech output from the robot.  Even in non-military settings, fans, computers, alarms, and transportation noise can cause enough interference to render a traditional speech interface unintelligible.  In this work, we seek to overcome these problems by applying the robotic advantages of sensing and mobility to a text-to-speech interface.  Using perspective-taking skills to predict how the human user is being affected by new sound sources, a robot can adjust its speaking patterns and/or reposition itself within the environment to limit the negative impact on intelligibility, making a speech interface easier to use.


This work was tested entirely at the Navy Center for Applied Research in Artificial Intelligence (NRL) in cooperation with Derek Brock.


The B21r (located at the Navy Center for Applied Research in Artificial Intelligence) adapts to the surrounding auditory scene by:


·        Rotating to face a human user

·        Increasing or decreasing its volume

·        Pausing during periods of excessive noise

·        Moving to a less noisy location
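One way to picture how these behaviors might be selected is a simple threshold rule on the estimated noise level at the user.  The thresholds, names, and priority ordering below are illustrative assumptions, not the policy actually run on the B21r:

```python
def choose_adaptation(noise_db, speech_db, facing_user, quieter_spot_known):
    """Pick one adaptive behavior given the estimated noise level at the
    user (noise_db) and the robot's current speech output level."""
    if not facing_user:
        return "rotate"            # face the human user first
    if noise_db < speech_db - 15:
        return "speak"             # intelligible as-is, no change needed
    if noise_db < speech_db - 5:
        return "raise_volume"      # small deficit: just talk louder
    if quieter_spot_known:
        return "relocate"          # a less noisy location is known
    return "pause"                 # otherwise wait out the noise burst
```

The ordering encodes a rough cost ranking: rotating and raising volume are cheap and immediate, while relocating or pausing interrupt the interaction and are reserved for noise the robot cannot talk over.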







Eric Martinson and Derek Brock, "Improving Human-Robot Interaction through Adaptation to the Auditory Scene", Proceedings of the 2nd ACM/IEEE International Conference on Human-Robot Interaction (HRI), Washington, DC, Mar 9-11, 2007


Derek Brock and Eric Martinson, "Exploring the Utility of Giving Robots Auditory Perspective-Taking Abilities", Proceedings of the International Conference on Auditory Display, London, UK, June 20-23, 2006


Eric Martinson and Derek Brock, Auditory Perspective Taking, Proceedings of the 1st ACM/IEEE International Conference on Human-Robot Interaction (HRI), Salt Lake City, UT, Mar 2-4, 2006