Interacting with a scene in the real world produces multifaceted sensory impressions. In addition to visual and haptic stimuli, acoustic stimulation is essential for the natural perception of simulated worlds: it provides information about the environment, about external events and, in particular, feedback on one's own actions. To achieve this, all human sensory systems have to be stimulated in a natural way. For the acoustic part of the scene, several aspects must be considered. If an object collides with another one, a loudspeaker membrane vibrates, or air flows through a small jet, sound is radiated and receives its characteristics from the generating system. The sound then propagates through the air from the source to the ear of the listener, possibly reflected by the walls. All acoustic properties, such as level, sound coloration, and the different angles of incidence and time alignments of the direct sound and its reflections, are evaluated by the listener's auditory system in the specific scene. From these cues in the structure of the impulse response, one can draw conclusions about the environment's size and shape. The complete system can be regarded as a serial sequence of three subsystems: sound generation, propagation to the listener and, finally, auditory perception, which is investigated in the field of psychoacoustics. In particular, psychoacoustics defines the quality standards for the system's individual components. Accordingly, a system for creating a virtual reality must provide a correct reproduction of the simulated acoustic signal at the ears of the listener.
Model of an Acoustic Virtual Reality
Because humans hear with two ears, a direction can be assigned to sound events. As with visual stimuli, where the brain compares the images from both eyes to determine the placement of objects in a scene and from this information creates the cognitive representation that humans perceive as a three-dimensional image, the stimuli present at the two eardrums are compared by the brain to determine the nature and the direction of a sound event. Depending on the horizontal angle of incidence, different time delays and levels arise between the two ears. In addition, the frequency characteristics depend on the angle of incidence, shaped by the interference between the direct signal and the reflections from the head, shoulders, pinna and other parts of the body. The interaction of these three factors enables humans to assign a direction to acoustic events. Binaural technology exploits exactly these characteristics of the sound pressure at the eardrum. If the transfer functions to the ears are measured, either individually with small in-ear microphones or with an artificial head, a source can be placed virtually at any position around the listener using a binaural synthesis algorithm on a computer. The positioning of a source is then not limited to the space between the loudspeakers, as it is with the conventional panning method; the simulation of sources behind the listener is also possible.
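The binaural synthesis described above amounts to convolving a mono source signal with the pair of head-related impulse responses (HRIRs) for the desired direction. The following sketch illustrates the principle; the HRIRs here are toy placeholders (a delayed, attenuated impulse for the far ear mimicking interaural time and level differences), not measured transfer functions.

```python
import numpy as np

def binaural_synthesis(mono, hrir_left, hrir_right):
    """Render a mono source as a two-channel binaural signal by
    convolving it with the HRIRs for the direction of incidence."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

fs = 44100
# Toy HRIRs for a source on the listener's left: the right-ear
# response is delayed (~0.6 ms ITD) and attenuated (ILD).
itd_samples = int(0.0006 * fs)
hrir_l = np.zeros(64); hrir_l[0] = 1.0
hrir_r = np.zeros(64); hrir_r[itd_samples] = 0.5

source = np.random.randn(fs)              # one second of noise
out = binaural_synthesis(source, hrir_l, hrir_r)
print(out.shape)                          # prints (2, 44163)
```

In a real system the HRIR pair would be selected from a measured database (individual or artificial-head) according to the source direction relative to the listener.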
For the rendering of a three-dimensional sound field, we prefer binaural signals, which specify the effective sound pressure at the eardrums of the listener, to a reproduction of the sound field in space (Dolby Surround, Ambisonics, Wave Field Synthesis). In contrast to other loudspeaker-based reproduction systems, e.g. those based on intensity panning, the binaural approach has the advantage of reproducing even sources close to the head realistically. For such sources, the interaural level and time differences are larger than for a source in the same direction at a greater distance. Particularly when the user triggers a sound event while interacting with his hands in a virtual scene, the distance between source and head is generally smaller than one meter; a remote multi-channel loudspeaker system cannot simulate this effect. Instead of headphones, loudspeakers can also be used to render binaural signals, provided the crosstalk between each loudspeaker and the ear turned away from it is suppressed. The starting point of our development is the well-known cross-talk cancellation introduced by Atal and Schroeder. By connecting a head tracker, a static cross-talk cancellation can be expanded into a dynamic, adaptive system for a moving listener.
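The static cross-talk cancellation can be sketched as the inversion of the 2x2 matrix of loudspeaker-to-ear transfer functions. The following frequency-domain sketch follows the Atal/Schroeder idea; the regularization constant `beta` and the filter interface are assumptions for illustration, not the parameters of the system described here.

```python
import numpy as np

def ctc_filters(h_ll, h_rl, h_lr, h_rr, n_fft=1024, beta=1e-3):
    """Compute cross-talk cancellation filters by inverting the 2x2
    matrix of loudspeaker-to-ear transfer functions per frequency bin.
    h_xy: impulse response from loudspeaker x to ear y (hypothetical
    measured responses).  beta regularizes the inversion at
    frequencies where the matrix is nearly singular."""
    H = np.array([[np.fft.rfft(h, n_fft) for h in (h_ll, h_rl)],
                  [np.fft.rfft(h, n_fft) for h in (h_lr, h_rr)]])
    det = H[0, 0] * H[1, 1] - H[0, 1] * H[1, 0]
    inv_det = np.conj(det) / (np.abs(det) ** 2 + beta)
    # Adjugate times regularized 1/det approximates the inverse matrix.
    C = np.empty_like(H)
    C[0, 0], C[0, 1] = H[1, 1] * inv_det, -H[0, 1] * inv_det
    C[1, 0], C[1, 1] = -H[1, 0] * inv_det, H[0, 0] * inv_det
    # Back to the time domain: one FIR filter per speaker/signal pair.
    return np.fft.irfft(C, n_fft)

# Sanity check with idealized responses: direct paths are unit
# impulses, cross paths are zero, so the CTC filters reduce to
# (slightly regularized) unit impulses.
delta = np.zeros(256); delta[0] = 1.0
zero = np.zeros(256)
f = ctc_filters(delta, zero, zero, delta)
print(f.shape)                            # prints (2, 2, 1024)
```

Filtering the two binaural channels with this 2x2 filter matrix before feeding the loudspeakers suppresses the signal path from each loudspeaker to the contralateral ear.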
In his investigations, Gardner demonstrated the general feasibility of adapting the "sweet spot" – the point in space for which the cross-talk cancellation has been calibrated – to the listener's position. A quality comparable to headphone reproduction can only be achieved if the system reacts to the user's movements fast enough, i.e. before the user leaves a certain tolerance range around the sweet spot. In the context of our ongoing research activities, hearing tests have been performed to determine the dimensions of this tolerance range. Over the past years, intensive research at the Institute of Technical Acoustics has been devoted to developing suitable algorithms and methods for a practical dynamic cross-talk cancellation system. The result is a purely software-based implementation of a dynamic cross-talk cancellation that can deliver any binaural signal to the ears of a user acting in the virtual scene. The user's movement is limited only by the coverage of the head-tracking system; within it, unrestricted movement in every orientation is possible.
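The adaptation logic can be sketched as follows: the current filter set stays valid inside a tolerance range around the calibrated sweet spot, and a filter update is triggered as soon as the tracked head pose leaves that range. The tolerance values below are illustrative placeholders, not the results of the hearing tests mentioned above.

```python
import numpy as np

TRANSLATION_TOL_M = 0.02    # assumed positional tolerance (placeholder)
ROTATION_TOL_DEG = 2.0      # assumed rotational tolerance (placeholder)

class SweetSpotTracker:
    """Decides, from head-tracker data, when the CTC filter set for
    the current sweet spot must be replaced (hypothetical interface)."""

    def __init__(self, position, azimuth_deg):
        self.position = np.asarray(position, float)
        self.azimuth = azimuth_deg

    def needs_update(self, position, azimuth_deg):
        moved = np.linalg.norm(np.asarray(position) - self.position)
        turned = abs(azimuth_deg - self.azimuth)
        return bool(moved > TRANSLATION_TOL_M or turned > ROTATION_TOL_DEG)

    def recalibrate(self, position, azimuth_deg):
        # Placeholder: the real system would swap in the CTC filter
        # set matching the new head pose here.
        self.position = np.asarray(position, float)
        self.azimuth = azimuth_deg

tracker = SweetSpotTracker([0.0, 0.0, 0.0], 0.0)
print(tracker.needs_update([0.005, 0.0, 0.0], 0.5))   # prints False
print(tracker.needs_update([0.05, 0.0, 0.0], 0.0))    # prints True
```

The critical requirement from the text is latency: the update must complete before the listener perceptibly leaves the tolerance range.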
Within the scope of the cooperation between the VRCA and the ITA, work has been done in the fields of sound generation modeling and correct spatial reproduction. So far, ViSTA, the VR software toolkit developed by the VRCA, models sound sources by playing audio files processed offline or by using a simple parametric sound synthesis; physical modeling is planned. A module for simulating the spatial sound impression is also available: it controls external DSP hardware that performs real-time binaural synthesis to generate the spatial audio data, and the signal is reproduced via headphones. In the future we plan to upgrade the two-screen projection system "Holo Bench" located at the Center for Computing and Communication with the developed dynamic cross-talk cancellation, so that, besides three-dimensional images, three-dimensional acoustic scenes can be reproduced as well. The current DSP-based binaural synthesis will be reimplemented purely in software on a PC architecture, so that no additional hardware is required. This system adds the specific spatial information to several audio signals and mixes them into one binaural signal, taking the relative positions between the virtual sources and the user into account in real time. Coupling the binaural synthesis, which generates the virtual sources, with the dynamic cross-talk cancellation for reproduction enables the creation of very realistic three-dimensional acoustic scenes with unrestricted mobility of the user. The control unit is the VRCA software ViSTA: it determines the sounds (audio files), the start and stop times, all spatial information and several other parameters that influence the sound. Tuning and enhancing all the subsystems involved will be the next step in the cooperation between the ITA and the VRCA.
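The mixing stage described above, several spatialized sources combined into one binaural signal, can be sketched as follows. Each source is convolved with the HRIR pair for its direction and scaled by a simple 1/r distance gain before summation; the function signature and the distance model are illustrative assumptions, not the actual ViSTA module interface.

```python
import numpy as np

def mix_binaural(sources, hrirs, distances):
    """Mix several mono sources into one binaural signal.  For each
    source the HRIR pair matching its direction relative to the
    listener is applied, plus a 1/r distance attenuation.  A real-time
    system would update HRIRs and gains every frame from the head
    tracker (hypothetical interface)."""
    n = max(len(s) for s in sources) + max(h.shape[1] for h in hrirs) - 1
    out = np.zeros((2, n))
    for s, h, dist in zip(sources, hrirs, distances):
        gain = 1.0 / max(dist, 0.1)   # clamp to avoid blow-up at the head
        left = np.convolve(s, h[0]) * gain
        right = np.convolve(s, h[1]) * gain
        out[0, :len(left)] += left
        out[1, :len(right)] += right
    return out

fs = 44100
src_a = np.random.randn(fs // 10)     # two short noise bursts as sources
src_b = np.random.randn(fs // 10)
# Toy HRIR pairs, shape (2 ears, 32 taps): one source left, one right.
hrir_a = np.zeros((2, 32)); hrir_a[0, 0] = 1.0; hrir_a[1, 8] = 0.6
hrir_b = np.zeros((2, 32)); hrir_b[0, 8] = 0.6; hrir_b[1, 0] = 1.0
mix = mix_binaural([src_a, src_b], [hrir_a, hrir_b], [0.5, 2.0])
print(mix.shape)                      # prints (2, 4441)
```

Feeding this mixed binaural signal into the dynamic cross-talk cancellation yields the loudspeaker signals for the tracked listener.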
Since 2003, a special lecture on "Acoustic Virtual Reality" covering this field has been held at the Institute of Technical Acoustics every summer term. Details at: