This project was funded by EPSRC (EP/L019906/1). Conventional surround sound systems such as 5.1 or 7.1 are limited in that they can only produce a two-dimensional (2D) impression of auditory width and depth. Next-generation surround sound systems introduced in recent years tend to employ height-channel loudspeakers in order to give the listener the impression of a three-dimensional (3D) soundfield. Although new methods to position (pan) the sound image in the vertical plane have been investigated, there is currently a lack of research into methods to render the perceived vertical width of the image. Vertical width rendering is particularly important for creating the impression of fully immersive 3D ambient sound in applications such as the production of original 3D music and broadcasting content and the 3D upmixing of 2D content. This project aims to provide a fundamental understanding of the perception and control of vertically oriented image width for 3D multichannel audio. Three objectives have been formulated to achieve this aim: (i) to determine the frequency-dependent perceptual resolution of interchannel decorrelation for vertical image widening; (ii) to determine the effectiveness of ‘Perceptual Band Allocation (PBA)’, a novel method proposed for vertical image widening; (iii) to evaluate the above two methods in real-world 2D to 3D upmixing scenarios. These objectives will be achieved through relevant signal processing techniques and subjective listening tests focussing on perceived spatial and tonal qualities. Data obtained from the listening tests will be analysed using robust statistical methods in order to model the relationship between perceptual patterns and relevant parameters.
The results of this project will provide researchers and engineers with academic references for the development of new 3D audio rendering algorithms, and will ultimately enable the general public to experience a fully immersive surround sound in the home-cinema, car and mobile environments.

The key findings from this project are as follows.

  1. The perceptual mechanism of the so-called Pitch-Height effect for virtual auditory images has been revealed. Formal experimental data on the perceived vertical positions of octave-band filtered virtual images have been provided for different azimuth angles. It has been found that the nature of virtual source elevation localisation is significantly different from that of real source elevation localisation.
  2. It has been shown that the aforementioned vertical image position data can be successfully exploited for rendering different degrees of vertical image spread. This method has been tested for the 2D to 3D sound upmixing of ambient sound. The results showed that the method was subjectively preferred to other conventional methods.
  3. The association between the loudspeaker base angle and the perceived image elevation has been investigated in depth. It was generally shown that the perceived image is elevated from the front to above the listener as the loudspeaker base angle increases from 0 degrees to 180 degrees. It was newly found that the effect depends significantly on the spectral and temporal characteristics of the sound source. Sources with a broad and [...]. Specifically, frequency bands centred around 500 Hz and 8 kHz were found to have the strongest elevation effect. These findings have important implications for practical applications such as 3D sound rendering, upmixing and downmixing.
  4. A novel theory that explains the reason for the virtual image elevation effect has been established. Whilst the conventional theory, based on the psychophysics of pinna spectral distortion, is limited to explaining the effect at high frequencies, the proposed theory, which is based on the brain’s cognitive interpretation of ear-input signals, is able to explain the effect at low frequencies as well.
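
As a rough illustration of the band-allocation idea behind findings 1 and 2, the sketch below routes octave bands to either the main or the height loudspeaker layer according to a per-band elevation score. It is a toy example under stated assumptions, not the project's actual algorithm: the function names, the threshold rule and the score values are all hypothetical placeholders and do not reproduce the measured listening-test data. Only the octave-band edge formula (centre frequency divided and multiplied by the square root of 2) is standard.

```python
import math


def octave_band_edges(centre_hz):
    """Return the (lower, upper) edges of an octave band: fc/sqrt(2), fc*sqrt(2)."""
    return centre_hz / math.sqrt(2), centre_hz * math.sqrt(2)


def allocate_bands(perceived_height, threshold):
    """Route each octave band to the 'main' or 'height' loudspeaker layer.

    perceived_height maps a band centre frequency (Hz) to an elevation
    score. The scores used below are HYPOTHETICAL placeholders, not the
    project's measured data. Bands scoring above the threshold are sent
    to the height layer; all others stay in the main layer.
    """
    routing = {"main": [], "height": []}
    for centre in sorted(perceived_height):
        layer = "height" if perceived_height[centre] > threshold else "main"
        routing[layer].append(centre)
    return routing


# Placeholder scores for illustration only (assumed, not measured).
example_scores = {63: 0.2, 125: 0.3, 250: 0.4, 500: 0.7,
                  1000: 0.5, 2000: 0.4, 4000: 0.6, 8000: 0.9}

routing = allocate_bands(example_scores, threshold=0.5)
low_edge, high_edge = octave_band_edges(1000)
```

In this toy version, raising or lowering the threshold changes how many bands are sent to the height layer, which is one simple way to think about rendering different degrees of vertical spread. A real implementation would also need the band-splitting filters themselves, which are omitted here.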

EPSRC-funded project: Sep 2014 – Aug 2016 (EP/L019906/1) 

Researchers: Dr Hyunkook Lee, Dr Christopher Gribben, Dr Rory Wallis

Supervisor: Dr Hyunkook Lee

