This project was funded by EPSRC (EP/L019906/1).
Conventional surround sound systems such as 5.1 or 7.1 are limited in that they can only produce a two-dimensional (2D) impression of auditory width and depth. Next-generation surround sound systems introduced in recent years tend to employ height-channel loudspeakers to give the listener the impression of a three-dimensional (3D) soundfield. Although new methods to position (pan) the sound image in the vertical plane have been investigated, there is currently a lack of research into methods to render the perceived vertical width of the image. Vertical width rendering is particularly important for creating the impression of fully immersive 3D ambient sound in applications such as the production of original 3D music and broadcasting content and the 3D upmixing of 2D content.
This project aims to provide a fundamental understanding of the perception and control of vertically oriented image width for 3D multichannel audio. Three objectives have been formulated to achieve this aim: (i) to determine the frequency-dependent perceptual resolution of interchannel decorrelation for vertical image widening; (ii) to determine the effectiveness of ‘Perceptual Band Allocation (PBA)’, a novel method proposed for vertical image widening; (iii) to evaluate the above two methods in real-world 2D-to-3D upmixing scenarios. These objectives will be achieved through relevant signal processing techniques and subjective listening tests focussing on perceived spatial and tonal qualities. Data obtained from the listening tests will be analysed using robust statistical methods in order to model the relationship between perceptual patterns and relevant parameters.
The results of this project will provide researchers and engineers with academic references for the development of new 3D audio rendering algorithms, and will ultimately enable the general public to experience fully immersive surround sound in home-cinema, car and mobile environments.
The key findings from this project are as follows.
- The perceptual mechanism of the so-called Pitch-Height effect for virtual auditory images has been revealed. Formal experimental data on the perceived vertical positions of octave-band filtered virtual images have been provided for different azimuth angles. It has been found that the nature of virtual source elevation localisation is significantly different from that of real source elevation localisation.
- It has been shown that the aforementioned vertical image position data can be successfully exploited for rendering different degrees of vertical image spread. This method has been tested for the 2D to 3D sound upmixing of ambient sound. The results showed that the method was subjectively preferred to other conventional methods.
- The association between the loudspeaker base angle and the perceived image elevation has been investigated in depth. It was generally shown that the perceived image is elevated from the front of the listener to above the listener as the loudspeaker base angle increases from 0 degrees to 180 degrees. It was newly found that the effect depends significantly on the spectral and temporal characteristics of the sound source. Specifically, frequency bands centred around 500 Hz and 8 kHz were found to have the strongest elevation effect. These findings have important implications for practical applications such as 3D sound rendering, upmixing and downmixing.
- A novel theory that ultimately explains the reason for the virtual image elevation effect has been established. Whilst the conventional theory, based on the psychophysics of pinna spectral distortion, is limited to explaining the effect at high frequencies, the proposed theory, based on the brain’s cognitive interpretation of ear-input signals, is also able to explain the effect at low frequencies.
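As a rough illustration of the band-allocation idea behind the second finding, the sketch below splits a signal into octave bands and routes each band to either a lower or an upper loudspeaker layer. It is only a minimal sketch under stated assumptions: the function name `pba_split`, the `EXAMPLE_ROUTING` table and the FFT brick-wall band split are all hypothetical choices made for brevity. The routing table is a placeholder, not the mapping derived from the project's listening tests, and a practical renderer would more likely use a proper filter bank.

```python
import numpy as np

# HYPOTHETICAL routing of octave-band centre frequencies (Hz) to layers.
# Placeholder values for illustration only; not the project's measured data.
EXAMPLE_ROUTING = {
    125: "lower", 250: "lower", 500: "upper", 1000: "lower",
    2000: "lower", 4000: "lower", 8000: "upper",
}


def pba_split(signal, sample_rate, routing):
    """Split a mono signal into (lower_layer, upper_layer) feeds.

    Each octave band is sent wholly to one layer according to `routing`.
    Band edges are the geometric means of adjacent centre frequencies; the
    lowest band extends down to 0 Hz and the highest up to Nyquist, so the
    two outputs always sum back to the input signal.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)

    centres = sorted(routing)
    inner_edges = [np.sqrt(a * b) for a, b in zip(centres[:-1], centres[1:])]
    edges = [0.0] + inner_edges + [np.inf]

    layers = {
        "lower": np.zeros_like(spectrum),
        "upper": np.zeros_like(spectrum),
    }
    for centre, low_edge, high_edge in zip(centres, edges[:-1], edges[1:]):
        # Select the FFT bins that fall inside this octave band.
        mask = (freqs >= low_edge) & (freqs < high_edge)
        layers[routing[centre]][mask] = spectrum[mask]

    lower = np.fft.irfft(layers["lower"], n=signal.size)
    upper = np.fft.irfft(layers["upper"], n=signal.size)
    return lower, upper


# Usage: split 0.1 s of noise at 48 kHz into the two layer feeds.
noise = np.random.default_rng(0).standard_normal(4800)
lower_feed, upper_feed = pba_split(noise, 48000, EXAMPLE_ROUTING)
```

Because every frequency bin is assigned to exactly one layer, the two feeds sum back to the original signal. That makes it easy to swap in a different routing table and compare the results by ear.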
EPSRC-funded project: Sep 2014 – Aug 2016 (EP/L019906/1)  
Researchers: Dr Hyunkook Lee, Dr Christopher Gribben, Dr Rory Wallis
Supervisor: Dr Hyunkook Lee
Publications:
- Lee, H. (2017). Sound Source and Loudspeaker Base Angle Dependency of Phantom Image Elevation Effect. AES: Journal of the Audio Engineering Society, 65(9), 733-748. https://doi.org/10.17743/jaes.2017.0028
- Lee, H., Johnson, D., & Mironovs, M. (2018). Virtual Hemispherical Amplitude Panning (VHAP): A Method for 3D Panning without Elevated Loudspeakers. In Proceedings of 144th AES International Convention [Convention Paper 9965]
- Lee, H. (2016). Perceptual band allocation (PBA) for the rendering of vertical image spread with a vertical 2D loudspeaker array. AES: Journal of the Audio Engineering Society, 64(12), 1003-1013. https://doi.org/10.17743/jaes.2016.0052
- Lee, H. (2016). Perceptually motivated 3D diffuse field upmixing. In Proceedings of the 2016 AES International Conference on Sound Field Control (Vol. 2016-July). Audio Engineering Society.
- Lee, H. (2015). 2D-to-3D ambience upmixing based on perceptual band allocation. AES: Journal of the Audio Engineering Society, 63(10), 811-821. https://doi.org/10.17743/jaes.2015.0075
- Gribben, C., & Lee, H. (2018). The Frequency and Loudspeaker-Azimuth Dependencies of Vertical Interchannel Decorrelation on the Vertical Spread of an Auditory Image. AES: Journal of the Audio Engineering Society, 66(7-8), 537-555. https://doi.org/10.17743/jaes.2018.0040
- Gribben, C., & Lee, H. (2017). A Comparison between Horizontal and Vertical Interchannel Decorrelation. Applied Sciences, 7(11), [1202]. https://doi.org/10.3390/app7111202
- Wallis, R., & Lee, H. (2017). The Reduction of Vertical Interchannel Crosstalk: The Analysis of Localisation Thresholds for Natural Sound Sources. Applied Sciences, 7(3), [278]. https://doi.org/10.3390/app7030278
- Wallis, R., & Lee, H. (2016). Vertical stereophonic localization in the presence of interchannel crosstalk: The analysis of frequency-dependent localization thresholds. AES: Journal of the Audio Engineering Society, 64(10), 762-770. https://doi.org/10.17743/jaes.2016.0039
- Wallis, R., & Lee, H. (2015). The effect of interchannel time difference on localization in vertical stereophony. AES: Journal of the Audio Engineering Society, 63(10), 767-776. https://doi.org/10.17743/jaes.2015.0069
- Lee, H., & Gribben, C. (2014). Effect of vertical microphone layer spacing for a 3D microphone array. AES: Journal of the Audio Engineering Society, 62(12), 870-884. https://doi.org/10.17743/jaes.2014.0045
