Capturing and rendering of audio for virtual reality

This project investigates recording and reproduction methods for 360° audio in virtual reality applications. Currently, the most popular method for capturing 360° audio for VR is arguably first-order Ambisonics (FOA). FOA microphone systems are typically compact, and thus convenient for location recording, and they offer stable localisation and flexible sound field rotation. However, FOA has limitations in perceived spaciousness and in the size of the sweet spot in loudspeaker reproduction, owing to its high level of interchannel correlation. By contrast, a near-coincident microphone array, which uses directional microphones that are spaced apart and angled outwards, can provide a better balance between spaciousness and localisability than a purely coincident array.

The project investigates the localisation accuracy and spatial attributes of the Equal Segment Microphone Array (ESMA) for music and urban soundscape VR applications, in both visual and non-visual conditions. Different recording and reproduction techniques are also perceptually evaluated in terms of their low-level spatial attributes, and the optimal use scenarios for each technique are determined according to sound source, acoustic conditions and environmental context. The key findings so far are summarised below.

The correct spacing between microphones for a quadraphonic ESMA using cardioid microphones is 50 cm. This spacing was derived from a novel interchannel level and time trade-off model called MARRS (Lee et al. 2017) and verified through subjective listening tests (Lee 2019). The model predicts a smaller spacing for microphones with higher directivity (e.g. 37 cm for supercardioid and 25 cm for hypercardioid).
ESMA produces better results than FOA in terms of environmental width, listener envelopment and overall spatial quality, whereas FOA tends to provide slightly greater environmental depth and source distance (Millns and Lee 2018). However, this difference depends on the positional arrangement of the sound sources rather than on the type of sound source.
ESMA-3D, which adds a vertical extension to ESMA, outperforms FOA in sports event recording and reproduction in both 9.1 and 5.1 formats in terms of presence, robustness, envelopment and overall quality of experience (Moulson and Lee 2019). This holds regardless of whether a visual scene is present.
The timbral and spatial degradation of ESMA-3D recordings in Ambisonic binaural rendering falls in the “Excellent” to “Good” categories for complex musical ensemble recordings when the “magnitude least squares” decoding method (IEM binaural renderer) is used (Lee et al. 2019). Conventional cube virtual loudspeaker decoding produces quality in the “Good” to “Fair” range.
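
As an illustration of the spacing finding above, the following minimal Python sketch lays out a quadraphonic ESMA from the quoted spacings. It rests on assumptions that are not stated on this page: that the four microphones sit at the corners of a square, that the quoted spacing is the distance between adjacent microphones, and that each microphone faces outwards along the diagonal. The function name `quad_esma_layout`, the dictionary `ESMA_SPACING_CM` and the coordinate convention are hypothetical and chosen only for this example; it does not reproduce the MARRS model itself.

```python
import math

# Spacings quoted in the findings above (cm), keyed by polar pattern.
ESMA_SPACING_CM = {
    "cardioid": 50.0,
    "supercardioid": 37.0,
    "hypercardioid": 25.0,
}


def quad_esma_layout(pattern="cardioid"):
    """Return position and facing direction for each of the four microphones.

    Assumption (not from the source): microphones sit at the corners of a
    square whose side equals the quoted spacing, each facing outwards
    along the diagonal, at azimuths 45, 135, 225 and 315 degrees.
    """
    spacing = ESMA_SPACING_CM[pattern]
    # Centre-to-corner distance of a square with side length `spacing`.
    radius = spacing / math.sqrt(2)
    layout = []
    for k in range(4):
        azimuth_deg = 45.0 + 90.0 * k
        a = math.radians(azimuth_deg)
        layout.append({
            "azimuth_deg": azimuth_deg,      # facing direction of the capsule
            "x_cm": radius * math.cos(a),    # position relative to array centre
            "y_cm": radius * math.sin(a),
        })
    return layout


if __name__ == "__main__":
    for mic in quad_esma_layout("cardioid"):
        print(mic)
```

Under these assumptions, the cardioid case places the microphones 25 cm either side of the array centre on each axis, so that adjacent microphones are 50 cm apart.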

Researchers: Dr Hyunkook Lee, Connor Millns

Supervisor: Dr Hyunkook Lee

Publications:

Lee, H., Frank, M., & Zotter, F. (2019). Spatial and Timbral Fidelities of Binaural Ambisonics Decoders for Main Microphone Array Recordings. In AES International Conference on Immersive and Interactive Audio, York.
Lee, H. (2019). Capturing 360° Audio Using an Equal Segment Microphone Array (ESMA). Journal of the Audio Engineering Society, 67(1/2), 13-26. https://doi.org/10.17743/jaes.2018.0068
Millns, C., & Lee, H. (2018). An Investigation into Spatial Attributes of 360° Microphone Techniques for Virtual Reality. In Proceedings of the 144th AES International Convention [Convention Paper 10005]. Audio Engineering Society.
Lee, H., Johnson, D., & Mironovs, M. (2017). An Interactive and Intelligent Tool for Microphone Array Design. In Audio Engineering Society Convention 143. Audio Engineering Society.
