This project investigates recording and reproduction methods for 360° audio in virtual reality (VR) applications. Currently, the most popular method for capturing 360° audio for VR is arguably first-order Ambisonics (FOA). FOA microphone systems are typically compact, and thus convenient for location recording, and they offer stable localisation and flexible sound field rotation. However, FOA has limitations in perceived spaciousness and in the size of the sweet spot for loudspeaker reproduction, owing to the high level of interchannel correlation. A near-coincident microphone array, on the other hand, which uses directional microphones that are spaced apart and angled outwards, can provide a better balance between spaciousness and localisability than a purely coincident array. This project examines the localisation accuracy and spatial attributes of the Equal Segment Microphone Array (ESMA) for music and urban soundscape VR applications, in both visual and non-visual conditions. Different recording and reproduction techniques are also evaluated perceptually in terms of their low-level spatial attributes, and the optimal use scenario for each technique is determined according to the sound source, the acoustic conditions and the environmental context. Below is a summary of some of the key findings so far.
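
The sound field rotation mentioned above is cheap in FOA because a rotation about the vertical axis only mixes the X and Y channels. The sketch below is illustrative only: it assumes SN3D-style first-order gains (W = 1 for a unit plane wave) and a counter-clockwise azimuth convention, and the `FOA`, `encode_plane_wave` and `rotate_yaw` names are hypothetical helpers, not part of any particular toolkit.

```python
import math
from dataclasses import dataclass


@dataclass
class FOA:
    """One sample of a first-order B-format signal (hypothetical container)."""
    w: float
    x: float
    y: float
    z: float


def encode_plane_wave(azimuth_deg: float, elevation_deg: float = 0.0,
                      gain: float = 1.0) -> FOA:
    """Encode a plane wave with SN3D-style first-order gains (W unscaled).

    Assumption: azimuth is measured counter-clockwise from the front.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return FOA(
        w=gain,
        x=gain * math.cos(az) * math.cos(el),
        y=gain * math.sin(az) * math.cos(el),
        z=gain * math.sin(el),
    )


def rotate_yaw(sig: FOA, yaw_deg: float) -> FOA:
    """Rotate the whole sound field about the vertical axis by yaw_deg.

    A source at azimuth a moves to a + yaw_deg. W (omnidirectional) and
    Z (vertical figure-of-eight) are unaffected by a yaw rotation.
    """
    psi = math.radians(yaw_deg)
    c, s = math.cos(psi), math.sin(psi)
    return FOA(
        w=sig.w,
        x=c * sig.x - s * sig.y,
        y=s * sig.x + c * sig.y,
        z=sig.z,
    )
```

For example, `rotate_yaw(encode_plane_wave(0.0), 90.0)` gives (to floating-point precision) the same channel values as `encode_plane_wave(90.0)`. In head-tracked playback, compensating a head yaw of h degrees would mean rotating the field by -h.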

  • The optimal spacing between microphones for a quadraphonic ESMA using cardioid microphones is 50 cm. This value was proposed on the basis of a novel interchannel level and time trade-off model called MARRS (Lee et al. 2017) and verified through subjective listening tests (Lee 2019). According to the model, the required spacing decreases as microphone directionality increases (e.g. 37 cm for supercardioid and 25 cm for hypercardioid microphones).
  • ESMA produces better results than FOA in terms of environmental width, listener envelopment and overall spatial quality, whereas FOA tends to provide slightly greater environmental depth and source distance (Millns and Lee 2018). However, these differences depend on the positional arrangement of the sound sources rather than on the type of sound source.
  • ESMA-3D, a vertical extension of ESMA, outperforms FOA in the recording and reproduction of sporting events in both 9.1 and 5.1 formats in terms of presence, robustness, envelopment and overall quality of experience (Moulson and Lee 2019). This holds regardless of whether the visual scene is presented.
  • The timbral and spatial degradation of ESMA-3D recordings in Ambisonic binaural rendering falls in the “Excellent” to “Good” categories for complex musical ensemble recordings when the “magnitude least-squares” decoding method (IEM binaural renderer) is used (Lee et al. 2019). Conventional decoding to a virtual cube loudspeaker layout yields quality in the “Good” to “Fair” range.
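
To make the spacing finding more concrete, the sketch below computes the physical interchannel time and level differences (ICTD, ICLD) that one adjacent pair of a quadraphonic ESMA produces for a far-field source. It is not the MARRS model, which additionally applies a psychoacoustic ICTD/ICLD trade-off to predict the perceived image position; it only derives the raw cues such a model would take as input. Several assumptions are made here and are not taken from the project: the 50 cm spacing is treated as the distance between adjacent capsules, the capsules face outwards at 45°, 135°, 225° and 315°, the polar patterns are ideal first-order patterns (a = 0.5 cardioid, about 0.37 supercardioid, 0.25 hypercardioid), the speed of sound is 343 m/s, and the helper names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value (air at roughly 20 °C)


def first_order_gain(a: float, angle_rad: float) -> float:
    """Ideal first-order polar pattern a + (1 - a) * cos(angle).

    a = 0.5 cardioid, about 0.37 supercardioid, 0.25 hypercardioid.
    """
    return a + (1.0 - a) * math.cos(angle_rad)


def quad_esma(spacing_m: float):
    """Hypothetical quad ESMA: four capsules on the corners of a square of
    side spacing_m, each facing outwards at 45, 135, 225 and 315 degrees.

    Returns a list of ((x, y) position in metres, on-axis azimuth in rad).
    """
    r = spacing_m / math.sqrt(2.0)  # centre-to-capsule distance
    mics = []
    for az_deg in (45.0, 135.0, 225.0, 315.0):
        az = math.radians(az_deg)
        mics.append(((r * math.cos(az), r * math.sin(az)), az))
    return mics


def pair_differences(mic_a, mic_b, source_az_deg, a=0.5, c=SPEED_OF_SOUND):
    """ICTD in seconds (positive when mic_a leads) and ICLD in dB (positive
    when mic_a is louder) for a far-field plane wave from source_az_deg.

    Assumes the source lies within the pair's 90-degree segment, so that
    neither capsule sits near a null of its polar pattern.
    """
    theta = math.radians(source_az_deg)
    u = (math.cos(theta), math.sin(theta))  # unit vector towards the source
    (pa, axis_a), (pb, axis_b) = mic_a, mic_b
    # A capsule displaced towards the source receives the wave earlier by p.u / c.
    ictd = ((pa[0] - pb[0]) * u[0] + (pa[1] - pb[1]) * u[1]) / c
    ga = abs(first_order_gain(a, theta - axis_a))
    gb = abs(first_order_gain(a, theta - axis_b))
    icld = 20.0 * math.log10(ga / gb)
    return ictd, icld


mics = quad_esma(0.5)
front_left, front_right = mics[0], mics[3]  # capsules facing 45 and 315 degrees
print(pair_differences(front_left, front_right, 45.0))
```

With cardioids at 50 cm, a source on the axis of the front-left capsule yields an ICLD of about 6.0 dB and an ICTD of about 1.03 ms, while a source at 0° yields zero for both. Changing `a` shows why more directional capsules need less spacing: with a = 0.25 the on-axis ICLD rises to about 12 dB, so a smaller time difference is needed for the image to reach the segment edge.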

Researchers: Dr Hyunkook Lee, Connor Millns

Supervisor: Dr Hyunkook Lee
