Journal of Engineering, Project, and Production Management, 2026, 16(3), 2026-341

Deep Learning-Based Multichannel Audio Spatial Reconstruction and Binaural Rendering for Immersive Recording Applications

Jian Wang

Instructor, School of Art and Design, Yantai Institute of Science and Technology, Yantai, 265600, China, E-mail: wangjianwj11@outlook.com

Project Management

Received March 18, 2026; revised April 23, 2026; accepted May 9, 2026

Available online May 29, 2026

Abstract: Immersive audio technology has become a core element of Virtual Reality (VR), Augmented Reality (AR), and metaverse applications, directly affecting users' sense of presence and interaction quality. This study presents a deep learning-driven framework for multichannel audio spatial reconstruction and binaural rendering that converts signals captured by spherical microphone arrays into personalized binaural audio. The system employs an integrated neural network that combines a U-Net encoder-decoder, a multi-head self-attention mechanism, and cross-channel feature fusion. Testing under a 32-channel configuration shows a signal-to-noise ratio of 20.3 dB, a log spectral distortion of 1.8 dB, and a spatial localization error within 6.2 degrees. For personalization, the system uses a Variational Autoencoder (VAE) to generate customized Head-Related Transfer Functions (HRTFs) within minutes from facial photographs alone, reducing localization error by 50% and decreasing the rate of in-head localization from 33% to 8%. With TensorRT optimization, the system achieves an end-to-end latency of 15.2 ms on an NVIDIA RTX 3090 GPU, meeting the latency requirements of VR and AR environments. Subjective evaluation gives the system an overall score of 8.7 out of 10, approaching the quality benchmark set by authentic binaural recordings. Field testing across scenarios including virtual reality gaming, remote medical consultation, symphony recording, and film post-production validates the system's practical feasibility and offers a viable path toward advancing immersive media technology.

Keywords: Deep learning, spatial audio reconstruction, binaural rendering, head-related transfer function (HRTF).

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Requests for reprints and permissions at eppm.journal@gmail.com.

Citation: Wang, J. (2026). Deep Learning-Based Multichannel Audio Spatial Reconstruction and Binaural Rendering for Immersive Recording Applications. Journal of Engineering, Project, and Production Management, 16(3), 2026-341.

DOI: 10.32738/JEPPM-2026-341
