This research explores the benefits of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. The main challenges identified are sound source localization, separation, and dereverberation. The study demonstrates that conditioning the network on room impulse responses derived from 3D reconstructed rooms enables it to handle these tasks jointly. The method outperforms existing techniques and makes effective use of the 3D visual information, achieving near-perfect accuracy on source localization and strong performance on source separation and dereverberation.
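The core idea of rendering audio at a new viewpoint from a room impulse response (RIR) can be illustrated with a minimal sketch: convolve an estimated dry (separated, dereverberated) source with the RIR at the target listener pose. This is a simplified illustration, not the paper's actual pipeline; the function and variable names below are hypothetical, and a real system would derive the RIR from the reconstructed room geometry and the estimated source location.

```python
# Minimal sketch (assumed, not the paper's code): synthesize novel-view audio
# by convolving a dry source estimate with a room impulse response (RIR).
import numpy as np
from scipy.signal import fftconvolve


def render_novel_view_audio(dry_source: np.ndarray,
                            rir_at_target: np.ndarray) -> np.ndarray:
    """Convolve the dry source with the RIR at the target listener pose."""
    wet = fftconvolve(dry_source, rir_at_target, mode="full")
    wet = wet[: len(dry_source)]            # trim to the original length
    peak = np.max(np.abs(wet)) + 1e-8       # normalize to avoid clipping
    return wet / peak


if __name__ == "__main__":
    # Toy example: a 440 Hz tone as a stand-in for a separated, dereverberated
    # source, and a synthetic two-tap RIR (direct path plus one reflection).
    sr = 16000
    t = np.arange(sr) / sr
    dry = np.sin(2 * np.pi * 440.0 * t)
    rir = np.zeros(sr // 4)
    rir[0] = 1.0
    rir[2000] = 0.4
    rendered = render_novel_view_audio(dry, rir)
```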
Publication date: 25 Oct 2023
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2310.15130