Masked Audio Modeling with CLAP and Multi-Objective Learning
This paper examines the limitations of current masked audio modeling (MAM) methods and proposes a method that strengthens MAM's semantic modeling. The proposed method distills cross-modality knowledge…