Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
The authors have developed an enhanced audio-visual Sound Event Localization and Detection (SELD) network, improving on the…
The article discusses Singing Voice Conversion (SVC), a technology that allows the conversion of one singer’s voice…
The paper introduces ESPnet-SPK, a toolkit for training speaker embedding extractors. It offers an open-source platform for…
The article introduces AudioSeal, a novel technique designed specifically for localized detection of AI-generated speech, in response…
This paper introduces Investigate-Consolidate-Exploit (ICE), a new approach for improving the adaptability and flexibility of AI agents…
The article presents ConstraintChecker, a plugin designed to enhance the reasoning capabilities of Large Language Models (LLMs)…
The study introduces CMMU, a benchmark tool for evaluating the understanding and reasoning abilities of multi-modal large…
The article presents the Uncertainty-Aware Language Agent (UALA), a new framework that leverages uncertainty quantification to improve…
Unitxt is an innovative library designed for customizable textual data preparation and evaluation tailored to generative language…
The paper focuses on the use of Transformer-based language models, specifically BERT and (Chat)GPT, in detecting semantic…