root, Author at BytesArchive

January 11, 2024

Class-Incremental Learning for Multi-Label Audio Classification

The article presents a new method for class-incremental learning of potentially overlapping sounds for multi-label audio classification…

Machine Learning, Sound

January 11, 2024

Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement

The article discusses the challenge of audio-to-audio (A2A) style transfer, especially in the context of transferring emotional…

Artificial Intelligence, Computation and Language

January 11, 2024

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

The paper introduces a Cross-Speaker Encoding (CSE) network to improve multi-talker speech recognition. Current methods, single-input multiple-output…

Sound

January 11, 2024

RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement

The paper discusses RaD-Net, a repairing and denoising network for speech signal improvement. The authors have improved…

Machine Learning, Sound

January 11, 2024

HyperGANStrument: Instrument Sound Synthesis and Editing with Pitch-Invariant Hypernetworks

The paper proposes HyperGANStrument, a novel neural synthesizer that enhances the generation capability of GANStrument by introducing…

Artificial Intelligence, Machine Learning

January 11, 2024

Masked Audio Generation using a Single Non-Autoregressive Transformer

The paper introduces MAGNET, a masked generative sequence modeling method that operates directly over several streams of…

Computation and Language, Human-Computer Interaction

January 11, 2024

Real-time and Continuous Turn-taking Prediction Using Voice Activity Projection

This article introduces a system for real-time and continuous turn-taking prediction in spoken dialogue systems (SDSs). The…

Computation and Language, Sound

January 11, 2024

Learning Audio Concepts from Counterfactual Natural Language

The article discusses the limitations of conventional audio classification methods and introduces a novel method that incorporates…

Sound

January 11, 2024

Self-supervised speech representation and contextual text embedding for match-mismatch classification with EEG recording

The research focuses on the match-mismatch classification with EEG recording using self-supervised speech representation and contextual text…

Sound

January 11, 2024

Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection

The article discusses the development of a full-frequency dynamic convolution (FFDConv) for sound event detection. Traditional 2D…
