This article discusses a comparative study of three models: a proposed convolutional neural network (CNN), VGG16 with fully connected (FC) layers, and an eXtreme Gradient Boosting (XGBoost) approach. The models are evaluated on two feature types: 30-second Mel spectrograms and 3-second Mel-frequency cepstral coefficients (MFCCs). The study finds that the XGBoost model trained on MFCCs outperforms the others. It also shows that segmenting the data during preprocessing can significantly improve the performance of the CNNs.
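
The segmentation step can be illustrated with a minimal sketch. The summary does not say how the study implements it, so everything here is an assumption made for illustration: the helper name `segment_clip`, the 22,050 Hz sample rate, the non-overlapping windows, and the synthetic noise standing in for real audio. Only the 30-second and 3-second durations come from the text above.

```python
import numpy as np

def segment_clip(waveform, sample_rate=22050, segment_seconds=3.0):
    """Split a 1-D waveform into equal, non-overlapping segments.

    Hypothetical helper: the sample rate and the absence of overlap are
    assumptions, not details taken from the study.
    """
    waveform = np.asarray(waveform)
    seg_len = int(round(sample_rate * segment_seconds))
    n_segments = len(waveform) // seg_len
    # Drop trailing samples that do not fill a whole segment.
    trimmed = waveform[: n_segments * seg_len]
    return trimmed.reshape(n_segments, seg_len)

# Synthetic 30-second clip (random noise as a stand-in for real audio).
sr = 22050
clip = np.random.default_rng(0).standard_normal(30 * sr)

segments = segment_clip(clip, sample_rate=sr, segment_seconds=3.0)
print(segments.shape)  # (10, 66150): ten 3-second training examples
```

Under these assumptions, each 30-second clip yields ten 3-second examples that inherit the clip's label, so the training set grows tenfold. When splitting into training and test sets, segments from the same clip should stay on the same side of the split, or the evaluation will be contaminated by near-duplicate material.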

Publication date: 11 Jan 2024
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2401.04737