This paper explores sentiment analysis on noisy Bengali texts; Bengali is a language with limited resources in this area. The authors manually annotated a dataset (NC-SentNoB) to identify ten different types of noise in about 15,000 Bengali texts. Noise identification was treated as a multi-label classification task, in which a single text can carry more than one noise type. The researchers then applied baseline noise reduction methods before conducting sentiment analysis, and assessed the performance of fine-tuned sentiment analysis models on both the noisy and the noise-reduced texts. Experimental findings indicated that the noise reduction methods used were not satisfactory, pointing to the need for more suitable noise reduction methods in future research.
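To make the multi-label framing concrete, here is a minimal Python sketch of how per-text noise annotations could be encoded and decoded. It is an illustration under stated assumptions, not the authors' implementation: the label names `noise_type_0` through `noise_type_9` are hypothetical placeholders (the summary only says there are ten types), and the helper names `to_multi_hot` and `from_scores` and the 0.5 threshold are likewise assumptions.

```python
# Hypothetical placeholder names: the summary states only that there are
# ten noise types, not what they are called.
NOISE_TYPES = [f"noise_type_{i}" for i in range(10)]


def to_multi_hot(labels, classes=NOISE_TYPES):
    """Encode a set of noise labels as a 0/1 vector, one slot per noise type."""
    unknown = set(labels) - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if c in labels else 0 for c in classes]


def from_scores(scores, threshold=0.5, classes=NOISE_TYPES):
    """Decode per-class scores independently: each type is on or off by itself.

    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    return [c for c, s in zip(classes, scores) if s >= threshold]


# A text can carry several noise types at once, or none at all.
example = to_multi_hot({"noise_type_1", "noise_type_4"})
clean = to_multi_hot(set())
```

Unlike single-label classification, the target vector may contain any number of ones, so each noise type is predicted as its own yes/no decision rather than as one choice among ten.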
Publication date: 26 Jan 2024
Project Page: https://github.com/ktoufiquee/A-Comparative-Analysis-of-Noise-Reduction-Methods-in-Sentiment-Analysis-on-Noisy-Bengali-Texts
Paper: https://arxiv.org/pdf/2401.14360