The paper describes the authors' methodology for the Nuanced Arabic Dialect Identification (NADI) Shared Task 2023, which targets country-level dialect identification. The authors fine-tuned several transformer-based models pre-trained on Arabic on the dataset provided by the organizers, then combined these models with an ensembling method to improve system performance. Dialect identification matters because it can improve downstream NLP tasks such as speech recognition and machine translation. The final system achieved an F1-score of 76.65, ranking 11th on the leaderboard.
Publication date: 1 Dec 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2311.18739
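The summary does not say which ensembling method the authors used, so the sketch below illustrates one common option, hard majority voting over the predicted country labels of several fine-tuned models. Voting, the tie-breaking rule (first model in the list wins), and the choice of macro-averaged F1 as the evaluation metric are all assumptions made for illustration. The labels are toy data, and `majority_vote` and `macro_f1` are hypothetical helper names.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard-voting ensemble (an assumed method, not necessarily the paper's).

    predictions_per_model: one list of predicted labels per model, all aligned
    to the same examples. Ties go to the label predicted by the earliest model.
    """
    n_examples = len(predictions_per_model[0])
    if any(len(preds) != n_examples for preds in predictions_per_model):
        raise ValueError("all models must predict the same number of examples")
    ensembled = []
    for i in range(n_examples):
        votes = [preds[i] for preds in predictions_per_model]
        counts = Counter(votes)
        top = max(counts.values())
        # Walk votes in model order so the first model breaks ties.
        winner = next(label for label in votes if counts[label] == top)
        ensembled.append(winner)
    return ensembled


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Toy gold labels and predictions from three hypothetical fine-tuned models.
    gold = ["Egypt", "Morocco", "Iraq", "Egypt"]
    model_a = ["Egypt", "Morocco", "Syria", "Egypt"]
    model_b = ["Egypt", "Algeria", "Iraq", "Jordan"]
    model_c = ["Sudan", "Morocco", "Iraq", "Egypt"]

    ensembled = majority_vote([model_a, model_b, model_c])
    print(ensembled)                   # ['Egypt', 'Morocco', 'Iraq', 'Egypt']
    print(macro_f1(gold, ensembled))   # 1.0
```

In this toy case every model makes at least one mistake, but each mistake is outvoted by the other two models, so the ensemble recovers all gold labels. Soft voting (averaging per-class probabilities) is a common alternative when the models expose calibrated scores.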