The article addresses the challenges of machine translation for Arabic dialects on social media. Arabic speakers converse in dialects rather than Modern Standard Arabic (MSA), and this habit carries over to their social media use. Existing translation systems, which are built for MSA, fail on Arabic dialects. The researchers propose an online social network-based multidialect Arabic dataset (OSN-MDAD), built by translating English tweets into four Arabic dialects. They validated the dataset by developing neural machine translation models for each dialect, with Transformer-based models giving the strongest results.

Publication date: 22 Sep 2023
Project Page: ?
Paper: https://arxiv.org/pdf/2309.12137