The paper discusses why text data augmentation is difficult: sentences are discrete, so small edits can change their meaning. It highlights two limitations of rule-based augmentation methods and softEDA (easy data augmentation with soft labels): they struggle to maintain semantic consistency, and the best factor has to be found separately for each model and dataset. To overcome these issues, the authors propose adapting AutoAugment, a technique that searches for the optimal factors of the data augmentation process. The results suggest that AutoAugment can boost existing augmentation methods and improve cutting-edge pre-trained language models. The source code is publicly available for further research and implementation.
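
To make the idea of rule-based augmentation with soft labels concrete, here is a minimal sketch. It is not the authors' implementation: the function names (`random_swap`, `random_deletion`, `soft_label`, `augment`), the choice of the two rules, and the standard label-smoothing formula are all assumptions made for illustration. The parameters `epsilon`, `p_delete`, and `n_swaps` stand in for the kind of "factors" that the paper proposes to search for automatically instead of tuning by hand.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times (an EDA-style rule)."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def soft_label(label, num_classes, epsilon):
    """Standard label smoothing (assumed here): the true class gets
    1 - epsilon plus its share of epsilon / num_classes."""
    base = epsilon / num_classes
    dist = [base] * num_classes
    dist[label] += 1.0 - epsilon
    return dist


def augment(sentence, label, num_classes, epsilon, p_delete, n_swaps, seed=0):
    """Apply both rules, then attach a smoothed label to the augmented text."""
    rng = random.Random(seed)
    tokens = sentence.split()
    tokens = random_swap(tokens, n_swaps, rng)
    tokens = random_deletion(tokens, p_delete, rng)
    return " ".join(tokens), soft_label(label, num_classes, epsilon)


if __name__ == "__main__":
    text, target = augment(
        "the movie was surprisingly good", label=1, num_classes=2,
        epsilon=0.2, p_delete=0.1, n_swaps=1,
    )
    print(text, target)
```

The smoothed label expresses reduced confidence in an augmented sentence whose meaning may have shifted. The sketch fixes `epsilon`, `p_delete`, and `n_swaps` by hand; choosing them per model and dataset is the part the paper delegates to AutoAugment, which is not reproduced here.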

 

Publication date: 9 Feb 2024
Project Page: https://github.com/c-juhwan/soft-text-autoaugment
Paper: https://arxiv.org/pdf/2402.05584