Summarization-based Data Augmentation for Document Classification

The paper presents SUMMaug, a summarization-based data augmentation method for document classification. Although pretrained language models are now standard in natural language understanding tasks, they still struggle to comprehend lengthy text because of the data sparseness problem. To address this, the authors summarize the input of each original training example and, where needed, merge the original labels so that they match the summarized input. The shorter, easier examples produced this way enable curriculum learning, which improves the model's ability to understand lengthy texts. Experiments on two datasets show that SUMMaug outperforms existing methods in both accuracy and robustness.
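The idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a placeholder summarizer (`lead_summary`, which simply keeps the first sentences and stands in for a real summarization model), hypothetical label names, and a hypothetical fine-to-coarse label mapping. The two-stage ordering (summaries first, full documents second) is one plausible reading of the curriculum described above.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (document text, label)


def lead_summary(text: str, max_sentences: int = 2) -> str:
    """Placeholder summarizer (an assumption): keep the first few sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if not sentences:
        return ""
    return ". ".join(sentences[:max_sentences]) + "."


def summ_augment(
    examples: List[Example],
    summarize: Callable[[str], str],
    label_merge: Dict[str, str],
) -> List[Example]:
    """Build augmented examples: summarized input plus merged (coarser) label.

    Labels missing from label_merge are kept unchanged.
    """
    return [
        (summarize(text), label_merge.get(label, label))
        for text, label in examples
    ]


def curriculum_stages(
    examples: List[Example],
    summarize: Callable[[str], str],
    label_merge: Dict[str, str],
) -> List[List[Example]]:
    """Return training stages in order: easy (summaries) then hard (originals)."""
    return [summ_augment(examples, summarize, label_merge), list(examples)]


# Hypothetical data: a fine-grained rating merged into a coarse sentiment label.
train = [("Great plot. Fine acting. Long middle act. Weak ending.", "4_stars")]
merge = {"4_stars": "positive"}

for stage_number, stage in enumerate(curriculum_stages(train, lead_summary, merge), 1):
    print(stage_number, stage)
```

In practice, a model would be trained on the first stage and then fine-tuned on the second. The actual summarizer, label merging scheme, and training schedule are described in the paper and the project repository.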

Publication date: 1 Dec 2023
Project Page: https://github.com/etsurin/summaug
Paper: https://arxiv.org/pdf/2312.00513
