This paper proposes a new approach to scientific progress in Natural Language Processing (NLP): developing scalable, data-driven theories of linguistic structure. The idea is to collect data in ways that allow exhaustive annotation of the behavioral phenomena of interest, then use machine learning to construct explanatory theories of those phenomena, which can in turn serve as building blocks for AI systems. The paper also discusses investigations into data-driven theories of shallow semantic structure using Question-Answer driven Semantic Role Labeling (QA-SRL). The author believes that this approach can inform future scientific progress.

 

Publication date: 1 Dec 2023
Project Page: https://arxiv.org/abs/2312.00349
Paper: https://arxiv.org/pdf/2312.00349