This paper addresses the identification of political actors who make claims in public debates, a key step in constructing discourse networks. It compares a traditional pipeline of dedicated NLP components with a large language model (LLM) on this task. The study finds that LLMs are good at identifying the right reference but struggle to generate the actor's correct canonical form, which points to a problem of controlling LLM-generated output. A hybrid model that pairs the LLM with a classifier to normalize its output outperforms both of the original models.
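
The normalization idea behind the hybrid model can be illustrated with a minimal sketch. It is not the paper's classifier: a simple string-similarity lookup stands in for it, and the actor inventory, the `normalize_actor` function, and the `threshold` value are all hypothetical names and settings chosen for illustration. The sketch only shows the general shape of the step, which is to map free-form LLM output onto a fixed set of canonical actor names rather than trusting the generated string directly.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical inventory of canonical actor names (an assumption, not from the paper).
CANONICAL_ACTORS = [
    "Angela Merkel",
    "Horst Seehofer",
    "Christian Democratic Union",
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def normalize_actor(llm_output: str, threshold: float = 0.5) -> Optional[str]:
    """Map a free-form actor mention to the closest canonical name.

    Returns None when no canonical name is similar enough, so the caller
    can treat the mention as unresolved instead of accepting a bad match.
    The string-similarity score is a stand-in for a trained classifier.
    """
    scored = [(similarity(llm_output, name), name) for name in CANONICAL_ACTORS]
    best_score, best_name = max(scored)
    return best_name if best_score >= threshold else None
```

In this sketch, `normalize_actor("Merkel")` resolves to the canonical entry `"Angela Merkel"`, while a string that resembles no entry returns `None`. A trained classifier would replace the similarity score, but the contract stays the same: the output is always drawn from the canonical inventory.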

Publication date: 1 Feb 2024
Project Page: https://arxiv.org/abs/2402.00620v1
Paper: https://arxiv.org/pdf/2402.00620