The article proposes using Transformer-based language models to classify entity legal forms from raw legal entity names. The study evaluates the performance of various pre-trained BERT variants against traditional text-classification baselines on a dataset of over 1.1 million legal entities, and finds that the BERT variants outperform the traditional approaches. The authors argue that Transformer-based models hold significant potential for advancing data standardization and integration, helping financial institutions, corporations, and governments assess business relationships, understand risk exposure, and promote effective governance.
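In practical terms, the approach described amounts to fine-tuning a pre-trained BERT variant as a sequence classifier over entity-name strings. The sketch below shows one way to set this up with the Hugging Face transformers library; the checkpoint, toy entity names, and label set are illustrative assumptions, not the paper's data or code.

```python
# Minimal sketch (not the authors' code): fine-tune a BERT variant to classify
# legal forms from raw entity names. Checkpoint, names, and labels are illustrative.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
from datasets import Dataset

# Toy examples: raw legal entity names mapped to hypothetical legal-form labels.
names = ["Acme Holdings Ltd", "Muster Verwaltungs GmbH",
         "Dupont et Fils SARL", "Globex Corporation Inc"]
labels = ["LTD", "GMBH", "SARL", "INC"]
label2id = {l: i for i, l in enumerate(sorted(set(labels)))}
id2label = {i: l for l, i in label2id.items()}

checkpoint = "bert-base-multilingual-cased"  # any pre-trained BERT variant would do
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=len(label2id), id2label=id2label, label2id=label2id)

def tokenize(batch):
    # Entity names are short strings, so a small max_length keeps training cheap.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=32)

ds = Dataset.from_dict({"text": names,
                        "label": [label2id[l] for l in labels]})
ds = ds.map(tokenize, batched=True)

args = TrainingArguments(output_dir="elf-classifier", num_train_epochs=3,
                         per_device_train_batch_size=8, logging_steps=1)
Trainer(model=model, args=args, train_dataset=ds).train()
```

At inference time, the fine-tuned model would map a raw entity name to a predicted legal-form label via the standard classification head, which is the evaluation setting the paper compares against traditional baselines.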

Publication date: 20 Oct 2023
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2310.12766