The article introduces the Cambridge Law Corpus (CLC), a corpus developed for legal AI research. The CLC comprises more than 250,000 UK court cases, the oldest dating back to the 16th century. The corpus contains raw text and metadata, and legal experts have annotated case outcomes for 638 of the cases. The data has been used to train and evaluate GPT-3, GPT-4 and RoBERTa models on case outcome extraction. Because the material is sensitive, the corpus will be released only for research purposes and under certain restrictions. The article also discusses the impact of transformer-based neural networks on textual data analysis and the emerging field of legal AI.


Publication date: 21 Sep 2023
Project Page: