The paper presents ‘CyberMetric’, a benchmark dataset of 10,000 questions drawn from various cybersecurity sources. Its purpose is to assess and compare the cybersecurity knowledge of large language models (LLMs), including GPT-3.5 and Falcon-180B. The dataset covers a wide range of cybersecurity topics, and the reported results show LLMs outperforming human participants on the benchmark questions in almost every topic area. The authors take these findings as evidence of the potential of LLMs in areas such as threat detection, policy interpretation, and security strategy optimization.

Publication date: 13 Feb 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2402.07688