This paper describes the development of two large-scale datasets, VeriGen and OpenABC-D, intended to support machine learning for Electronic Design Automation (EDA). VeriGen is a dataset of Verilog code collected from GitHub and Verilog textbooks. OpenABC-D is a labeled dataset for ML-based logic synthesis tasks, consisting of 870,000 And-Inverter Graphs (AIGs) produced from 1,500 synthesis runs. The datasets aim to address the lack of standard datasets in the EDA domain. The authors also discuss the challenges of curating, maintaining, and expanding these datasets, along with questions of dataset quality and security.

Publication date: 17 Oct 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2310.10560