The Genie method, proposed by researchers at IBM Israel Research Lab, the Hebrew University of Jerusalem, and MIT, generates high-quality synthetic data that is grounded in source content. It has three stages: Content Preparation, Generation, and Filtering. The researchers used Genie to create synthetic data for long-form question answering, summarization, and information extraction. In human evaluations, the generated data was judged natural and of high quality. Models trained on this data performed on par with or better than models trained on human-generated data, especially in terms of faithfulness.
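To make the three-stage structure concrete, here is a minimal sketch of a prepare, generate, and filter pipeline for question-answering data. It is an illustration under stated assumptions, not the authors' implementation: every name (`Example`, `prepare_content`, `generate_examples`, `grounding_score`, `filter_examples`) is hypothetical, the generator is a caller-supplied function standing in for a language model, and the token-overlap score is only a simple stand-in for whatever faithfulness and quality filters the paper actually uses.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass
class Example:
    """One synthetic, content-grounded QA example (hypothetical schema)."""
    content: str
    question: str
    answer: str


def prepare_content(raw_documents: Iterable[str], max_words: int = 200) -> List[str]:
    """Stage 1 (Content Preparation): normalize whitespace and split into passages."""
    passages = []
    for doc in raw_documents:
        words = doc.split()
        for start in range(0, len(words), max_words):
            passages.append(" ".join(words[start:start + max_words]))
    return passages


def generate_examples(
    passages: Iterable[str],
    generator: Callable[[str], Tuple[str, str]],
) -> List[Example]:
    """Stage 2 (Generation): ask the generator for a (question, answer) per passage.

    `generator` is an assumption: any callable wrapping a language model.
    """
    examples = []
    for passage in passages:
        question, answer = generator(passage)
        examples.append(Example(passage, question, answer))
    return examples


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounding_score(example: Example) -> float:
    """Fraction of answer tokens that also appear in the content.

    A crude lexical proxy for faithfulness, assumed for illustration only.
    """
    answer_tokens = _tokens(example.answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & _tokens(example.content)) / len(answer_tokens)


def filter_examples(examples: Iterable[Example], min_score: float = 0.8) -> List[Example]:
    """Stage 3 (Filtering): keep only examples whose answer is grounded in the content."""
    return [ex for ex in examples if grounding_score(ex) >= min_score]
```

A caller would chain the stages as `filter_examples(generate_examples(prepare_content(docs), generator))`, swapping in a real model call for `generator` and a stronger faithfulness metric for `grounding_score`.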

Publication date: 25 Jan 2024
Project Page: https://arxiv.org/abs/2401.14367
Paper: https://arxiv.org/pdf/2401.14367