The paper examines whether AI, specifically GPT-4, can independently generate and verify hypotheses for a machine learning research problem. To test this, the authors kept instructions on hypothesis generation, verification, and problem-specific preparation to a minimum. The results show that GPT-4 can, in some cases, generate and validate hypotheses without detailed guidance. However, none of the verifications were flawless, which indicates that significant challenges remain before autonomous, human-level research is achievable with generic instructions alone.
Publication date: 16 Nov 2023
Project page: https://github.com/t46/mock-pipeline
Paper: https://arxiv.org/pdf/2311.09706