The paper introduces MIPS (Mechanistic-Interpretability-based Program Synthesis), an automated method that distills simple learned algorithms from neural networks into Python code, offering a way to interpret otherwise black-box networks. Unlike large language models, MIPS does not rely on human-written training data such as algorithms and code from GitHub, which makes it potentially more scalable and versatile. On a benchmark of 62 algorithmic tasks, the authors found MIPS highly complementary to GPT-4: it solved 32 of the tasks, including 13 that GPT-4 did not solve.
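
To make the idea of turning a black box into readable code concrete, here is a toy sketch. It is not the MIPS pipeline from the paper: it swaps in a plain brute-force search over a few hand-written candidate expressions, and the names `black_box`, `CANDIDATES`, and `distill` are hypothetical, chosen only for illustration.

```python
from itertools import product

def black_box(a, b):
    # Hypothetical stand-in for a trained network (assumption, not from the paper).
    # The searcher below only observes its input/output behaviour.
    return (a + b) % 2

# Small, hand-written hypothesis space of Python expressions (illustrative only).
CANDIDATES = [
    "a + b",
    "a * b",
    "a - b",
    "(a + b) % 2",
    "(a * b) % 2",
    "max(a, b)",
    "min(a, b)",
]

def distill(fn, inputs):
    """Return Python source for the first candidate that matches fn on all inputs."""
    for src in CANDIDATES:
        # eval is acceptable here only because the strings are fixed and trusted.
        candidate = eval(f"lambda a, b: {src}")
        if all(candidate(a, b) == fn(a, b) for a, b in inputs):
            return f"def program(a, b):\n    return {src}\n"
    return None  # nothing in the hypothesis space explains the behaviour

# Probe the black box on every pair of small integers.
inputs = list(product(range(4), repeat=2))
source = distill(black_box, inputs)
print(source)
```

The search returns source code rather than a fitted model, which is the property the summary highlights: the result can be read and checked like any other Python function. A real system needs a far richer hypothesis space and a way to inspect the network's internals, neither of which this sketch attempts.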

 

Publication date: 8 Feb 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2402.05110