Opening the AI black box: program synthesis via mechanistic interpretability
The article introduces MIPS (Mechanistic-Interpretability-based Program Synthesis), an automated method that distills simple learned algorithms from neural networks into Python code. This approach is used to interpret and understand black-box…