Scale Alone Does Not Improve Mechanistic Interpretability in Vision Models

This study investigates whether the recent trend of scaling up neural networks, in both dataset and model size, has improved our understanding of their internal workings, i.e., their mechanistic interpretability. The researchers conducted a psychophysical experiment on a diverse suite of vision models and found no correlation between scale and interpretability. In fact, modern, larger models proved less interpretable than older ones, suggesting a regression rather than an improvement. The paper highlights the need for models explicitly designed for interpretability and for more effective interpretability methods. To facilitate further research in this area, the researchers also released a dataset containing over 120,000 human responses from their experiment.


Publication date: July 11, 2023
Project Page: brendel-group.github.io/imi
Paper: https://arxiv.org/pdf/2307.05471.pdf