How Transformers Learn Causal Structure with Gradient Descent
A study by Eshaan Nichani, Alex Damian, and Jason D. Lee of Princeton University investigates how transformers learn causal structure through gradient descent. The research reveals that transformers’ success in…