Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals
This research article discusses over-smoothing in deep transformer models: as depth grows, token representations become increasingly similar until they are nearly identical. The authors propose a novel regularizer that penalizes…
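The sketch below shows why stacking attention layers tends to smooth tokens together. It assumes a deliberately stripped-down layer: single-head self-attention with identity query/key/value projections and no residual connection, MLP, or normalization. Every softmax row is a probability distribution with strictly positive entries, so each layer replaces every token with a convex combination of all tokens. The largest pairwise distance between tokens, measured here by the hypothetical helper `token_diameter`, therefore cannot grow from one layer to the next. All function names and parameters (`attention_layer`, `smoothing_trace`, `num_tokens`, `dim`, `depth`) are illustrative. The code shows the over-smoothing problem only, not the authors' regularizer, which the summary does not specify.

```python
import numpy as np


def softmax_rows(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax; every row becomes a probability distribution."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


def attention_layer(x: np.ndarray) -> np.ndarray:
    """Hypothetical stripped-down self-attention: identity Q/K/V projections,
    no residual connection, no MLP, no normalization (an assumption)."""
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)
    # Each output token is a convex combination of the input tokens.
    return softmax_rows(scores) @ x


def token_diameter(x: np.ndarray) -> float:
    """Largest Euclidean distance between any two token representations."""
    diffs = x[:, None, :] - x[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).max())


def smoothing_trace(num_tokens: int = 8, dim: int = 16, depth: int = 12,
                    seed: int = 0) -> list[float]:
    """Token diameter at the input and after each stacked attention layer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_tokens, dim))
    trace = [token_diameter(x)]
    for _ in range(depth):
        x = attention_layer(x)
        trace.append(token_diameter(x))
    return trace


if __name__ == "__main__":
    for layer, diam in enumerate(smoothing_trace()):
        print(f"layer {layer:2d}: token diameter = {diam:.4f}")
```

Real transformer blocks add residual connections, MLPs, and normalization, and these change the dynamics. The sketch isolates only the averaging effect of attention, which is the mechanism behind the collapse the article addresses.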