Simplifying Transformer Blocks
The article by Bobby He & Thomas Hofmann focuses on simplifying transformer blocks in deep learning. They question whether components such as skip connections, projection/value matrices, sequential sub-blocks, and normalization layers can be removed from the standard transformer block without loss of training speed or performance.
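To make the named components concrete, below is a minimal PyTorch-style sketch of a block with several of them stripped out. It is illustrative only, assuming a standard decoder-style block; the `SimplifiedBlock` name, hyperparameters, and the handling of the removed skip connections are assumptions, not the authors' implementation (which relies on signal-propagation techniques such as shaped attention).

```python
# Illustrative sketch (not the authors' code): a transformer block without
# value/output projections, without normalization layers, and with the
# attention and MLP sub-blocks applied in parallel rather than sequentially.
import math
import torch
import torch.nn as nn


class SimplifiedBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int, mlp_ratio: int = 4):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Only query/key projections are kept; the value and output
        # projections are removed (treated as the identity).
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, mlp_ratio * d_model),
            nn.GELU(),
            nn.Linear(mlp_ratio * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = x.view(b, t, self.n_heads, self.d_head).transpose(1, 2)  # identity "values"
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_head), dim=-1)
        attn_out = (attn @ v).transpose(1, 2).reshape(b, t, d)  # no output projection
        # No LayerNorm and no explicit skip connections; the two sub-blocks
        # run in parallel on the same input and their outputs are summed.
        return attn_out + self.mlp(x)
```

In the actual paper, removing the skip connections and normalization is justified by signal-propagation analysis rather than by simply deleting them as above; the sketch is only meant to show where each questioned component sits in a standard block.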