Dynamic Layer Tying for Parameter-Efficient Transformers
The study presents a method for dynamically selecting layers in deep transformer networks to reduce the number of trainable parameters. This is achieved by using reinforcement learning to decide whether…