Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
The article presents "Hydra heads", a method for improving the efficiency of speculative decoding in transformer-based large language models (LLMs). The study builds on the Medusa decoding framework,…
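To illustrate the core idea, here is a minimal toy sketch (NumPy, hypothetical weights and names) contrasting sequentially-dependent Hydra-style drafting with independent Medusa-style heads: in the Hydra variant, each draft head conditions on the token the previous head just drafted, rather than on the base hidden state alone. This is an illustrative sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 50
num_heads = 3  # draft 3 tokens ahead

# Hypothetical toy parameters: a token embedding table and one
# linear draft head per future position.
embed = rng.normal(size=(vocab, d_model))
head_w = [rng.normal(size=(d_model, vocab)) for _ in range(num_heads)]

def draft_hydra(hidden):
    """Sequentially-dependent drafting: head k sees the base hidden
    state plus the embedding of the token head k-1 just drafted."""
    state, tokens = hidden.copy(), []
    for w in head_w:
        tok = int(np.argmax(state @ w))  # greedy draft token
        tokens.append(tok)
        state = state + embed[tok]       # feed drafted token back in
    return tokens

def draft_medusa(hidden):
    """Independent Medusa-style heads: every head sees only `hidden`."""
    return [int(np.argmax(hidden @ w)) for w in head_w]

h = rng.normal(size=(d_model,))
print("hydra :", draft_hydra(h))
print("medusa:", draft_medusa(h))
```

Note that the first drafted token agrees between the two schemes (head 0 sees identical input in both), while later heads can diverge because Hydra heads observe earlier draft tokens.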