Hydragen: High-Throughput LLM Inference with Shared Prefixes
This research presents Hydragen, a hardware-aware, exact implementation of attention that improves the inference throughput of transformer-based large language models (LLMs) when the sequences in a batch share a prefix. It is common for LLMs to perform inference on…