Sparsely activated Mixture-of-Experts (SMoE) has shown promise in scaling up the learning capacity of neural networks, but duplicating network layers into many expert copies leads to high memory usage and redundancy among experts. This paper introduces M-SMoE, a merging algorithm for SMoE that uses routing statistics to guide expert merging. It first aligns experts by neuron permutation, then forms groups around dominant experts according to the routing policy, and finally merges each group into a single expert, weighting every member by its activation frequency. Building on the merged model, the full MC-SMoE pipeline (merge, then compress) further decomposes the merged experts into low-rank and structurally sparse alternatives, reducing memory usage by up to 80% and FLOPs by 20% with virtually no loss in performance.
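
To make the final merging step concrete, below is a minimal sketch (not the authors' released code) of frequency-weighted expert merging with a simple permutation-alignment pass. It assumes two-layer feed-forward experts and hypothetical helpers `align_to_anchor` and `merge_expert_group`; the alignment here is solved with `scipy.optimize.linear_sum_assignment` over cosine similarities of hidden neurons, standing in for whatever alignment procedure the paper uses.

```python
# Hedged sketch of frequency-weighted expert merging for SMoE.
# Assumptions (not from the paper): experts are two-layer FFNs with
# w_in of shape [d_ff, d_model] and w_out of shape [d_model, d_ff];
# `freqs` holds each expert's routing activation frequency.
import torch
from scipy.optimize import linear_sum_assignment


def align_to_anchor(anchor_w_in, w_in, w_out):
    """Permute one expert's hidden neurons to line up with the anchor's.

    Neurons are matched by cosine similarity of their input-projection rows,
    solved as a linear assignment problem.
    """
    a = torch.nn.functional.normalize(anchor_w_in, dim=1)
    b = torch.nn.functional.normalize(w_in, dim=1)
    cost = -(a @ b.T).numpy()            # negate: assignment minimizes cost
    _, col = linear_sum_assignment(cost)
    perm = torch.as_tensor(col)
    # Permute rows of w_in and the matching columns of w_out.
    return w_in[perm], w_out[:, perm]


def merge_expert_group(w_ins, w_outs, freqs):
    """Merge a group of experts into one, weighted by activation frequency."""
    weights = torch.tensor(freqs, dtype=torch.float32)
    weights = weights / weights.sum()
    # Treat the most frequently routed expert as the (dominant) alignment anchor.
    anchor = int(torch.argmax(weights))
    merged_in = torch.zeros_like(w_ins[0])
    merged_out = torch.zeros_like(w_outs[0])
    for i, (w_in, w_out) in enumerate(zip(w_ins, w_outs)):
        if i != anchor:
            w_in, w_out = align_to_anchor(w_ins[anchor], w_in, w_out)
        merged_in += weights[i] * w_in
        merged_out += weights[i] * w_out
    return merged_in, merged_out


if __name__ == "__main__":
    d_model, d_ff, n_experts = 16, 64, 4
    w_ins = [torch.randn(d_ff, d_model) for _ in range(n_experts)]
    w_outs = [torch.randn(d_model, d_ff) for _ in range(n_experts)]
    freqs = [0.5, 0.2, 0.2, 0.1]         # hypothetical routing frequencies
    merged_in, merged_out = merge_expert_group(w_ins, w_outs, freqs)
    print(merged_in.shape, merged_out.shape)
```

The group of merged experts then replaces its members, so the router only needs to dispatch tokens to one expert per group, which is where the memory and FLOPs savings come from.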


Publication date: 2 Oct 2023
Project Page: https://github.com/UNITES-Lab/MC-SMoE
Paper: https://arxiv.org/pdf/2310.01334