The article examines the efficiency of parameter-shared pre-trained language models (PLMs) in resource-constrained environments. While parameter sharing reduces model storage and memory costs, it does not alleviate the computational burden of inference. The authors introduce a technique based on neural ordinary differential equations (ODEs) to improve the inference efficiency of these models, along with a pre-training technique for shared models that enables further inference acceleration. Experimental results demonstrate the effectiveness of both methods on autoregressive and autoencoding PLMs.
Publication date: 20 Oct 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2310.12818
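
The summary above does not include implementation details, so the following is only a minimal sketch of the general idea of viewing a parameter-shared transformer through a neural-ODE lens: the single shared block acts as the derivative function, stacking it L times corresponds to L explicit Euler steps, and inference can then trade accuracy for speed by taking fewer, larger steps. The names (`SharedBlock`, `ODESharedEncoder`, `num_steps`) and the Euler discretization are illustrative assumptions, not the authors' actual method.

```python
# Hedged sketch: a parameter-shared encoder interpreted as an ODE solved
# with explicit Euler steps. This is NOT the paper's implementation.
from typing import Optional

import torch
import torch.nn as nn


class SharedBlock(nn.Module):
    """One transformer block whose weights are reused at every depth."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Return only the residual update, i.e. an estimate of dh/dt.
        x = self.norm1(h)
        attn_update, _ = self.attn(x, x, x)
        ffn_update = self.ffn(self.norm2(h + attn_update))
        return attn_update + ffn_update


class ODESharedEncoder(nn.Module):
    """Parameter-shared encoder whose depth is a configurable number of Euler steps."""

    def __init__(self, dim: int, train_steps: int = 12):
        super().__init__()
        self.block = SharedBlock(dim)
        self.train_steps = train_steps

    def forward(self, h: torch.Tensor, num_steps: Optional[int] = None) -> torch.Tensor:
        steps = num_steps if num_steps is not None else self.train_steps
        dt = self.train_steps / steps  # larger step size when using fewer evaluations
        for _ in range(steps):
            h = h + dt * self.block(h)  # explicit Euler step with the shared block
        return h


# Usage: with num_steps == train_steps and dt == 1 this matches the usual
# parameter-shared stack; fewer steps means fewer forward passes at inference.
enc = ODESharedEncoder(dim=256, train_steps=12)
x = torch.randn(2, 16, 256)
fast_out = enc(x, num_steps=4)
```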