This paper introduces PrivateLoRA, a novel Large Language Model (LLM) service paradigm that balances privacy and efficiency. Traditional LLM services force users to choose between the two: send data to a powerful cloud model, or run a weaker model entirely on-device. PrivateLoRA resolves this tension by splitting computation between edge devices and the cloud, keeping privacy-sensitive computation and personal parameters on the device while transmitting only the compact activations needed to bridge the two. This design keeps communication overhead low and is highly resource-efficient, improving throughput by over 300% for 7B models and over 80% for 33B models on standard 5G networks. According to the authors, the framework is the first of its kind, democratizing access to advanced AI on edge devices and enabling more personalized LLM experiences while preserving data locality and user privacy.
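The core idea sketched above is to split a low-rank (LoRA-style) adapter's computation across cloud and device so that only small rank-r activations ever cross the network. The Python/NumPy snippet below is a minimal sketch of one plausible arrangement of that split; the matrix names (W, A, B, M), the shapes, and the exact placement of each projection are illustrative assumptions, not the paper's precise formulation.

```python
# Hypothetical sketch: the frozen base weight and shared low-rank projections stay
# in the cloud, a tiny personal rank-r block stays on the edge device, and only
# (batch, rank) activations cross the network in either direction.
import numpy as np

d_model, rank = 4096, 8                     # illustrative sizes, not from the paper
rng = np.random.default_rng(0)

# --- cloud-resident parameters (frozen / shared) ---
W = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)  # pretrained weight
A = rng.standard_normal((d_model, rank)) * 0.01                 # shared down-projection
B = rng.standard_normal((rank, d_model)) * 0.01                 # shared up-projection

# --- device-resident parameter (private, personalized) ---
M = np.eye(rank)                                                # small rank-r block

def cloud_down(h: np.ndarray):
    """Cloud: heavy base matmul plus down-projection to a rank-r activation."""
    return h @ W, h @ A            # only the (batch, rank) tensor is sent to the device

def device_personalize(u: np.ndarray) -> np.ndarray:
    """Device: apply the private rank-r transformation; only (batch, rank) goes back."""
    return u @ M

def cloud_up(base: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Cloud: up-project the device's output and merge it with the base path."""
    return base + v @ B

h = rng.standard_normal((1, d_model))       # hidden state entering the adapted layer
base, u = cloud_down(h)                     # cloud -> device: (1, rank)
v = device_personalize(u)                   # device -> cloud: (1, rank)
out = cloud_up(base, v)                     # adapted layer output
print(out.shape)                            # (1, 4096)
```

In this arrangement the heavy d×d matmul stays in the cloud, the personal rank-r block never leaves the device, and each network hop carries only a (batch, rank) tensor, which is the intuition behind the reduced communication overhead described above.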

 

Publication date: 23 Nov 2023
Project Page: https://arxiv.org/abs/2311.14030
Paper: https://arxiv.org/pdf/2311.14030