EdgeShard: Efficient LLM Inference via Collaborative Edge Computing
Hong Kong Polytechnic University
Abstract
Large language models (LLMs) have shown great success in content generation and intelligent decision making for IoT systems. Traditionally, LLMs are deployed in the cloud, incurring prolonged latency, high bandwidth costs, and privacy concerns. More recently, edge computing has been considered promising for addressing such concerns because edge devices are closer to data sources. However, edge devices are constrained by their limited resources and can hardly afford LLMs. Existing studies address this limitation by offloading heavy workloads from edge to cloud or by compressing LLMs via model quantization. These methods either still rely heavily on the remote cloud or suffer substantial accuracy loss.…
Citation impact
- FWCI: 86.29
- Percentile: 100%
- References: 37
Authors: 5
Topics & keywords
- Computer science
- Inference
- Edge computing
- Enhanced Data Rates for GSM Evolution
- Distributed computing
- Theoretical computer science
- Artificial intelligence