EdgeShard: Efficient LLM Inference via Collaborative Edge Computing

Zhang, Mingjin; Shen, Xiaoming; Cao, Jiannong; Cui, Zeyang; Jiang, Shan

doi:10.1109/jiot.2024.3524255

Article; IEEE Internet of Things Journal; Dec 31, 2024; Closed access

Hong Kong Polytechnic University

Indexed in Crossref

Abstract

Large language models (LLMs) have shown great success in content generation and intelligent decision making for IoT systems. Traditionally, LLMs are deployed on the cloud, incurring prolonged latency, high bandwidth costs, and privacy concerns. More recently, edge computing has been considered promising in addressing such concerns because edge devices are closer to data sources. However, edge devices are constrained by their limited resources and can hardly afford LLMs. Existing studies address this limitation by offloading heavy workloads from edge to cloud or by compressing LLMs via model quantization. These methods either still rely heavily on the remote cloud or suffer substantial accuracy loss.…

Citation impact

Total citations: 122

FWCI: 86.29
Percentile: 100%
References: 37

Authors: 5

Topics & keywords

Keywords

Computer science
Inference
Edge computing
Enhanced Data Rates for GSM Evolution
Distributed computing
Theoretical computer science
Artificial intelligence

Funding

HK
Hong Kong Institute for Monetary Research
Award: C5032-23G