Article · IEEE Internet of Things Journal · Dec 31, 2024 · Closed access

EdgeShard: Efficient LLM Inference via Collaborative Edge Computing

Hong Kong Polytechnic University

Indexed in Crossref

Abstract

Large language models (LLMs) have shown great success in content generation and intelligent decision making for IoT systems. Traditionally, LLMs are deployed on the cloud, incurring prolonged latency, high bandwidth costs, and privacy concerns. More recently, edge computing has been considered promising in addressing such concerns because the edge devices are closer to data sources. However, edge devices are cursed by their limited resources and can hardly afford LLMs. Existing studies address such a limitation by offloading heavy workloads from edge to cloud or compressing LLMs via model quantization. These methods either still rely heavily on the remote cloud or suffer substantial accuracy loss.…

Citation impact

Total citations: 122
FWCI: 86.29
Percentile: 100%
References: 37

Authors

5

Topics & keywords

Keywords
  • Computer science
  • Inference
  • Edge computing
  • Enhanced Data Rates for GSM Evolution
  • Distributed computing
  • Theoretical computer science
  • Artificial intelligence

Funding