Article | GetMobile: Mobile Computing and Communications | Jan 20, 2025 | Closed access

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration

IIT@MIT

Indexed in Crossref

Abstract

Large language models (LLMs) have transformed numerous AI applications. On-device LLM is becoming increasingly important: running LLMs locally on edge devices can reduce cloud computing costs and protect users' privacy. However, the astronomical model size and the limited hardware resources pose significant deployment challenges. To solve these issues, we propose Activation-aware Weight Quantization (AWQ) and TinyChat, an algorithm-system full-stack solution for efficient on-device LLM deployment. AWQ is a novel quantization method that identifies and protects salient weights based on activation distribution, significantly reducing model size while preserving performance. TinyChat, an optimized inference…
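The abstract does not spell out how salient weights are "protected", so the following is only a minimal sketch of one way an activation-aware scheme can work, not the authors' implementation. It assumes that each input channel of a weight matrix is scaled up in proportion to its mean absolute activation before round-to-nearest quantization, and that the inverse scale is folded back afterwards. The function names, the exponent `alpha`, and the toy numbers are all hypothetical.

```python
# Illustrative sketch only; names and parameters are assumptions, not the paper's API.

def quantize_rtn(row, n_bits=4):
    """Symmetric round-to-nearest quantization of one weight row (returned dequantized)."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = max(abs(w) for w in row)
    if max_abs == 0.0:
        return list(row)
    step = max_abs / qmax
    return [round(w / step) * step for w in row]


def channel_scales(activations, alpha=0.5):
    """Per-input-channel scale: (mean |activation|) ** alpha. alpha is a hypothetical knob."""
    n_samples = len(activations)
    n_channels = len(activations[0])
    mean_abs = [
        sum(abs(x[j]) for x in activations) / n_samples for j in range(n_channels)
    ]
    return [max(m, 1e-8) ** alpha for m in mean_abs]


def activation_aware_quantize(weights, activations, alpha=0.5, n_bits=4):
    """Scale channels by activation magnitude, quantize, then undo the scale."""
    scales = channel_scales(activations, alpha)
    result = []
    for row in weights:
        scaled = [w * s for w, s in zip(row, scales)]
        quantized = quantize_rtn(scaled, n_bits)
        result.append([q / s for q, s in zip(quantized, scales)])
    return result


def dot(row, x):
    return sum(w * xi for w, xi in zip(row, x))


# Toy case: channel 0 has a small weight but a large activation.
weights = [[0.1, 1.0]]
calibration = [[100.0, 1.0]]

plain = [quantize_rtn(weights[0])]
aware = activation_aware_quantize(weights, calibration)

reference = dot(weights[0], calibration[0])
err_plain = abs(dot(plain[0], calibration[0]) - reference)
err_aware = abs(dot(aware[0], calibration[0]) - reference)
print("output error, plain RTN:", err_plain)
print("output error, activation-aware:", err_aware)
```

In this toy case the scaled channel lands on a quantization level, so the output error falls. In general, scaling one channel can enlarge the quantization step for the others, so any real method has to balance the two effects.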

Citation impact

Total citations: 164
FWCI: 157.04
Percentile: 100%
References: 13

Authors

6

Topics & keywords

Keywords
  • Quantization (signal processing)
  • Acceleration
  • Compression (physics)
  • Computer science
  • Materials science
  • Composite material
  • Physics
  • Computer vision