CAGR: A Cross-Accelerator Graph Optimization Framework for Efficient Recommender System Inference
Carnegie Mellon University · Microsoft Research Asia (China) · +3 more institutions
Abstract
Recommender systems have become ubiquitous in modern online services, yet their deployment across diverse hardware accelerators remains challenging due to significant performance variations. Contemporary deep learning recommendation models (DLRMs), such as DeepFM and NGCF, exhibit substantial inference latency differences when executed on NVIDIA GPUs, AMD GPUs, and Google TPUs, primarily due to architectural disparities and vendor-specific optimization strategies. Existing graph optimization frameworks are typically designed for specific hardware backends, lacking the flexibility to generate portable high-performance implementations across heterogeneous accelerators. This paper presents CAGR (Cross-Accelerator…
Citation impact
- FWCI: 391.28
- Percentile: 100%
- References: 0
Authors
5
Topics & keywords
- Implementation
- Recommender system
- Inference
- Pipeline (software)
- Bayesian optimization
- Graph
- Software deployment
- Optimization problem