MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Indexed in Crossref
Abstract
Recently, remarkable progress has been made in scaling up Large Language Models (LLMs) with sparse Mixture-of-Experts (MoE) layers without significantly increasing computational cost. However, the transition from a pre-trained LLM to a sparse Large Vision-Language Model (LVLM) with MoE remains an open challenge. Directly fine-tuning an LLM into a sparse LVLM often leads to training collapse, characterized by (1) a large modality feature distribution gap and (2) expert load imbalance. This paper proposes a three-stage decoupled weight training process. In the first two stages, the LLM is adapted into an LVLM. In the third stage, the FFN weights from the second stage are used as…
Citation impact
- Total citations: 8
- FWCI: 155.53
- Percentile: 100%
- References: 0
Authors
10
Topics & keywords
Keywords
- Initialization
- Sparse matrix
- Sparse approximation
- Feature (linguistics)
- Code (set theory)
- Baseline (sea)
- Lossless compression