Article | Jun 1, 2020 | Closed access
What Makes Training Multi-Modal Classification Networks Hard?
Indexed in Crossref
Abstract
Consider end-to-end training of a multi-modal vs. a uni-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its uni-modal counterpart. In our experiments, however, we observe the opposite: the best uni-modal network can outperform the multi-modal network. This observation is consistent across different combinations of modalities and across different tasks and benchmarks for video classification. This paper identifies two main causes for this performance drop. First, multi-modal networks are often prone to overfitting due to their increased capacity. Second, different modalities overfit and generalize at different rates, so training…
Citation impact
- Total citations: 476
- FWCI: 22.17
- Percentile: 100%
- References: 88
Authors: 3
Topics & keywords
Keywords
- Overfitting
- Computer science
- Modal
- Modalities
- Artificial intelligence
- Machine learning
- Task (project management)
- Training (meteorology)