Article | Jun 1, 2020 | Closed access

What Makes Training Multi-Modal Classification Networks Hard?

Meta (Israel)

Indexed in Crossref

Abstract

Consider end-to-end training of a multi-modal vs. a uni-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its uni-modal counterpart. In our experiments, however, we observe the opposite: the best uni-modal network can outperform the multi-modal network. This observation is consistent across different combinations of modalities and across different tasks and benchmarks for video classification. This paper identifies two main causes for this performance drop. First, multi-modal networks are often prone to overfitting due to their increased capacity. Second, different modalities overfit and generalize at different rates, so training…

Citation impact

Total citations: 476
FWCI: 22.17
Percentile: 100%
References: 88

Authors

3

Topics & keywords

Keywords
  • Overfitting
  • Computer science
  • Modal
  • Modalities
  • Artificial intelligence
  • Machine learning
  • Task (project management)
  • Training (meteorology)