Multimodal Deep Learning

Ngiam, Jiquan; Khosla, Aditya; Kim, Mingyu; Nam, Juhan; Lee, Honglak; Ng, Andrew Y.

Article · Jun 28, 2011 · Closed access

Stanford University · University of Michigan

Abstract

Deep networks have been successfully applied to unsupervised feature learning for single modalities (e.g., text, images or audio). In this work, we propose a novel application of deep networks to learn features over multiple modalities. We present a series of tasks for multimodal learning and show how to train deep networks that learn features to address these tasks. In particular, we demonstrate cross modality feature learning, where better features for one modality (e.g., video) can be learned if multiple modalities (e.g., audio and video) are present at feature learning time. Furthermore, we show how to learn a shared representation between modalities and evaluate it on a unique task, where the classifier…
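The shared-representation idea in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the class name `BimodalAutoencoder`, the layer sizes, the sigmoid units, and the choice to zero-fill a missing modality are all assumptions made for illustration, and the weights are random rather than trained. It only shows the shape of the computation: audio and video features are encoded into one shared code, and both modalities can be decoded from that code even when only one was supplied.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x):
    """Apply a dense layer followed by a sigmoid to the input vector x."""
    weights, biases = layer
    return [
        sigmoid(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


class BimodalAutoencoder:
    """Hypothetical two-modality autoencoder (untrained, forward pass only)."""

    def __init__(self, n_audio, n_video, n_shared, seed=0):
        rng = random.Random(seed)
        self.n_audio = n_audio
        self.n_video = n_video
        # One encoder over the concatenated modalities, one decoder per modality.
        self.encoder = make_layer(n_audio + n_video, n_shared, rng)
        self.audio_decoder = make_layer(n_shared, n_audio, rng)
        self.video_decoder = make_layer(n_shared, n_video, rng)

    def encode(self, audio=None, video=None):
        # Assumption: an absent modality is represented by a zero vector.
        a = list(audio) if audio is not None else [0.0] * self.n_audio
        v = list(video) if video is not None else [0.0] * self.n_video
        return dense(self.encoder, a + v)

    def reconstruct(self, shared):
        # Decode both modalities from the same shared code.
        return dense(self.audio_decoder, shared), dense(self.video_decoder, shared)


model = BimodalAutoencoder(n_audio=4, n_video=6, n_shared=3)
video_only_code = model.encode(video=[0.5] * 6)  # audio is absent here
audio_hat, video_hat = model.reconstruct(video_only_code)
```

In a trained model of this kind, `video_only_code` would serve as the learned feature vector for a downstream classifier; here it is only a placeholder computed from random weights.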

Citation impact

Total citations: 2,294

FWCI: 44.30
Percentile: 100%
References: 26

Authors: 6

Topics & keywords

Keywords

Computer science
Modalities
Feature learning
Artificial intelligence
Deep learning
Modality (human–computer interaction)
Feature (linguistics)
Classifier (UML)

UN Sustainable Development Goals

Quality Education
