Article · Dec 3, 2012 · Closed access

Multimodal Learning with Deep Boltzmann Machines

University of Toronto

Abstract

A Deep Boltzmann Machine is described for learning a generative model of data that consists of multiple and diverse input modalities. The model can be used to extract a unified representation that fuses modalities together. We find that this representation is useful for classification and information retrieval tasks. The model works by learning a probability density over the space of multimodal inputs. It uses states of latent variables as representations of the input. The model can extract this representation even when some modalities are absent by sampling from the conditional distribution over them and filling them in. Our experimental results on bi-modal data consisting of images and text show that the…
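The fill-in step the abstract describes can be illustrated with a toy sketch. This is a deliberate simplification and not the authors' model: it uses a single shared hidden layer (a bimodal restricted Boltzmann machine rather than a multi-layer Deep Boltzmann Machine), binary units for both modalities, and random untrained weights. All names, sizes, and parameters below are hypothetical. The image units are clamped, the missing text units are filled in by alternating Gibbs sampling from the conditional distribution, and the hidden activation probabilities are then read off as the fused representation.

```python
import math
import random


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_bernoulli(probs, rng):
    """Draw one binary state per probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def hidden_probs(image, text, w_img, w_txt, b_hid):
    """P(h_j = 1 | image, text) for a hidden layer shared by both modalities."""
    probs = []
    for j in range(len(b_hid)):
        act = b_hid[j]
        act += sum(image[i] * w_img[i][j] for i in range(len(image)))
        act += sum(text[k] * w_txt[k][j] for k in range(len(text)))
        probs.append(sigmoid(act))
    return probs


def text_probs(hidden, w_txt, b_txt):
    """P(t_k = 1 | h) for the text units."""
    probs = []
    for k in range(len(b_txt)):
        act = b_txt[k] + sum(hidden[j] * w_txt[k][j] for j in range(len(hidden)))
        probs.append(sigmoid(act))
    return probs


def fuse_with_missing_text(image, w_img, w_txt, b_hid, b_txt, steps=50, seed=0):
    """Clamp the image, Gibbs-sample the missing text, return (text, representation)."""
    rng = random.Random(seed)
    text = sample_bernoulli([0.5] * len(b_txt), rng)  # arbitrary starting state
    for _ in range(steps):
        hidden = sample_bernoulli(hidden_probs(image, text, w_img, w_txt, b_hid), rng)
        text = sample_bernoulli(text_probs(hidden, w_txt, b_txt), rng)
    # Hidden activation probabilities serve as the fused representation.
    return text, hidden_probs(image, text, w_img, w_txt, b_hid)


# Toy, untrained parameters (assumed sizes: 4 image, 3 text, 2 hidden units).
_rng = random.Random(1)
W_IMG = [[_rng.gauss(0, 0.5) for _ in range(2)] for _ in range(4)]
W_TXT = [[_rng.gauss(0, 0.5) for _ in range(2)] for _ in range(3)]
B_HID = [0.0, 0.0]
B_TXT = [0.0, 0.0, 0.0]

filled_text, representation = fuse_with_missing_text(
    [1, 0, 1, 1], W_IMG, W_TXT, B_HID, B_TXT
)
```

With trained weights, `representation` would be the feature vector passed to a classifier or retrieval system; here it only demonstrates the data flow, since the weights are random.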

Citation impact

  • Total citations: 626
  • FWCI: 24.97
  • Percentile: 100%
  • References: 37

Authors

2

Topics & keywords

Keywords
  • Computer science
  • Boltzmann machine
  • Artificial intelligence
  • Modalities
  • Modality (human–computer interaction)
  • Generative model
  • Representation (politics)
  • Kernel (algebra)
UN Sustainable Development Goals
  • Quality Education