Review | ACM Computing Surveys | Apr 9, 2024 | Hybrid OA

Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

Carnegie Mellon University

Indexed in Crossref

Abstract

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between…

Citation impact

Total citations: 148
FWCI: 32.98
Percentile: 100%
References: 300

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Machine learning
  • Data science
  • Human–computer interaction