Multimodal Machine Learning Applications
This cluster of papers focuses on the development and improvement of visual question answering systems, image captioning techniques, and neural networks for understanding and generating descriptions of images and videos. The research involves semantic reasoning, multimodal fusion, scene graph generation, attention mechanisms, and deep learning approaches to bridge the gap between vision and language.
64,222 Publications
696,387 Citations
Filter by Type
- Article (74,361)
- Preprint (49,451)
- Book Chapter (8,692)
- Dissertation (1,226)
- Review (346)
Filter by Open Access Type
- Open Access (46,195)
- Closed Access (88,694)
Filter by Authors
- Yi Yang (270)
- Trevor Darrell (268)
- Heng Tao Shen (251)
- Mohit Bansal (246)
- Dacheng Tao (242)
Filter by Topics
- Multimodal Machine Learning Applications (134,889)
- Topic Modeling (38,889)
- Domain Adaptation and Few-Shot Learning (34,107)
- Advanced Image and Video Retrieval Techniques (26,330)
- Natural Language Processing Techniques (21,519)