MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Yue, Xiang; Ni, Yuansheng; Zheng, Tianyu; Zhang, Kai; Liu, Ruoqi; Zhang, Ge; Stevens, Samuel; Jiang, Dongfu; Ren, Weiming; Sun, Yuxuan; Wei, Cong; Yu, Botao; Yuan, Ruibin; Sun, Renliang; Yin, Minghao; Zheng, Boyuan; Yang, Zhenzhu; Liu, Yibo; Huang, Wenhao; Sun, Huan; Su, Yu; Chen, Wenhu

doi:10.1109/cvpr52733.2024.00913

Article · Jun 16, 2024 · Closed access

University of Waterloo · The Ohio State University · +3 more institutions

Indexed in Crossref

Abstract

We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge,…

Citation impact

Total citations: 227

FWCI: 50.89
Percentile: 100%
References: 114

Authors: 22

Topics & keywords

Keywords

Benchmark (surveying)
Computer science
Artificial intelligence
Cognitive science
Psychology
Geology
