Article · Jun 16, 2024 · Closed access

MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

University of Waterloo · The Ohio State University · +3 more institutions

Indexed in Crossref

Abstract

We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge,…

Citation impact

Total citations: 227
FWCI: 50.89
Percentile: 100%
References: 114

Authors

22

Topics & keywords

Keywords
  • Benchmark (surveying)
  • Computer science
  • Artificial intelligence
  • Cognitive science
  • Psychology
  • Geology