MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
University of Waterloo · The Ohio State University · +3 more institutions
Abstract
We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and text-books, covering six core disciplines: Art & Design, Busi-ness, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly het-erogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge,…
Citation impact
- FWCI
- 50.89
- Percentile
- 100%
- References
- 114
Authors
22Topics & keywords
- Benchmark (surveying)
- Computer science
- Artificial intelligence
- Cognitive science
- Psychology
- Geology