SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
University of Applied Sciences and Arts of Southern Switzerland · Shandong University of Political Science and Law · +4 more institutions
Abstract
In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.
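The "single-number metric" mentioned in the abstract can be illustrated with a minimal sketch. It assumes a GLUE-style aggregation: when a task reports several metrics, they are averaged into one task score, and the task scores are then averaged without weights. The task names, metric names, and numbers below are hypothetical placeholders, not published results. The helpers `task_score`, `benchmark_score`, and `headroom` are illustrative names and are not part of the SuperGLUE toolkit.

```python
from statistics import mean


def task_score(metrics: dict[str, float]) -> float:
    """Collapse one task's metrics (e.g. F1 and accuracy) into a single task score by simple averaging."""
    if not metrics:
        raise ValueError("each task needs at least one metric")
    return mean(metrics.values())


def benchmark_score(results: dict[str, dict[str, float]]) -> float:
    """Unweighted (macro) average of task scores: every task counts equally, whatever its size."""
    if not results:
        raise ValueError("no tasks given")
    return mean(task_score(m) for m in results.values())


def headroom(human_score: float, model_score: float) -> float:
    """Gap between a human baseline and a model on the same single-number scale."""
    return human_score - model_score


if __name__ == "__main__":
    # Hypothetical per-task results on a 0-100 scale (illustrative only).
    results = {
        "task_a": {"accuracy": 80.0},
        "task_b": {"f1": 70.0, "accuracy": 90.0},  # two metrics -> task score 80.0
        "task_c": {"mcc": 50.0},
    }
    model = benchmark_score(results)
    print(model)                   # → 70.0
    print(headroom(85.0, model))   # → 15.0
```

A benchmark whose top systems already match the human baseline leaves `headroom` near zero, and that situation is the one the abstract describes for GLUE.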
Citation impact
- References: 67
Authors (8)
- Alex Wang (corresponding author)
University of Applied Sciences and Arts of Southern Switzerland, Shandong University of Political Science and Law, Supélec
- Yada Pruksachatkun
Supélec, Shandong University of Political Science and Law, University of Applied Sciences and Arts of Southern Switzerland
- Nikita Nangia
Supélec, Shandong University of Political Science and Law, University of Applied Sciences and Arts of Southern Switzerland
- Amanpreet Singh
Meta (Israel)
- Julian Michael
University of Washington
Topics & keywords
- Benchmark (surveying)
- Computer science
- Set (abstract data type)
- Metric (unit)
- Software
- Machine learning
- Language model
- Artificial intelligence
- Quality Education