SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Supélec · University of Applied Sciences and Arts of Southern Switzerland · +5 more institutions
Abstract
In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at https://super.gluebenchmark.com.
Citation impact
- FWCI: 67.95
- Percentile: 100%
- References: 0
Authors (8)
- Alex Wang (Corresponding)
Supélec, University of Applied Sciences and Arts of Southern Switzerland, Shandong University of Political Science and Law
- Yada Pruksachatkun
Supélec, University of Applied Sciences and Arts of Southern Switzerland, Shandong University of Political Science and Law
- Nikita Nangia
Supélec, University of Applied Sciences and Arts of Southern Switzerland, Shandong University of Political Science and Law
- Amanpreet Singh
Meta (Israel)
- Julian Michael
University of Washington
Topics & keywords
- Benchmark (surveying)
- Computer science
- Set (abstract data type)
- Metric (unit)
- Software
- Artificial intelligence
- Language model
- Machine learning
- Quality Education