Evaluating Protein Transfer Learning with TAPE
Berkeley College · University of California, Berkeley · +2 more institutions
Abstract
Protein modeling is an increasingly popular area of machine learning research. Semi-supervised learning has emerged as an important paradigm in protein modeling due to the high cost of acquiring supervised protein labels, but the current literature is fragmented when it comes to datasets and standardized evaluation techniques. To facilitate progress in this field, we introduce the Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. We curate tasks into specific training, validation, and test splits to ensure that each task tests biologically relevant generalization that transfers to real-life…
Citation impact
- FWCI: n/a
- Percentile: n/a
- References: 63
Authors (8)
- Roshan Rao (corresponding)
Berkeley College, University of California, Berkeley
- Nicholas Bhattacharya (corresponding)
Berkeley College, University of California, Berkeley
- Neil Thomas (corresponding)
Berkeley College, University of California, Berkeley
- Yan Duan
Berkeley College, University of California, Berkeley
- Xi Chen
Berkeley College, University of California, Berkeley
Topics & keywords
- Computer science
- Artificial intelligence
- Machine learning
- Generalization
- Transfer of learning
- Set (abstract data type)
- Task (project management)
- Representation (politics)
- Industry, innovation and infrastructure