Multi-purpose RNA language modelling with motif-aware pretraining and type-guided fine-tuning
City University of Hong Kong · Baidu (China) · +3 more institutions
Abstract
Pretrained language models have shown promise in analysing nucleotide sequences, yet a versatile model excelling across diverse tasks with a single pretrained weight set remains elusive. Here we introduce RNAErnie, an RNA-focused pretrained model built upon the transformer architecture, employing two simple yet effective strategies. First, RNAErnie enhances pretraining by incorporating RNA motifs as biological priors and introducing motif-level random masking in addition to masked language modelling at base/subsequence levels. It also tokenizes RNA types (for example, miRNA, lncRNA) as stop words, appending them to sequences during pretraining. Second, subject to out-of-distribution tasks with RNA…
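The first strategy in the abstract (motif-level masking alongside base-level masking, with an RNA-type token appended to the sequence) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `[MASK]` symbol, the bracketed type-token format and the exact-match motif lookup are all assumptions made for illustration only.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def find_motif_spans(seq, motifs):
    """Return sorted (start, end) spans of every exact motif occurrence in seq."""
    spans = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            spans.append((start, start + len(motif)))
            start = seq.find(motif, start + 1)
    return sorted(spans)


def mask_example(seq, rna_type, motifs, rng, level="motif"):
    """Mask one motif occurrence (or one base) and append an RNA-type token.

    Assumes a non-empty sequence. Returns (tokens, labels), where labels are
    the original bases hidden behind the mask.
    """
    tokens = list(seq)
    spans = find_motif_spans(seq, motifs)
    if level == "motif" and spans:
        # Motif-level masking: hide a whole motif occurrence at once.
        start, end = rng.choice(spans)
    else:
        # Base-level masking: hide a single randomly chosen nucleotide.
        start = rng.randrange(len(tokens))
        end = start + 1
    masked = tokens[:start] + [MASK] * (end - start) + tokens[end:]
    labels = tokens[start:end]
    # Assumed format: the RNA type is appended as one extra bracketed token.
    return masked + [f"[{rna_type}]"], labels


tokens, labels = mask_example("AUGGUACGU", "miRNA", ["GGUA"], random.Random(0))
print(tokens)
print(labels)
```

With a single motif occurrence in the sequence, the motif-level call hides all four bases of `GGUA` together, whereas `level="base"` hides exactly one nucleotide; in both cases the type token is the final element.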
Citation impact
- FWCI: 23.55
- Percentile: 100%
- References: 63
Authors: 7
Topics & keywords
- Motif (music)
- Computer science
- RNA
- Biology
- Physics
- Gene
- Genetics
- Acoustics
- Quality Education