Article · Nature Machine Intelligence · May 13, 2024 · Hybrid OA

Multi-purpose RNA language modelling with motif-aware pretraining and type-guided fine-tuning

City University of Hong Kong · Baidu (China) · +3 more institutions

Indexed in Crossref

Abstract

Pretrained language models have shown promise in analysing nucleotide sequences, yet a versatile model excelling across diverse tasks with a single pretrained weight set remains elusive. Here we introduce RNAErnie, an RNA-focused pretrained model built upon the transformer architecture, employing two simple yet effective strategies. First, RNAErnie enhances pretraining by incorporating RNA motifs as biological priors and introducing motif-level random masking in addition to masked language modelling at base/subsequence levels. It also tokenizes RNA types (for example, miRNA, lncRNA) as stop words, appending them to sequences during pretraining. Second, subject to out-of-distribution tasks with RNA…
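
The first strategy in the abstract (motif-level masking plus an appended RNA-type token) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `[MASK]` and `[miRNA]` token spellings, the example motif list, and the default masking probability are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # assumed mask-token spelling, not taken from the paper


def find_motif_spans(seq, motifs):
    """Return sorted (start, end) spans for every occurrence of any motif."""
    spans = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            spans.append((start, start + len(motif)))
            start = seq.find(motif, start + 1)
    return sorted(spans)


def motif_level_mask(seq, motifs, rna_type, mask_prob=0.15, rng=None):
    """Mask whole motif occurrences, then append an RNA-type token.

    seq       -- RNA sequence as a string of bases, one token per base
    motifs    -- hypothetical list of motif strings (biological priors)
    rna_type  -- label such as "miRNA", appended as a trailing token
    mask_prob -- assumed probability of masking each motif occurrence
    """
    rng = rng or random.Random()
    tokens = list(seq)
    for start, end in find_motif_spans(seq, motifs):
        # Each motif occurrence is masked as a unit, not base by base.
        if rng.random() < mask_prob:
            for i in range(start, end):
                tokens[i] = MASK
    # The RNA type is appended as an extra token at the end of the sequence.
    return tokens + [f"[{rna_type}]"]
```

Calling `motif_level_mask("AUGGAUCC", ["GGA"], "miRNA", mask_prob=1.0)` masks the three bases of the single `GGA` occurrence and appends `[miRNA]`. A real pretraining pipeline would combine this with base-level and subsequence-level masking, which the sketch omits.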

Citation impact

Total citations: 112
FWCI: 23.55
Percentile: 100%
References: 63

Authors

7

Topics & keywords

Keywords
  • Motif (music)
  • Computer science
  • RNA
  • Biology
  • Physics
  • Gene
  • Genetics
  • Acoustics
UN Sustainable Development Goals
  • Quality Education

Funding