RiNALMo: general-purpose RNA language models can generalize well on structure prediction tasks
University of Zagreb · Agency for Science, Technology and Research · +2 more institutions
Abstract
While RNA has recently been recognized as an interesting small-molecule drug target, many challenges remain to be addressed before we take full advantage of it. This emphasizes the necessity to improve our understanding of its structures and functions. Over the years, sequencing technologies have produced an enormous amount of unlabeled RNA data, which hides a huge potential. Motivated by the successes of protein language models, we introduce RiboNucleic Acid Language Model (RiNALMo) to unveil the hidden code of RNA. RiNALMo is the largest RNA language model to date, with 650M parameters pre-trained on 36M non-coding RNA sequences from several databases. It can extract hidden knowledge and capture the…
Citation impact
- FWCI: 40.89
- Percentile: 100%
- References: 82
Authors
- Rafael Josip Penić (corresponding author)
  University of Zagreb
- Tin Vlašić
  Agency for Science, Technology and Research, Genome Institute of Singapore
- Roland G. Huber
  Agency for Science, Technology and Research, Bioinformatics Institute
- Yue Wan
  Agency for Science, Technology and Research, Genome Institute of Singapore
- Mile Šikić
  Agency for Science, Technology and Research, University of Zagreb
Topics & keywords
- Computer science
- RNA
- Computational biology
- Biology
- Genetics
- Gene