CodonTransformer: a multispecies codon optimizer using context-aware neural networks
The Scarborough Hospital · Vector Institute · +7 more institutions
Abstract
Degeneracy in the genetic code allows many possible DNA sequences to encode the same protein. Optimizing codon usage within a sequence to meet organism-specific preferences faces combinatorial explosion. Nevertheless, natural sequences optimized through evolution provide a rich source of data for machine learning algorithms to explore the underlying rules. Here, we introduce CodonTransformer, a multispecies deep learning model trained on over 1 million DNA-protein pairs from 164 organisms spanning all domains of life. The model demonstrates context-awareness thanks to its Transformers architecture and to our sequence representation strategy that combines organism, amino acid, and codon encodings.…
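To make the combinatorial explosion mentioned in the abstract concrete, the following minimal sketch counts how many distinct DNA sequences encode a given protein under the standard genetic code. It is an illustration only and is not part of CodonTransformer; the function name `count_synonymous_sequences` and the example peptide are hypothetical.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
# "*" denotes a stop signal (TAA, TAG, TGA).
SYNONYMOUS_CODONS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4, "*": 3,
}


def count_synonymous_sequences(protein: str) -> int:
    """Return the number of DNA sequences that encode `protein`.

    Each residue can be written with any of its synonymous codons,
    so the total is the product of the per-residue codon counts.
    """
    return prod(SYNONYMOUS_CODONS[aa] for aa in protein.upper())


if __name__ == "__main__":
    # Hypothetical six-residue peptide, used only for illustration.
    print(count_synonymous_sequences("MKTLLV"))  # 1 * 2 * 4 * 6 * 6 * 4 = 1152
```

Because the count is a product over residues, it grows exponentially with protein length, which is why exhaustive search over codon choices is impractical for full-length proteins.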
Citation impact
- FWCI: 30.02
- Percentile: 100%
- References: 64
Authors
- Adibvafa Fallahpour (corresponding author)
The Scarborough Hospital, Vector Institute
- Vincent Gureghian
Centre National de la Recherche Scientifique, Inserm, Sorbonne Université, Biologie Computationnelle, Quantitative et Synthétique, Institut de Biologie Paris-Seine
- Guillaume J. Filion
The Scarborough Hospital
- Ariel B. Lindner
Université de Technologie de Compiègne, Centre National de la Recherche Scientifique, Inserm, Sorbonne Université, Biologie Computationnelle, Quantitative et Synthétique, Institut de Biologie Paris-Seine
- Amir Pandi
Université de Technologie de Compiègne, Centre National de la Recherche Scientifique, Inserm, Sorbonne Université, Biologie Computationnelle, Quantitative et Synthétique, Institut de Biologie Paris-Seine, Université Paris 1 Panthéon-Sorbonne
Topics & keywords
- Computer science
- ENCODE
- Genetic code
- Codon usage bias
- Security token
- Encoding (memory)
- Context (archaeology)
- Theoretical computer science
Funding
- Institut National de la Santé et de la Recherche Médicale
- University of Toronto
- Centre National de la Recherche Scientifique
- Fondation Bettencourt Schueller
- Canadian Institutes of Health Research
- Natural Sciences and Engineering Research Council of Canada (Awards: RGPIN-2020, RGPIN-2020-06377)
- University of Toronto Scarborough