Article · Nature Communications · Apr 3, 2025 · Gold OA

CodonTransformer: a multispecies codon optimizer using context-aware neural networks

The Scarborough Hospital · Vector Institute · +7 more institutions

Indexed in Crossref, DOAJ, PubMed

Abstract

Degeneracy in the genetic code allows many possible DNA sequences to encode the same protein. Optimizing codon usage within a sequence to meet organism-specific preferences faces a combinatorial explosion. Nevertheless, natural sequences optimized through evolution provide a rich source of data for machine learning algorithms to explore the underlying rules. Here, we introduce CodonTransformer, a multispecies deep learning model trained on over 1 million DNA-protein pairs from 164 organisms spanning all domains of life. The model demonstrates context-awareness thanks to its Transformer architecture and to our sequence representation strategy that combines organism, amino acid, and codon encodings.…
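To make the combinatorial explosion concrete, the following minimal sketch counts how many distinct DNA sequences encode a given protein under the standard genetic code. It is an illustration only and is not part of CodonTransformer; the names `CODON_DEGENERACY` and `count_synonymous_sequences` are hypothetical, and the sketch assumes the standard code and ignores the stop codon.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
# The 20 values sum to 61 (the 64 codons minus 3 stop codons).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_sequences(protein: str) -> int:
    """Return the number of DNA sequences that encode `protein`.

    Each residue contributes an independent choice among its synonymous
    codons, so the total is the product of the per-residue degeneracies.
    Raises KeyError for characters that are not one of the 20 standard
    amino acids.
    """
    return math.prod(CODON_DEGENERACY[aa] for aa in protein.upper())


if __name__ == "__main__":
    # Met (1 codon) x Lys (2 codons) x Leu (6 codons)
    print(count_synonymous_sequences("MKL"))  # 12
```

Because the count is a product over residues, it grows exponentially with protein length, which is why exhaustive search over synonymous sequences is impractical for proteins of realistic size.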

Funding