Article · Nature Genetics · Aug 10, 2023 · Hybrid OA

Genome-wide prediction of disease variant effects with a deep protein language model

University of California, San Francisco · Gladstone Institutes · +3 more institutions

Indexed in: Crossref, PubMed

Abstract

Predicting the effects of coding variants is a major challenge. While recent deep-learning models have improved variant effect prediction accuracy, they cannot analyze all coding variants due to dependency on close homologs or software limitations. Here we developed a workflow using ESM1b, a 650-million-parameter protein language model, to predict all ~450 million possible missense variant effects in the human genome, and made all predictions available on a web portal. ESM1b outperformed existing methods in classifying ~150,000 ClinVar/HGMD missense variants as pathogenic or benign and predicting measurements across 28 deep mutational scan datasets. We further annotated ~2 million variants as damaging only in…

Funding