Genome-wide prediction of disease variant effects with a deep protein language model

Brandes, Nadav; Goldman, Grant; Wang, Charlotte H.; Ye, Chun; Ntranos, Vasilis

doi:10.1038/s41588-023-01465-0

Article · Nature Genetics · Aug 10, 2023 · Hybrid OA


University of California, San Francisco · Gladstone Institutes · +3 more institutions

Indexed in: Crossref, PubMed

Abstract

Predicting the effects of coding variants is a major challenge. While recent deep-learning models have improved variant effect prediction accuracy, they cannot analyze all coding variants due to dependency on close homologs or software limitations. Here we developed a workflow using ESM1b, a 650-million-parameter protein language model, to predict all ~450 million possible missense variant effects in the human genome, and made all predictions available on a web portal. ESM1b outperformed existing methods in classifying ~150,000 ClinVar/HGMD missense variants as pathogenic or benign and predicting measurements across 28 deep mutational scan datasets. We further annotated ~2 million variants as damaging only in…
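The workflow summarized above covers every possible missense variant of a protein. The minimal sketch below shows one way such an enumeration and scoring pass could look. It assumes a log-likelihood-ratio score (log-probability of the variant residue minus that of the wild-type residue at the same position), which is a common choice for protein language models but is not spelled out in the abstract. The function names and the toy probability table are hypothetical, and no ESM1b model is loaded here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def enumerate_missense(seq):
    """Yield (position, wild_type, variant) for every single-residue substitution."""
    for pos, wt in enumerate(seq, start=1):  # 1-based positions
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield pos, wt, aa


def llr_scores(seq, log_probs):
    """Score each substitution as log P(variant) - log P(wild type).

    log_probs[i][aa] is a hypothetical per-position log-probability table
    standing in for the output of a protein language model.
    """
    scores = {}
    for pos, wt, aa in enumerate_missense(seq):
        row = log_probs[pos - 1]
        scores[f"{wt}{pos}{aa}"] = row[aa] - row[wt]
    return scores


def toy_log_probs(seq, wt_prob=0.62):
    """Toy table (assumption): wild type gets wt_prob, the other 19 share the rest."""
    other = (1.0 - wt_prob) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: math.log(wt_prob if aa == wt else other) for aa in AMINO_ACIDS}
        for wt in seq
    ]


seq = "MKT"  # toy three-residue sequence, not a real protein
scores = llr_scores(seq, toy_log_probs(seq))
print(len(scores))  # 19 substitutions per position
print(scores["M1A"])
```

A sequence of length L yields 19 × L substitutions, which is why a genome-wide pass over all protein isoforms reaches the hundreds of millions. Under this convention, more negative scores mark substitutions the model considers less likely than the wild-type residue.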

Citation impact

Total citations: 430

FWCI: 128.44
Percentile: 100%
References: 74


Authors: 5

Topics & keywords

Keywords

Missense mutation
Computational biology
Indel
Workflow
Biology
Coding (social sciences)
Genome
Computer science
