CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions

Schubach, Max; Maaß, Thorben; Nazaretyan, Lusiné; Röner, Sebastian; Kircher, Martin

doi:10.1093/nar/gkad989

Article · Nucleic Acids Research · Jan 5, 2024 · Gold OA

Berlin Institute of Health at Charité - Universitätsmedizin Berlin · University Hospital Schleswig-Holstein · one further institution

Indexed in: Crossref, DOAJ, PubMed

Abstract

Machine learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence…

Citation impact

Total citations: 385
FWCI: 190.47
Percentile: 100%
References: 98

Authors: 5

Keywords: Biology, Genome, Genetics, Computational biology, Nucleotide, Gene