Unraveling the functional dark matter through global metagenomics
Lawrence Berkeley National Laboratory · Joint Genome Institute · +86 more institutions
Abstract
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or…
Citation impact
- FWCI: 51.59
- Percentile: 100%
- References: 68
Authors (132)
- Georgios A. Pavlopoulos (corresponding author)
Lawrence Berkeley National Laboratory, Joint Genome Institute, National and Kapodistrian University of Athens, Frontier Science Foundation-Hellas, Alexander Fleming Biomedical Sciences Research Center
- Fotis A. Baltoumas
Alexander Fleming Biomedical Sciences Research Center
- Sirui Liu
Harvard University Press
- Oğuz Selvitopi
Lawrence Berkeley National Laboratory
- Antônio Pedro Camargo
Lawrence Berkeley National Laboratory, Joint Genome Institute
Topics & keywords
- Metagenomics
- Dark matter
- Computational biology
- Evolutionary biology
- Biology
- Astronomy
- Physics
- Genetics
- Life in Land
Funding
- National Science Foundation. Awards: DEB-1927155, 1459200, 1641019, IOS-0965336, OCE-173723, DEB-1146149, DE-SC0018409, 1442231, 80NSSC19K1633, DEB 1912525, 0965336, 1643486, 1912525, 1146149, 0424602, 1559179, OCE 0424602, OCE-1537951, EAR-1820658, 1438092, 1537951, 1820658, 1927155
- U.S. Department of Energy. Awards: FC02-07ER64494, 07ER64494, DE-FC02-07ER64494, DE-SC0014395, DE-SC0020382, SC0018409, FG02-94ER20137, DE-SC0018409, DE-FG02-, DE-FG02-94ER20137, DE-FG02
- National Aeronautics and Space Administration. Awards: NNX17AK85G, 80NSSC19K1633, NNX16AJ62G
- U.S. Department of Agriculture. Awards: 2009-447 35319-05186, DE-FC02-07ER64494, 2017-67019-26396, 2011-67019-30178, 67019, IOS-0965336
- Gordon and Betty Moore Foundation
- Genome Canada
- National Energy Research Scientific Computing Center
- Towards Sustainability Foundation
- Genome British Columbia
- Nuclear Safety and Security Commission
- Max-Planck-Gesellschaft
- National Institutes of Health. Awards: 05186, P20 GM103475, GM103475
- National Institute of Food and Agriculture. Awards: 2017-67019-26396, 2009-447 35319-05186, DE-FC02-07ER64494, 2011-67019-30178, DE-SC0018409
- Office of Science. Awards: DE-FC02-07ER64494, DE-SC0014395, FC02-07ER64494, DE-SC0020382, DE-SC0018409
- David R. Atkinson Center for a Sustainable Future, Cornell University
- Great Lakes Bioenergy Research Center. Awards: DE-SC0018409, DE-FC02-07ER64494
- Natural Sciences and Engineering Research Council of Canada
- National Institute of General Medical Sciences. Award: P20 GM103475
- Basic Energy Sciences. Awards: DE-SC0018409, DE-FG02, DE-FC02-07ER64494, DE-FG02-94ER20137
- Biological and Environmental Research. Awards: DE-SC0020382, DE-SC0014395, DE-FC02-07ER64494, SC0020382, DE-SC0018409, SC0018409
- Chemical Sciences, Geosciences, and Biosciences Division
- Oak Ridge National Laboratory
- Pacific Northwest National Laboratory