The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families
J. Craig Venter Institute · University of California, Davis · +10 more institutions
Abstract
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting…
Citation impact
- FWCI: 41.17
- Percentile: 100%
- References: 155
Authors
33
Topics & keywords
- Biology
- Metagenomics
- Computational biology
- Cluster analysis
- Protein sequencing
- Homology (biology)
- Genomics
- Protein domain
- Life below water