Article · PLoS Biology · Mar 8, 2007 · Gold OA

The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families

J. Craig Venter Institute · University of California, Davis · +10 more institutions

Indexed in: Crossref, DOAJ, PubMed

Abstract

Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting…

Citation impact

  • Total citations: 928
  • FWCI: 41.17
  • Percentile: 100%
  • References: 155

Authors

33 authors

Topics & keywords

Keywords
  • Biology
  • Metagenomics
  • Computational biology
  • Cluster analysis
  • Protein sequencing
  • Homology (biology)
  • Genomics
  • Protein domain
UN Sustainable Development Goals
  • Life below water

Funding