Article · PLoS ONE · Nov 10, 2015 · Gold OA

Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics

University of California, Berkeley · Lawrence Berkeley National Laboratory

Indexed in: arXiv · Crossref · DOAJ · PubMed

Abstract

We introduce a new representation and feature-extraction method for biological sequences. Named bio-vectors (BioVec) when referring to biological sequences in general, protein-vectors (ProtVec) for proteins (amino-acid sequences), and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors, which can be used in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered-protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and…
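The abstract is cut off before it describes how the vectors are built. As a hedged illustration only, the sketch below assumes the scheme commonly associated with ProtVec: each sequence is split into three lists of non-overlapping 3-grams (one per starting offset), those lists serve as "sentences" for a skip-gram word-embedding network, and a whole protein is represented by summing the vectors of its 3-grams. The function names (`shifted_kmer_frames`, `skipgram_pairs`, `protein_vector`), the window size, and the two-dimensional toy embedding table are hypothetical; a real table would be learned by training a skip-gram model on a large sequence corpus, which is not shown here.

```python
from typing import Dict, List, Sequence, Tuple


def shifted_kmer_frames(seq: str, k: int = 3) -> List[List[str]]:
    """Split `seq` into k lists of non-overlapping k-mers, one per start offset 0..k-1."""
    frames: List[List[str]] = []
    for offset in range(k):
        # Step through the sequence k residues at a time; a trailing partial k-mer is dropped.
        frames.append([seq[i:i + k] for i in range(offset, len(seq) - k + 1, k)])
    return frames


def skipgram_pairs(tokens: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """(center, context) pairs a skip-gram model would be trained on for one 'sentence'."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is never its own context
                pairs.append((center, tokens[j]))
    return pairs


def protein_vector(seq: str, embeddings: Dict[str, List[float]], k: int = 3) -> List[float]:
    """Sum the embedding of every k-mer across all frames; k-mers missing from the table are skipped."""
    if not embeddings:
        raise ValueError("embedding table is empty")
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for frame in shifted_kmer_frames(seq, k):
        for kmer in frame:
            vec = embeddings.get(kmer)
            if vec is None:
                continue  # assumption: out-of-vocabulary k-mers contribute nothing
            for d in range(dim):
                total[d] += vec[d]
    return total


if __name__ == "__main__":
    seq = "MKTAYIAK"
    frames = shifted_kmer_frames(seq)
    print(frames)  # [['MKT', 'AYI'], ['KTA', 'YIA'], ['TAY', 'IAK']]
    print(skipgram_pairs(frames[0], window=1))  # [('MKT', 'AYI'), ('AYI', 'MKT')]
    # Hypothetical toy embedding table (assumption); a real one would come from skip-gram training.
    toy = {"MKT": [1.0, 0.0], "AYI": [0.0, 1.0], "KTA": [0.5, 0.5]}
    print(protein_vector(seq, toy))  # [1.5, 1.5]
```

Because the three offsets together cover every overlapping 3-gram exactly once, summing over all frames gives the same result as summing over all overlapping 3-grams of the sequence; the frame split only changes how the training "sentences" are formed.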

Citation impact

Total citations: 908
Field-Weighted Citation Impact (FWCI): 17.92
Citation percentile: 100%
References: 49

Number of authors: 2

Topics & keywords

Keywords
  • Structural genomics
  • Computational biology
  • Protein sequencing
  • Proteomics
  • Structural Classification of Proteins database
  • Genomics
  • Protein methods
  • Protein family