Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics
University of California, Berkeley · Lawrence Berkeley National Laboratory
Abstract
We introduce a new representation and feature extraction method for biological sequences. We call the general representation bio-vectors (BioVec), with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences. This representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors, which can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and…
Citation impact
- FWCI: 17.92
- Percentile: 100%
- References: 49
Authors: 2
Topics & keywords
- Structural genomics
- Computational biology
- Protein sequencing
- Proteomics
- Structural Classification of Proteins database
- Genomics
- Protein methods
- Protein family