SOLpro: accurate sequence-based prediction of protein solubility
University of California, Irvine
Indexed in: Crossref, DOAJ, PubMed
Abstract
Motivation: Protein insolubility is a major obstacle for many experimental studies. A sequence-based prediction method able to accurately predict the propensity of a protein to be soluble on overexpression could be used, for instance, to prioritize targets in large-scale proteomics projects and to identify mutations likely to increase the solubility of insoluble proteins.
Results: Here, we first curate a large, non-redundant and balanced training set of more than 17 000 proteins. Next, we extract and study 23 groups of features computed directly or predicted (e.g. secondary structure) from the primary sequence. The data and the features are used to train a two-stage support vector machine (SVM)…
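As a rough illustration of what "features computed directly from the primary sequence" can look like, the sketch below computes amino-acid composition, i.e. the fraction of each of the 20 standard residues in a sequence. This excerpt does not list the 23 feature groups, so composition is an assumed example rather than a confirmed SOLpro feature. The function name `composition` and the example sequence are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Illustrative only: an assumed example of a sequence-derived feature,
    not a feature group confirmed by the abstract.
    """
    seq = sequence.upper()
    # Count only standard residues; other characters (e.g. X, gaps) are skipped.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical 10-residue sequence, used only to exercise the function.
features = composition("MKTAYIAKQR")
```

A fixed-length vector like this (20 values that sum to 1) is the kind of input a classifier such as an SVM can consume regardless of sequence length.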
Citation impact
- Total citations: 672
- FWCI: 2.33
- Percentile: 100%
- References: 49
Authors: 3
Topics & keywords
- Computer science
- Support vector machine
- Sequence (biology)
- Data mining
- Proteomics
- Set (abstract data type)
- Solubility
- Machine learning