Article | Bioinformatics | Jun 23, 2009 | Bronze OA

SOLpro: accurate sequence-based prediction of protein solubility

University of California, Irvine

Indexed in: Crossref, DOAJ, PubMed

Abstract

Motivation: Protein insolubility is a major obstacle for many experimental studies. A sequence-based prediction method able to accurately predict the propensity of a protein to be soluble on overexpression could be used, for instance, to prioritize targets in large-scale proteomics projects and to identify mutations likely to increase the solubility of insoluble proteins.

Results: Here, we first curate a large, non-redundant and balanced training set of more than 17 000 proteins. Next, we extract and study 23 groups of features computed directly or predicted (e.g. secondary structure) from the primary sequence. The data and the features are used to train a two-stage support vector machine (SVM)…
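As a rough illustration of what a feature "computed directly" from the primary sequence might look like, the sketch below computes amino acid composition, i.e. the fraction of each of the 20 standard residues. This is an assumption made for illustration only: the abstract does not list the 23 feature groups, and the function name `composition_features` and the handling of non-standard characters are hypothetical, not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every protein
# maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper for illustration; not the feature set used by SOLpro.
    Characters outside the 20 standard residues (e.g. 'X') are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard sequence: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


features = composition_features("AACD")
```

A fixed-length vector like this could, in principle, be passed to any standard classifier such as an SVM; how the actual method combines its feature groups across the two stages is not described in the truncated abstract.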

Citation impact

Total citations: 672
FWCI: 2.33
Percentile: 100%
References: 49

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Support vector machine
  • Sequence (biology)
  • Data mining
  • Proteomics
  • Set (abstract data type)
  • Solubility
  • Machine learning