Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective
University of the West of England · Bristol Robotics Laboratory
Abstract
This paper proposes a robust framework for building a diabetes prediction model to aid the clinical diagnosis of diabetes. The framework adopts Spearman correlation for feature selection and polynomial regression for missing-value imputation, each applied from a perspective that strengthens its performance. Three supervised machine learning models are then used for classification: the random forest (RF) model, the support vector machine (SVM) model, and our designed twice-growth deep neural network (2GDNN) model. The models are optimized by tuning their hyperparameters with grid search and repeated stratified k-fold cross-validation, and are evaluated for their ability to scale to the prediction problem.
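As an illustration only, the two preprocessing steps named above could be sketched roughly as follows. This is not the authors' implementation: the function names, the correlation threshold of 0.1, the polynomial degree of 2, and the choice of a single predictor column for imputation are all assumptions made for the example.

```python
import numpy as np

def rankdata(x):
    """Return 1-based average ranks; tied values share their mean rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j over the run of values tied with sorted_x[i].
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(rankdata(x), rankdata(y))[0, 1])

def select_features(X, y, threshold=0.1):
    """Keep column indices whose |rho| with the target reaches the threshold.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return [j for j in range(X.shape[1])
            if abs(spearman(X[:, j], y)) >= threshold]

def impute_polynomial(target, predictor, degree=2):
    """Fill NaNs in `target` using a polynomial fitted on the complete rows.

    Assumes `predictor` has no missing values; the degree is an assumption.
    """
    target = np.asarray(target, dtype=float).copy()
    predictor = np.asarray(predictor, dtype=float)
    missing = np.isnan(target)
    coeffs = np.polyfit(predictor[~missing], target[~missing], degree)
    target[missing] = np.polyval(coeffs, predictor[missing])
    return target
```

In this sketch a feature with missing entries is regressed on one fully observed feature; a real pipeline would need to decide which predictor to use for each incomplete column, which the abstract does not specify.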
In experiments on the PIMA Indian and LMCH diabetes datasets, the proposed 2GDNN model achieves precision, sensitivity, F1-score, train-accuracy, and test-accuracy scores of 97.34%, 97.24%, 97.26%, 99.01%, and 97.25% on PIMA Indian, and 97.28%, 97.33%, 97.27%, 99.57%, and 97.33% on LMCH.
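For readers unfamiliar with the reported metrics, the following minimal sketch shows how precision, sensitivity, F1-score, and accuracy are conventionally derived from binary predictions. It assumes a single positive class labelled `1`; the abstract does not state which averaging scheme underlies the reported figures, so this is the textbook definition rather than the authors' exact evaluation code, and the function name is hypothetical.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Compute precision, sensitivity (recall), F1 and accuracy for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators by falling back to 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
        "accuracy": accuracy,
    }
```

Applying the same function to predictions on the training split and on the held-out split would give the train-accuracy and test-accuracy values, respectively.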
Citation impact
- FWCI: 56.69
- Percentile: 100%
- References: 49
Authors
- 3
Topics & keywords
- Artificial intelligence
- Machine learning
- Random forest
- Diabetes mellitus
- Computer science
- Missing data
- Feature selection
- Support vector machine