Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective

Olisah, Chollette C.; Smith, Lyndon; Smith, Melvyn

doi:10.1016/j.cmpb.2022.106773

Article · Computer Methods and Programs in Biomedicine · Mar 31, 2022 · Hybrid OA

Chollette C. Olisah · Lyndon Smith · Melvyn Smith

University of the West of England · Bristol Robotics Laboratory

Indexed in: Crossref, PubMed

Abstract

Methods

This paper proposes a robust framework for building a diabetes prediction model to aid the clinical diagnosis of diabetes. The framework adopts Spearman correlation for feature selection and polynomial regression for missing-value imputation, applying each in a way that strengthens its performance. Three supervised machine learning models are then considered for classification: a random forest (RF), a support vector machine (SVM), and our twice-growth deep neural network (2GDNN). The models' hyperparameters are tuned using grid search with repeated stratified k-fold cross-validation, and the models are evaluated for their ability to scale to the prediction problem.
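The two preprocessing steps named above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the selection threshold, the polynomial degree, or which feature drives the imputation, so `select_features`, `impute_with_polynomial`, the `threshold` and `degree` parameters, and the toy data are all hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-aware ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5  # undefined for constant inputs


def select_features(columns, labels, threshold=0.5):
    """Keep features whose |rho| with the label meets an assumed threshold."""
    return [name for name, col in columns.items()
            if abs(spearman(col, labels)) >= threshold]


def fit_polynomial(x, y, degree):
    """Least-squares fit via normal equations; returns [c0, c1, ...]."""
    n = degree + 1
    # Augmented matrix [A | b]: A[r][c] = sum(x^(r+c)), b[r] = sum(y * x^r).
    m = [[sum(xi ** (r + c) for xi in x) for c in range(n)]
         + [sum(yi * xi ** r for xi, yi in zip(x, y))] for r in range(n)]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def impute_with_polynomial(target, predictor, degree=2):
    """Fill None entries of `target` using a polynomial in `predictor`."""
    known = [(p, t) for p, t in zip(predictor, target) if t is not None]
    coeffs = fit_polynomial([p for p, _ in known], [t for _, t in known], degree)

    def predict(p):
        return sum(c * p ** k for k, c in enumerate(coeffs))

    return [t if t is not None else predict(p)
            for p, t in zip(predictor, target)]


# Toy data (made up for illustration, not taken from either dataset).
columns = {"informative": [85, 90, 120, 140, 150, 160],
           "noise": [3, 1, 2, 3, 1, 2]}
labels = [0, 0, 1, 1, 1, 1]
print(select_features(columns, labels))
print(impute_with_polynomial([2, 5, 10, None, 26, 37], [1, 2, 3, 4, 5, 6]))
```

In practice one would more likely reach for library routines for both steps; the hand-written versions are used here only to keep the sketch dependency-free and to make the rank-then-correlate and fit-then-predict logic visible.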

Results

In experiments on the PIMA Indian and LMCH diabetes datasets, the proposed 2GDNN model achieves precision, sensitivity, F1-score, train accuracy, and test accuracy of 97.34%, 97.24%, 97.26%, 99.01%, and 97.25% on the first dataset and 97.28%, 97.33%, 97.27%, 99.57%, and 97.33% on the second, respectively.
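For readers less familiar with the reported metrics, the following minimal sketch shows how precision, sensitivity (recall), F1-score, and accuracy follow from a confusion matrix in the binary case. The function name `binary_metrics` and the toy labels are hypothetical, and the abstract does not say how the authors average scores across classes, so this should not be read as their evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, sensitivity, F1, and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp)    # of predicted positives, how many are right
    sensitivity = tp / (tp + fn)  # of actual positives, how many are found
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "sensitivity": sensitivity,
            "f1": f1, "accuracy": accuracy}


# Toy labels (made up): 3 true positives, 1 false negative,
# 1 false positive, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Computing the same metrics separately on the training and held-out test splits gives the train-accuracy and test-accuracy pair quoted above; a large gap between the two is the usual sign of overfitting.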

Citation impact

Total citations: 238

FWCI: 56.69
Percentile: 100%
References: 49

Authors: 3

Topics & keywords

Keywords

Artificial intelligence
Machine learning
Random forest
Diabetes mellitus
Computer science
Missing data
Feature selection
Support vector machine
