
A Comparative Study on Machine Learning Models to Classify Diseases Based on Patient Behaviour and Habits

Linked Agent
Hewahi, Nabil, Thesis advisor
Alasaadi, Abdulla, Thesis advisor
Date Issued
2023
Language
English
Extent
[1], 11, 84, [15]
Place of institution
Sakhir, Bahrain
Thesis Type
Thesis (Master)
Institution
University of Bahrain, College of Science, Department of Postgraduate Programs
English Abstract
In recent years, Machine Learning (ML) algorithms have been used in a variety of medical research fields, where they assist in predicting diseases from large amounts of health data. The valuable information in such data can help identify health risks early, allowing people to act proactively and helping physicians diagnose accurately. In this thesis, six supervised ML algorithms were used to explore two issues. The first is the relationship between Patient-Related Factors (PRF) and Diabetes, Stroke, Heart Disease (HD), and Kidney Disease (KD). The second is the impact of various diseases, combined with PRF, on HD. The methods used were Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), K-Nearest Neighbour (KNN), Extreme Gradient Boosting (XGB), and Support Vector Machine (SVM). The dataset was obtained from the Kaggle repository, and its attributes can be divided into PRF and diseases. The study compared and evaluated the ML models in two settings: classifying diseases based on PRF, and classifying HD based on PRF combined with the other diseases. All ML models predicted HD more accurately than Diabetes, Stroke, and KD, and they performed even better when the predefined diseases were combined with PRF to predict HD. Two conclusions were drawn: first, PRF are closely associated with HD; second, HD is more likely to occur when these diseases are present, particularly Diabetes and Stroke. There were no significant performance differences between the models, whose accuracy ranged from 70% to 76%. Nevertheless, LR performed best, with 75% accuracy using only PRF and 76% accuracy using both PRF and the disease attributes.
Note
Title on cover (Arabic):
A Comparative Study on Machine Learning Models to Classify Diseases Based on Patient-Related Factors
Identifier
https://digitalrepository.uob.edu.bh/id/0001da97-342c-4f42-ad06-561b12dde69f