Document
A Novel Classifier Based on Genetic Algorithms and Data Importance Reformatting
Linked Agent
Hewahi, Nabil, Thesis advisor
Date Issued
2023
Language
English
Extent
[1], 12, 65, [14] pages
Place of institution
Sakhir, Bahrain
Thesis Type
Thesis (Master)
English Abstract
Machine learning (ML) has attracted substantial attention and become progressively
more popular in recent years due to its ability to make predictions and decisions based
on data. The increasing use of ML in critical fields such as medical diagnosis and
fraud detection emphasizes the need to continuously enhance its performance and
thereby support better decision-making. This can be achieved through the use of
optimization methods (OM). However, the performance of ML algorithms is sometimes
limited by issues related to the nature of the data.
Therefore, this research proposes GADIC, a novel classification algorithm based on
Data Importance (DI) reformatting and Genetic Algorithms (GA), to overcome these
issues and improve the efficacy and robustness of ML algorithms. The aim of this
research is to evaluate the impact of the proposed algorithm on the performance of
classifiers and to compare it with conventional classification algorithms in order
to measure that improvement. The proposed algorithm comprises three phases: a data
reformatting phase, which relies on the DI concept; a training phase, in which GA is
applied to the reformatted training set; and a testing phase, in which the instances
of the reformatted testing set are averaged based on similar instances in the
training set. The proposed algorithm was tested on five existing ML classifiers:
Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Logistic Regression
(LR), Decision Tree (DT), and Naïve Bayes (NB). All were evaluated on seven
open-source datasets from the UCI ML repository and Kaggle: Cleveland heart
disease, Indian liver patient, Pima Indian diabetes, employee future prediction,
telecom churn prediction, bank customer churn, and tech students. In terms of
accuracy, the results showed that, with the exception of an approximately 1% decrease
in the accuracy of the NB classifier on the Cleveland heart disease dataset, GADIC
significantly enhanced the performance of most ML classifiers across the datasets.
Among the classifiers combined with GADIC, KNN showed the greatest performance gain,
with an average increase of 16.79%, followed by SVM with an average increase of
9.03%. LR showed the smallest improvement, with an average increase of 5.96%.
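
The abstract does not say how the "average increase" figures were computed. As a minimal sketch, assuming the figure is the mean per-dataset difference in accuracy (in percentage points) between a classifier with and without GADIC, the arithmetic could look like the following. The function name `average_gain`, the dataset keys, and all accuracy values are hypothetical placeholders for illustration, not results from the thesis.

```python
from statistics import mean


def average_gain(baseline, enhanced):
    """Mean per-dataset accuracy change, in percentage points.

    Assumption: both arguments map dataset name -> accuracy in percent.
    """
    if baseline.keys() != enhanced.keys():
        raise ValueError("both runs must cover the same datasets")
    return mean(enhanced[name] - baseline[name] for name in baseline)


# Hypothetical accuracies (percent), for illustration only.
baseline = {"dataset_a": 70.0, "dataset_b": 80.0, "dataset_c": 75.0}
enhanced = {"dataset_a": 82.0, "dataset_b": 85.0, "dataset_c": 76.0}

gain = average_gain(baseline, enhanced)
print(f"average gain: {gain:.2f} percentage points")
```

If the reported figures are instead relative improvements, the per-dataset difference would need to be divided by the baseline accuracy before averaging.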
Identifier
https://digitalrepository.uob.edu.bh/id/6bdaf6b6-479b-40a6-a8d2-44d703f31d62