Cancer Prediction based on DNA Next Generation Genomic Sequencing using Machine Learning Algorithms
Linked Agent
Ksantini, Riadh Bin Mohammad, Thesis advisor
Albalooshi, Fatema Abdulqader Yousif, Thesis advisor
Date Issued
2021
Language
English
Extent
[1],12,118,[2] pages
Place of institution
Sakhir, Bahrain
Thesis Type
Thesis (Master)
Institution
UNIVERSITY OF BAHRAIN, College of Information Technology
Description
Abstract:
Cancer is a dangerous disease that threatens human life, causes severe side effects
and complications, and in many cases leads to death. Cancer occurs when cells
start to divide and grow exponentially and uncontrollably. The problem is that cancer
is often not detected in its early stages, but only after a long period has passed,
at which point treatment is more complicated and sometimes not possible.
Various diseases, including cancer, progress because of mutations in the
Deoxyribonucleic Acid (DNA) genomic sequence. Advances in Next-Generation Sequencing (NGS) technology have enabled scientists
and bioinformaticians to study genomic sequences, which helps identify the
effects of gene mutations on the development of different types of diseases. This thesis
focuses on predicting and classifying different cancers based on NGS technology
by applying various machine learning algorithms to Ribonucleic Acid Sequencing
(RNA-Seq) gene expression data. This research aims to find the most promising
biomarkers for various cancers, which can significantly help in personalized medicine.
Four different approaches and models are proposed in this study. The first three are
Mutual Information-Support Vector Machine (MI-SVM), Recursive Feature Elimination-Support Vector Machine (RFE-SVM), and Random Forest (RF). The fourth model is a novel ensemble
approach that discovers the most relevant genes causing a specific type of
cancer. Moreover, the developed models were applied to and tested on four datasets
(COAD, LUAD, BRCA, and a Combined dataset built from the first three), and the
results were justified and validated. The obtained results were also compared
with those of existing research. The
experimental results, with accuracy scores of up to 99.4%, show that the
proposed models efficiently predict and classify cancer compared with
state-of-the-art methods, and discover the most promising genes involved in
the development of a specific type of cancer.
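The three named feature-selection and classification approaches (MI-SVM, RFE-SVM, and Random Forest) can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: a synthetic matrix from make_classification stands in for RNA-Seq gene expression data (rows are samples, columns are genes), and the number of retained genes (K = 20), the linear SVM kernel, the RFE step size and the forest size are hypothetical choices. The novel ensemble model is not reproduced here.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for an RNA-Seq expression matrix:
# 200 samples (rows) x 500 genes (columns), only 10 genes informative.
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=10,
    n_redundant=0, random_state=0,
)

K = 20  # hypothetical number of genes to keep

# MI-SVM: rank genes by mutual information with the class label, keep the top K,
# then train a linear SVM on those genes only.
mi_svm = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=partial(mutual_info_classif, random_state=0), k=K),
    SVC(kernel="linear"),
)

# RFE-SVM: repeatedly fit a linear SVM and drop the genes with the smallest
# weights (10% of the original gene count per round) until K genes remain.
rfe_svm = make_pipeline(
    StandardScaler(),
    RFE(estimator=SVC(kernel="linear"), n_features_to_select=K, step=0.1),
    SVC(kernel="linear"),
)

# Random Forest: classifies on all genes and exposes per-gene importances.
rf = RandomForestClassifier(n_estimators=200, random_state=0)

scores = {}
for name, model in [("MI-SVM", mi_svm), ("RFE-SVM", rfe_svm), ("RF", rf)]:
    # Selection happens inside each pipeline, so it is redone on every training fold.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean 5-fold accuracy = {scores[name]:.3f}")

# Candidate "biomarker" genes from two of the approaches, fitted on all samples.
mi_svm.fit(X, y)
mi_genes = np.flatnonzero(mi_svm.named_steps["selectkbest"].get_support())

rf.fit(X, y)
top_genes = np.argsort(rf.feature_importances_)[::-1][:K]

print("Genes chosen by both MI and RF:", np.intersect1d(mi_genes, top_genes))
```

Keeping the scaler and the selector inside each pipeline means cross_val_score re-selects genes on every training fold, so the held-out fold never influences which genes are chosen. On real data, the column indices in mi_genes and top_genes would be mapped back to gene identifiers to read off candidate biomarkers.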
Identifier
https://digitalrepository.uob.edu.bh/id/680bd2f1-668f-4295-ab20-bb87df32006e