English Abstract
Abstract:
Road traffic accidents (RTAs) are among the main contributors to the death rate
worldwide, so many countries, including the Kingdom of Bahrain, are cooperating to
reduce accidents as much as possible. In recent years, machine learning has been applied
widely in fields such as medicine, economics, education, biology, and transportation. In
this study, five classifiers, namely k-nearest neighbors (KNN), support vector machine
(SVM), decision tree (DT), random forest (RF), and naive Bayes (NB), were selected for
their high performance in previous studies and investigated for predicting the severity
of RTAs from factors such as accident date, weather, and road type, with appropriate
parameters chosen for each model.
Five real datasets were collected by the statistics section of the General Directorate
of Traffic in the Kingdom of Bahrain, one for each year from 2018 to 2022. After data
cleaning, data structuring, and feature engineering, the datasets were combined into a
single dataset of 5191 injuries caused by RTAs.
The classifiers were compared using several evaluation measures: accuracy, F-measure,
recall, precision, and area under the curve (AUC). The classes of the target attribute
were imbalanced. The researcher therefore evaluated each model on the original
imbalanced dataset and on versions balanced with over-sampling and under-sampling.
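As an illustration of the two balancing strategies named above, the following is a minimal sketch of random over-sampling and random under-sampling using only the Python standard library. It is not the study's implementation: the function names, the toy rows, and the class labels other than "fatal" are hypothetical, and the abstract does not state which resampling variant or library was used.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label (insertion order kept)."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """Discard randomly chosen rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Keep a random subset, drawn without replacement.
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: ten records with an imbalanced severity label.
toy_rows = list(range(10))
toy_labels = ["slight"] * 6 + ["serious"] * 3 + ["fatal"] * 1

over_rows, over_labels = random_over_sample(toy_rows, toy_labels)
under_rows, under_labels = random_under_sample(toy_rows, toy_labels)

print(Counter(toy_labels))
print(Counter(over_labels))
print(Counter(under_labels))
```

In a sketch like this, resampling would normally be applied to the training split only, so that the evaluation measures are computed on data with the original class distribution.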
The results of this study were satisfactory. The best performance on all evaluation
measures was achieved by the RF algorithm with over-sampling, which scored 85% on every
criterion. The worst classifier was NB on the imbalanced dataset, with a precision of
63%, recall of 10%, F-measure of 10%, and accuracy of 10%. Across all models, the fatal
class had the highest AUC of any class. Over-sampling outperformed the other methods
for most models. The most important features affecting RTAs are cause type and accident
day of week, while road condition and road surface are, by a significant margin, the
least important. The results are promising and are expected to be applied by the
Bahraini General Directorate of Traffic.