English Abstract
Abstract:
Road traffic accidents (RTAs) are one of the main causes contributing to the increase in the death rate worldwide. Many countries, including the Kingdom of Bahrain, are therefore cooperating to reduce accidents as much as possible. In recent years, machine learning has been used noticeably in many fields such as medicine, economics, education, biology, and transportation. In this study, five classifiers, namely k-nearest neighbors (KNN), support vector machine (SVM), decision tree (DT), random forest (RF), and naive Bayes (NB), were chosen for their high performance in previous studies and investigated for predicting the severity of RTAs from several factors such as accident date, weather, and road type, with appropriate parameters selected for each model.
Five real datasets were collected by the statistics section of the General Directorate of Traffic in the Kingdom of Bahrain. Each dataset covers one year from 2018 to 2022. After preparation through data cleaning, data structuring, and feature engineering, the datasets under investigation were combined into a single dataset of 5191 injuries caused by RTAs.
The classifiers are compared using several evaluation metrics: accuracy, precision, recall, F-measure, and AUC. The classes of the target attribute were imbalanced. The researcher therefore evaluated the models on the original imbalanced dataset as well as on versions of it balanced by over-sampling and by under-sampling.
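As an illustration only, the two balancing strategies mentioned above can be sketched as random over-sampling and random under-sampling. This is a minimal sketch, not the procedure used in the study: the abstract does not state which resampling variant was applied, and the function name `resample`, the toy rows, and the class labels other than "fatal" are hypothetical.

```python
import random
from collections import Counter, defaultdict


def resample(rows, labels, mode="over", seed=0):
    """Balance classes by random over- or under-sampling (illustrative helper).

    mode="over":  duplicate randomly chosen rows of the smaller classes
                  until every class matches the largest class.
    mode="under": keep a random subset of the larger classes
                  so every class matches the smallest class.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    sizes = [len(members) for members in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if mode == "over":
            # Sample with replacement to fill the gap up to the target size.
            extra = [rng.choice(members) for _ in range(target - len(members))]
            chosen = members + extra
        else:
            # Sample without replacement down to the target size.
            chosen = rng.sample(members, target)
        out_rows.extend(chosen)
        out_labels.extend([label] * len(chosen))
    return out_rows, out_labels


# Hypothetical toy data; the class names and counts are not from the study.
labels = ["slight"] * 8 + ["serious"] * 3 + ["fatal"] * 1
rows = [{"id": i} for i in range(len(labels))]

over_rows, over_labels = resample(rows, labels, mode="over")
under_rows, under_labels = resample(rows, labels, mode="under")
over_counts = Counter(over_labels)
under_counts = Counter(under_labels)
```

In a setup like this, resampling would normally be applied to the training split only, so that the evaluation metrics are still computed on data with the original class distribution.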
The results of this study were satisfactory. The best classifier across all evaluation metrics was the RF algorithm with over-sampling, which reached 85% on all criteria. The worst classifier was NB on the imbalanced dataset, with a precision of 63%, recall of 10%, F-measure of 10%, and accuracy of 10%. Across all models, the fatal class had the highest AUC compared to the other classes. The over-sampling method outperformed the other methods for most models. In addition, the most important features affecting RTAs are cause type and accident day of week, while the least important features, with significant differences, are road condition and road surface. The results of the research are promising and are expected to be applied by the Bahraini General Directorate of Traffic.