English Abstract
The banking industry has recently faced fierce competition, which has pushed banks and financial institutions to shift from conventional methods to digital and online channels backed and guided by data mining and machine learning techniques. In marketing their products, banks have moved from in-person visits to telemarketing to lower costs and achieve better results. To make their marketing campaigns less costly and more effective, banks have sought ways to avoid contacting customers who are unlikely to opt for the marketed product or service. This thesis aims to provide banks and financial institutions with a data mining solution that helps them identify potential customers and filter out those who are unlikely to buy the product or service. The proposed model predicts the customer's response to a telemarketing campaign using an ensemble classifier based on hybrid machine learning models. The ensemble classifier uses the clients' personal information, their financial status, and the history of the previous telemarketing campaign directed at the customer. A thorough Exploratory Data Analysis (EDA) is also performed to identify dataset problems and give useful insights into key customer attributes that can affect whether a customer subscribes to a product. The data are then pre-processed by standardizing numerical attributes, encoding categorical attributes, selecting features based on correlation and importance, and reducing model bias by addressing the class imbalance in the dataset. Afterwards, several classical machine learning models, including Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes (NB), and K-Nearest Neighbour (KNN), are trained on the dataset to serve as a baseline for the proposed ensemble model.
An Ensemble Voting Classifier based on the Extremely Randomized Tree hybrid model is then used to predict the customer's response. The proposed Ensemble Voting Classifier reached an accuracy of 95.86%, slightly higher than that of the Extra Tree Classifier (95.58%), RF (94.02%), and DT (92.80%). Compared with the other tested hybrid and traditional models, the proposed Ensemble Voting Classifier is the most accurate at identifying the customers most likely to subscribe.
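The voting step can be illustrated with a minimal sketch. It assumes hard (majority) voting over binary labels, where 1 means the customer subscribes; the abstract does not state whether the thesis uses hard or soft voting, nor which tie-breaking rule applies. The function names and the toy predictions are hypothetical and are not taken from the thesis dataset or its results.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label predictions by majority vote.

    predictions[m][i] is the label that model m assigns to customer i.
    Ties go to the label predicted by the earliest-listed model
    (an assumption made for this sketch).
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = [model_preds[i] for model_preds in predictions]
        counts = Counter(votes)
        top = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        voted.append(next(v for v in votes if counts[v] == top))
    return voted


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of customers whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels for six hypothetical customers (1 = subscribed, 0 = did not).
y_true = [1, 0, 0, 1, 0, 1]

# Hypothetical outputs of three already-trained base models.
extra_trees_pred = [1, 0, 0, 1, 0, 0]
random_forest_pred = [1, 0, 1, 1, 0, 1]
decision_tree_pred = [0, 0, 0, 1, 1, 1]

ensemble_pred = hard_vote([extra_trees_pred, random_forest_pred, decision_tree_pred])

print("ensemble accuracy:", accuracy(y_true, ensemble_pred))
print("single-model accuracy:", accuracy(y_true, extra_trees_pred))
```

In this toy case each base model errs on a different customer, so the majority vote corrects every individual mistake. This is the general motivation for voting ensembles and says nothing about the accuracies reported above.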