English Abstract
Refining crude oil into petroleum products is central to modern society, since these products are used across many industries and sectors. Efficient profiling of petroleum products plays a crucial role in quality control, environmental protection, and meeting changing market demands.
This study has the potential to influence the oil industry through its profiling of five petroleum products: Diesel, Kerosene, Gasoline, Fuel, and Lube Base Oil.
To profile petroleum products from Fourier Transform Infrared (FTIR) data, this study implements a machine learning (ML) methodology based on an ensemble of one-class classifiers (OCCs). OCCs can cope with class imbalance, class overlap, and in-sample noise, and can identify outliers. Six OCC ensembles were compared, each built on a different base classifier: One-Class Support Vector Machine (OCSVM), Gaussian Mixture Model (GMM), Isolation Forest (IF), Principal Component Analysis (PCA), Local Outlier Factor (LOF), and K-Nearest Neighbors (KNN). The OCC ensembles were benchmarked against an Artificial Neural Network (ANN).
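The idea of an OCC ensemble can be illustrated with a minimal sketch. It is not the implementation used in this study. It assumes one model per product, each trained only on that product's samples, and a decision rule that picks the model whose score lies furthest above its own acceptance threshold. The base learner here is a hypothetical single diagonal Gaussian, used as a simplified stand-in for a GMM. The class names, the 5% threshold quantile, and the synthetic "spectra" are all illustrative assumptions.

```python
import math
import random
from statistics import fmean, pvariance


class DiagonalGaussianOCC:
    """One-class model: a single diagonal-covariance Gaussian fitted to one class.

    Illustrative stand-in for a GMM base classifier (assumption, not the paper's model).
    """

    def fit(self, samples, quantile=0.05):
        columns = list(zip(*samples))
        self.means = [fmean(c) for c in columns]
        # A small floor keeps every variance strictly positive.
        self.variances = [pvariance(c) + 1e-9 for c in columns]
        # Acceptance threshold: a low quantile of the training log-likelihoods.
        scores = sorted(self.score(x) for x in samples)
        self.threshold = scores[int(quantile * len(scores))]
        return self

    def score(self, x):
        """Log-likelihood of x under the fitted Gaussian."""
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.means, self.variances)
        )


class OCCEnsemble:
    """One one-class model per product; each sees only its own class during training."""

    def fit(self, samples_by_class):
        self.models = {
            label: DiagonalGaussianOCC().fit(samples)
            for label, samples in samples_by_class.items()
        }
        return self

    def predict(self, x):
        # The margin above each model's own threshold makes scores comparable.
        margins = {
            label: model.score(x) - model.threshold
            for label, model in self.models.items()
        }
        best = max(margins, key=margins.get)
        # If no model accepts the sample, flag it as an outlier.
        return best if margins[best] >= 0 else "outlier"


rng = random.Random(0)


def synthetic_spectra(center, n, dims=8):
    """Toy feature vectors standing in for FTIR spectra (synthetic, not real data)."""
    return [[rng.gauss(center, 0.1) for _ in range(dims)] for _ in range(n)]


# Deliberately imbalanced training set: 200 samples of one class, 20 of the other.
train = {
    "Diesel": synthetic_spectra(0.0, 200),
    "Kerosene": synthetic_spectra(1.0, 20),
}
ensemble = OCCEnsemble().fit(train)
```

Because each model is fitted independently, the minority class is never outvoted by the majority class during training, which is one reason such ensembles may tolerate imbalance. In practice, a library implementation such as scikit-learn's `GaussianMixture` or `OneClassSVM` would typically replace the toy base learner.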
The results indicate that the GMM-based ensemble achieved the highest performance across all evaluation metrics, followed in accuracy by the OCSVM-based and LOF-based ensembles (both 0.936), the IF-based ensemble (0.908), the KNN-based ensemble (0.899), and the PCA-based ensemble (0.871). Every ensemble outperformed the ANN classifier, which struggled with the imbalanced data and achieved an accuracy of 0.441. Because some of the methods are sensitive to high-dimensional data, the experiments were repeated on FTIR data whose dimensionality had been reduced with an optimized autoencoder. On the reduced data, performance dropped sharply for the OCSVM-based ensemble, dropped slightly for the GMM-based, PCA-based, and IF-based ensembles, and remained unchanged for the LOF-based and KNN-based ensembles.
This study has important implications for the use of ML in the oil industry, as it provides versatile, robust, and effective methods for profiling petroleum products. The results offer insight into the strengths and weaknesses of the different OCCs, which may help practitioners select the most appropriate classifier for their specific problem.