English Abstract:
Computer Vision has grown rapidly over the last decade, particularly in Object Detection (OD), with prominent real-world applications such as facial recognition and self-driving vehicles. OD techniques are usually classified into two categories: two-stage detectors, such as Faster Region-based Convolutional Neural Networks (Faster R-CNN), and one-stage detectors, such as You Only Look Once (YOLO). The former are generally considered more accurate, while the latter are known to be faster. This research aimed to evaluate the latest pre-trained Faster R-CNN and YOLOv7 models on the largest OD dataset to date, the Open Images Dataset v6 (OIDv6), which contains approximately 14 million bounding boxes. The experiments tested five models (R50-FPN, R101-FPN, YOLOv7-Tiny, YOLOv7, and YOLOv7-E6E) on the OIDv6 validation set, after mapping the OIDv6 class labels to the MS COCO class labels. The results showed the YOLOv7-E6E to be the most effective and efficient OD model on the OIDv6, achieving an Average Precision (AP) of 44.3% on the dataset containing the common and mapped labels. The experiments suggest that the OIDv6 could improve OD model performance, and that using it as the training baseline is likely to improve model outcomes owing to its substantially larger size and greater class diversity. This research lays the groundwork for preparing a model pre-trained on the OIDv6 for future research and comparative analyses.
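The label-mapping step described above can be illustrated with a minimal sketch. It assumes the OIDv6 annotations have already been resolved from machine IDs to human-readable class names, and the `OID_TO_COCO` dictionary and `map_annotations` helper are hypothetical: the actual mapping used in this research is not given here. Boxes whose OIDv6 class has a COCO counterpart are relabelled with the COCO name, and all others are dropped, so that COCO-pretrained detectors are only scored on classes they were trained to predict.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical excerpt of an OIDv6 -> MS COCO label map (illustrative only;
# the mapping used in the study is not reproduced here).
OID_TO_COCO: Dict[str, str] = {
    "Man": "person",
    "Woman": "person",
    "Person": "person",
    "Car": "car",
    "Bicycle": "bicycle",
    "Dog": "dog",
}

# (x_min, y_min, x_max, y_max) in normalised image coordinates (assumed format)
Box = Tuple[float, float, float, float]


def map_annotations(
    annotations: List[Tuple[str, Box]],
    label_map: Dict[str, str] = OID_TO_COCO,
) -> List[Tuple[str, Box]]:
    """Keep only boxes whose OIDv6 label has a COCO counterpart, renamed to the COCO label."""
    mapped: List[Tuple[str, Box]] = []
    for oid_label, box in annotations:
        coco_label: Optional[str] = label_map.get(oid_label)
        if coco_label is not None:  # classes with no COCO equivalent are dropped
            mapped.append((coco_label, box))
    return mapped


if __name__ == "__main__":
    sample = [
        ("Man", (0.1, 0.2, 0.4, 0.9)),
        ("Lighthouse", (0.5, 0.1, 0.7, 0.6)),  # no COCO counterpart in this map
        ("Car", (0.3, 0.5, 0.8, 0.95)),
    ]
    print(map_annotations(sample))
```

Run on the sample above, the "Lighthouse" box is discarded and the remaining two boxes are returned with the COCO labels `person` and `car`. Collapsing several fine-grained OIDv6 classes (such as "Man" and "Woman") into one COCO class is one possible design choice; a stricter mapping would keep only exact name matches.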