Mining Bug Reports to Estimate Software Risks and Bug-fix Time

Author

Mahfoodh, Hussain

Linked Agent

Hammad, Mustafa, Thesis advisor

Language

English

Extent

[3], 12, 103, [1]

Place of institution

Sakhir, Bahrain

Thesis Type

Thesis (Master)

Institution

UNIVERSITY OF BAHRAIN College of Information Technology

Description

Abstract
Software bugs are considered inevitable in the software life cycle. Regardless of their
cause, software stakeholders benefit from knowing how long bugs take to fix: it helps
increase software availability, improve security, and support better project management.
Furthermore, categorizing the level of software risk is important for software developers
and project stakeholders, because it helps internal stakeholders evaluate the existing
software risk and predict upcoming quantitative software risk values.
Duplicate bug reporting is also one of the most widespread software problems, and it causes
inconvenience for internal software stakeholders. It is useful for developers to eliminate
redundant bug records: the fewer duplicate records the bug report documentation contains,
the more efficiently resources can be allocated to fixing and enhancing software features.
Existing studies that mine bug reports with a variety of classification methods have
emphasized the importance of software risk. Historical bug report data can be used to
categorize software risk attributes, which helps tackle ongoing issues in the software and
identify their causes.
In this thesis, we analyze public datasets of software bug reports for a set of open
source projects. First, we use machine-learning algorithms to predict the time needed to fix
a given bug. The bug-fix time prediction results reached a maximum of 25% for the selected
dataset. Second, we propose a novel approach to software risk estimation from a given bug
report, which classifies the existing software risk values and predicts upcoming software
risk values. The approach yielded risk values ranging from 27.4% to 84% for one selected
software component. Third, we evaluate the risk values extracted from bug reports and
compare them with risk values obtained from new word embedding approaches (Word2Vec) that
use natural language processing to automatically determine whether a selected bug is a
duplicate. The three proposed similarity measures for detecting duplicate bug records with
the word embedding technique reached a maximum precision of 99.89%. In addition, automated
classification methods are applied to the risk results obtained from bug reports to
categorize the urgency of the current software risk levels and to determine whether the
classification methods can identify those levels correctly. The approach is able to detect
10% of the selected software component with a 100% precision rate. Finally, a reinforcement
learning approach is applied to the software risk values retrieved from the different risk
attributes to predict upcoming software risk values and to evaluate the scenario used for
the reinforcement learning behavior.
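
The duplicate-detection idea mentioned in the abstract can be illustrated with a minimal
sketch. The abstract does not name the three similarity measures, so cosine similarity over
averaged word vectors is used here purely as an assumed stand-in. The vectors in
`TOY_VECTORS` are hand-picked toy values rather than output from a trained Word2Vec model,
and the function names and the 0.95 threshold are hypothetical.

```python
import math

# Toy 3-dimensional word vectors, hand-picked for illustration only.
# A real pipeline would look these up in a trained Word2Vec model instead.
TOY_VECTORS = {
    "app": [0.90, 0.10, 0.00],
    "application": [0.85, 0.15, 0.05],
    "crashes": [0.10, 0.90, 0.10],
    "crash": [0.12, 0.88, 0.08],
    "startup": [0.20, 0.30, 0.90],
    "launch": [0.25, 0.25, 0.85],
    "font": [-0.80, 0.10, 0.20],
    "blurry": [-0.70, -0.10, 0.30],
}


def report_vector(text):
    """Average the vectors of the known words in a bug report summary."""
    words = [w for w in text.lower().split() if w in TOY_VECTORS]
    if not words:
        return None  # no known words, so no vector can be built
    dims = len(next(iter(TOY_VECTORS.values())))
    return [sum(TOY_VECTORS[w][i] for w in words) / len(words) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_probable_duplicate(summary_a, summary_b, threshold=0.95):
    """Flag two summaries as likely duplicates (threshold is an assumed value)."""
    vec_a, vec_b = report_vector(summary_a), report_vector(summary_b)
    if vec_a is None or vec_b is None:
        return False
    return cosine_similarity(vec_a, vec_b) >= threshold
```

With these toy vectors, `is_probable_duplicate("app crashes startup", "application crash launch")`
flags the pair as a likely duplicate, while comparing the first summary with `"font blurry"`
does not. In practice the threshold would have to be tuned on labeled duplicate pairs.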

Note

Title on Cover:
تعدين تقارير الأخطاء لتقدير مخاطر البرمجيات ووقت إصلاح الأخطاء
(Arabic; in English: "Mining Bug Reports to Estimate Software Risks and Bug-fix Time")

Member of