An Empirical Study on Just-in-time Conformal Defect Prediction
Code changes can introduce defects that affect software quality and reliability. Just-in-time (JIT) defect prediction techniques provide feedback at check-in time on whether a code change is likely to contain defects. This immediate feedback allows practitioners to make timely decisions regarding potential defects.
However, a prediction model may deliver false predictions, which may negatively affect practitioners’ decisions. False positive predictions lead to unnecessarily spending resources on investigating clean code changes, while false negative predictions may result in overlooking defective changes. Knowing how uncertain a defect prediction is would help practitioners avoid wrong decisions.
Previous research in defect prediction explored different approaches to quantify prediction uncertainty for supporting decision-making activities. However, these approaches only offer a heuristic quantification of uncertainty and do not provide guarantees.
In this study, we use conformal prediction (CP) as a rigorous uncertainty quantification approach on top of JIT defect predictors. We assess how often CP can provide guarantees for JIT defect predictions. We also assess how many false JIT defect predictions CP can filter out. We experiment with two state-of-the-art JIT defect prediction techniques (DeepJIT and CC2Vec) and two widely used datasets (Qt and OpenStack).
Our experiments show that CP can ensure correctness with 95% probability for only 27% (DeepJIT) and 9% (CC2Vec) of the JIT defect predictions. Additionally, our experiments indicate that CP might be a valuable technique for filtering out the false predictions of JIT defect predictors. CP can filter out up to 100% of false negative predictions and 90% of false positives generated by CC2Vec, and up to 86% of false negative predictions and 83% of false positives generated by DeepJIT.
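To make the idea of placing CP on top of a defect predictor more concrete, the following is a minimal sketch of split (inductive) conformal prediction for a binary classifier. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which CP variant or nonconformity score was used, and the function names (`conformal_threshold`, `prediction_set`, `triage`) and all numbers are hypothetical. The sketch assumes the underlying predictor outputs a probability that a change is defective, and that calibration and test changes are exchangeable.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.05):
    """Compute the score threshold from a held-out calibration set.

    cal_probs  -- predicted probability of the 'defective' class (label 1)
    cal_labels -- true labels (1 = defective, 0 = clean)
    alpha      -- tolerated error rate (0.05 targets 95% coverage)
    """
    # Nonconformity score: 1 minus the probability given to the true label.
    scores = sorted(
        (1.0 - p) if y == 1 else p
        for p, y in zip(cal_probs, cal_labels)
    )
    n = len(scores)
    # Finite-sample-corrected rank of the (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points: every label must be kept.
        return float("inf")
    return scores[k - 1]

def prediction_set(p_defect, qhat):
    """Return every label whose nonconformity score is within the threshold."""
    labels = set()
    if 1.0 - p_defect <= qhat:  # score if the change were defective
        labels.add(1)
    if p_defect <= qhat:        # score if the change were clean
        labels.add(0)
    return labels

def triage(p_defect, qhat):
    """Keep a prediction only when the set contains exactly one label."""
    labels = prediction_set(p_defect, qhat)
    if labels == {1}:
        return "defective"
    if labels == {0}:
        return "clean"
    return "uncertain"  # empty or two-label set: flag rather than trust

# Hypothetical calibration data (not taken from Qt or OpenStack).
cal_probs = [0.9, 0.8, 0.95, 0.7, 0.85, 0.6, 0.75, 0.4, 0.9, 0.65,
             0.1, 0.2, 0.05, 0.3, 0.15, 0.25, 0.1, 0.35, 0.55, 0.2]
cal_labels = [1] * 10 + [0] * 10
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.05)
```

Under the exchangeability assumption, the prediction sets contain the true label with probability at least 1 - alpha on average over changes. Predictions whose set holds exactly one label are the ones a practitioner could act on; the rest can be flagged as uncertain, which is one way to read the filtering of false predictions described above.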
Mon 15 Apr (displayed time zone: Lisbon)
11:00 - 12:30 | Defects, Bugs and Issues | Technical Papers / MSR Awards / Social Events / Tutorials / Data and Tool Showcase Track / Mining Challenge / Registered Reports / Industry Track / MIP Award / Vision and Reflection / Keynotes | at Grande Auditório | Chair(s): Wesley Assunção North Carolina State University
11:00 | 12m | Talk | Enhancing Performance Bug Prediction Using Performance Code Metrics | Technical Papers | Guoliang Zhao Computer Science of Queen's University, Stefanos Georgio, Safwat Hassan University of Toronto, Canada, Ying Zou Queen's University, Kingston, Ontario, Derek Truong IBM Canada, Toby Corbin IBM UK
11:12 | 12m | Talk | CrashJS: A NodeJS Benchmark for Automated Crash Reproduction | Technical Papers | Philip Oliver Victoria University of Wellington, Jens Dietrich Victoria University of Wellington, Craig Anslow Victoria University of Wellington, Michael Homer Victoria University of Wellington
11:24 | 12m | Talk | An Empirical Study on Just-in-time Conformal Defect Prediction | Technical Papers | Xhulja Shahini paluno - University of Duisburg-Essen, Andreas Metzger University of Duisburg-Essen, Klaus Pohl
11:36 | 12m | Talk | Fine-Grained Just-In-Time Defect Prediction at the Block Level in Infrastructure-as-Code (IaC) | Technical Papers | Mahi Begoug, Moataz Chouchen ETS, Ali Ouni ETS Montreal, University of Quebec, Eman Abdullah AlOmar Stevens Institute of Technology, Mohamed Wiem Mkaouer University of Michigan - Flint
11:48 | 4m | Talk | TrickyBugs: A Dataset of Corner-case Bugs in Plausible Programs | Data and Tool Showcase Track | Kaibo Liu Peking University, Yudong Han Peking University, Yiyang Liu Peking University, Zhenpeng Chen Nanyang Technological University, Jie M. Zhang King's College London, Federica Sarro University College London, Gang Huang Peking University, Yun Ma Peking University
11:52 | 4m | Talk | GitBugs-Java: A Reproducible Java Benchmark of Recent Bugs | Data and Tool Showcase Track | André Silva KTH Royal Institute of Technology, Nuno Saavedra INESC-ID and IST, University of Lisbon, Martin Monperrus KTH Royal Institute of Technology
11:56 | 4m | Talk | A Dataset of Partial Program Fixes | Data and Tool Showcase Track | Dirk Beyer LMU Munich, Lars Grunske Humboldt-Universität zu Berlin, Matthias Kettl LMU Munich, Marian Lingsch-Rosenfeld LMU Munich, Moeketsi Raselimo Humboldt-Universität zu Berlin
12:00 | 4m | Talk | BugsPHP: A dataset for Automated Program Repair in PHP | Data and Tool Showcase Track | K.D. Pramod University of Moratuwa, Sri Lanka, W.T.N. De Silva University of Moratuwa, Sri Lanka, W.U.K. Thabrew University of Moratuwa, Sri Lanka, Ridwan Salihin Shariffdeen National University of Singapore, Sandareka Wickramanayake University of Moratuwa, Sri Lanka
12:04 | 4m | Talk | AW4C: A Commit-Aware C Dataset for Actionable Warning Identification | Data and Tool Showcase Track | Zhipeng Liu, Meng Yan Chongqing University, Zhipeng Gao Shanghai Institute for Advanced Study - Zhejiang University, Dong Li, Xiaohong Zhang Chongqing University, Dan Yang Chongqing University
12:08 | 5m | Talk | Predicting the Impact of Crashes Across Release Channels | Industry Track
12:13 | 5m | Talk | Zero Shot Learning based Alternatives for Class Imbalanced Learning Problem in Enterprise Software Defect Analysis | Industry Track