MSR 2024
Mon 15 - Tue 16 April 2024 Lisbon, Portugal
co-located with ICSE 2024

Static call graph (CG) construction often over-approximates call relations, leading to sound but imprecise results. Recent research has explored machine learning (ML)-based CG pruning as a means to enhance precision by eliminating false edges. However, current methods suffer from a limited evaluation dataset, imbalanced training data, and reduced recall, which affects practical downstream analyses. Prior results have also not yet been compared with advanced static CG construction techniques. This study tackles these issues. We introduce the NYXCorpus, a dataset of real-world Java programs with high test coverage; we collect traces from test executions and build a ground truth of dynamic CGs. We leverage these CGs to explore conservative pruning strategies during the training and inference of ML-based CG pruners. The study compares 0-CFA-based static CGs with a context-sensitive 1-CFA algorithm, both with and without pruning. We find that CG pruning is a difficult task for real-world Java projects: substantial improvements in CG precision (+25%) come at the cost of reduced recall (-9%). However, our experiments show promising results: even when we favor recall over precision by using an F2 metric, pruned CGs have comparable quality to a context-sensitive 1-CFA analysis while being computationally less demanding. The resulting CGs are much smaller (69%) and substantially faster to process (3.5x speed-up), with virtually unchanged results in our downstream analysis.
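To make the precision/recall trade-off concrete, here is a minimal sketch of how call-graph edges might be scored against a dynamic ground truth using the standard F-beta formula with beta = 2. The function name `edge_metrics`, the edge representation as (caller, callee) pairs, and the toy graphs are illustrative assumptions, not the authors' evaluation code.

```python
def edge_metrics(predicted, ground_truth, beta=2.0):
    """Return (precision, recall, F-beta) for a set of call-graph edges.

    Edges are hypothetical (caller, callee) pairs; beta=2 weights recall
    more heavily than precision.
    """
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Toy ground truth: edges observed while running tests (assumed data).
dynamic_cg = {("A.main", "B.run"), ("B.run", "C.load")}

# Over-approximated static CG: both true edges plus two false edges.
static_cg = dynamic_cg | {("A.main", "D.stop"), ("B.run", "E.save")}

# Aggressively pruned CG: no false edges left, but one true edge is lost.
pruned_cg = {("A.main", "B.run")}

static_scores = edge_metrics(static_cg, dynamic_cg)
pruned_scores = edge_metrics(pruned_cg, dynamic_cg)
```

In this toy setup the two graphs have the same F1 score, but F2 ranks the unpruned graph higher because it keeps every true edge. That is the sense in which an F2-based evaluation is the more conservative choice for a pruner.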

Mon 15 Apr

Displayed time zone: Lisbon

16:00 - 17:30
Machine Learning for Software Engineering (Technical Papers) at Grande Auditório
Chair(s): Diego Costa Concordia University, Canada
16:00
12m
Talk
Whodunit: Classifying Code as Human Authored or GPT-4 Generated - A case study on CodeChef problems
Technical Papers
Oseremen Joy Idialu University of Waterloo, Noble Saji Mathews University of Waterloo, Canada, Rungroj Maipradit University of Waterloo, Joanne M. Atlee University of Waterloo, Mei Nagappan University of Waterloo
16:12
12m
Talk
GIRT-Model: Automated Generation of Issue Report Templates
Technical Papers
Nafiseh Nikeghbal Sharif University of Technology, Amir Hossein Kargaran LMU Munich, Abbas Heydarnoori Bowling Green State University
16:24
12m
Talk
MicroRec: Leveraging Large Language Models for Microservice Recommendation
Technical Papers
Ahmed Saeed Alsayed University of Wollongong, Hoa Khanh Dam University of Wollongong, Chau Nguyen University of Wollongong
16:36
12m
Talk
PeaTMOSS: A Dataset and Initial Analysis of Pre-Trained Models in Open-Source Software
Technical Papers
Wenxin Jiang Purdue University, Jerin Yasmin Queen's University, Canada, Jason Jones Purdue University, Nicholas Synovic Loyola University Chicago, Jiashen Kuo Purdue University, Nathaniel Bielanski Purdue University, Yuan Tian Queen's University, Kingston, Ontario, George K. Thiruvathukal Loyola University Chicago and Argonne National Laboratory, James C. Davis Purdue University
16:48
12m
Talk
Data Augmentation for Supervised Code Translation Learning
Technical Papers
Binger Chen Technische Universität Berlin, Jacek Golebiowski Amazon AWS, Ziawasch Abedjan Leibniz Universität Hannover
17:00
12m
Talk
On the Effectiveness of Machine Learning-based Call-Graph Pruning: An Empirical Study
Technical Papers
Amir Mir Delft University of Technology, Mehdi Keshani Delft University of Technology, Sebastian Proksch Delft University of Technology
17:12
12m
Talk
Leveraging GPT-like LLMs to Automate Issue Labeling
Technical Papers
Giuseppe Colavito University of Bari, Italy, Filippo Lanubile University of Bari, Nicole Novielli University of Bari, Luigi Quaranta University of Bari, Italy