MSR 2024
Mon 15 - Tue 16 April 2024 Lisbon, Portugal
co-located with ICSE 2024

Data-driven program translation has recently been the focus of several lines of research. A common and robust strategy is supervised learning. However, parallel training data, i.e., pairs of code snippets in the source and target languages, is typically scarce. While many data augmentation techniques exist in natural language processing, they cannot be easily adapted to code translation because of the unique restrictions of programming languages. In this paper, we develop a novel rule-based augmentation approach tailored for code translation data, and a novel retrieval-based approach that combines code samples from unorganized big code repositories to obtain new training data. Both approaches are language-independent. We perform an extensive empirical evaluation on existing benchmarks, showing that our method improves the accuracy of state-of-the-art supervised translation techniques by up to 35%.

Mon 15 Apr

Displayed time zone: Lisbon

16:00 - 17:30
Machine learning for Software Engineering
Technical Papers at Grande Auditório
Chair(s): Diego Costa Concordia University, Canada
16:00
12m
Talk
Whodunit: Classifying Code as Human Authored or GPT-4 Generated - A case study on CodeChef problems
Technical Papers
Oseremen Joy Idialu University of Waterloo, Noble Saji Mathews University of Waterloo, Canada, Rungroj Maipradit University of Waterloo, Joanne M. Atlee University of Waterloo, Mei Nagappan University of Waterloo
16:12
12m
Talk
GIRT-Model: Automated Generation of Issue Report Templates
Technical Papers
Nafiseh Nikeghbal Sharif University of Technology, Amir Hossein Kargaran LMU Munich, Abbas Heydarnoori Bowling Green State University
16:24
12m
Talk
MicroRec: Leveraging Large Language Models for Microservice Recommendation
Technical Papers
Ahmed Saeed Alsayed University of Wollongong, Hoa Khanh Dam University of Wollongong, Chau Nguyen University of Wollongong
16:36
12m
Talk
PeaTMOSS: A Dataset and Initial Analysis of Pre-Trained Models in Open-Source Software
Technical Papers
Wenxin Jiang Purdue University, Jerin Yasmin Queen's University, Canada, Jason Jones Purdue University, Nicholas Synovic Loyola University Chicago, Jiashen Kuo Purdue University, Nathaniel Bielanski Purdue University, Yuan Tian Queen's University, Kingston, Ontario, George K. Thiruvathukal Loyola University Chicago and Argonne National Laboratory, James C. Davis Purdue University
16:48
12m
Talk
Data Augmentation for Supervised Code Translation Learning
Technical Papers
Binger Chen Technische Universität Berlin, Jacek Golebiowski Amazon AWS, Ziawasch Abedjan Leibniz Universität Hannover
17:00
12m
Talk
On the Effectiveness of Machine Learning-based Call-Graph Pruning: An Empirical Study
Technical Papers
Amir Mir Delft University of Technology, Mehdi Keshani Delft University of Technology, Sebastian Proksch Delft University of Technology
17:12
12m
Talk
Leveraging GPT-like LLMs to Automate Issue Labeling
Technical Papers
Giuseppe Colavito University of Bari, Italy, Filippo Lanubile University of Bari, Nicole Novielli University of Bari, Luigi Quaranta University of Bari, Italy