MSR 2024
Mon 15 - Tue 16 April 2024 Lisbon, Portugal
co-located with ICSE 2024

Mon 15 Apr

Displayed time zone: Lisbon

09:00 - 10:30
Day 1: Opening
Technical Papers / MSR Awards / Social Events / Tutorials / Data and Tool Showcase Track / Mining Challenge / Registered Reports / Industry Track / MIP Award / Vision and Reflection / Keynotes at Grande Auditório
Chair(s): Diomidis Spinellis Athens University of Economics and Business & Delft University of Technology
09:00
30m
Day opening
Opening Session & Award Announcements
MSR Awards

09:30
30m
Awards
MSR 2024 Foundational Contribution Award talk
MSR Awards
Margaret-Anne Storey University of Victoria
10:00
30m
Talk
Most Influential Paper Award talk
MIP Award
10:30 - 11:00
Coffee for MSR newcomers
Social Events at Open Space (reserved area)
Chair(s): Federica Sarro University College London, Alexander Serebrenik Eindhoven University of Technology
10:30
30m
Coffee break
Coffee for MSR newcomers
Social Events
Federica Sarro University College London, Alexander Serebrenik Eindhoven University of Technology
11:00 - 12:30
Ecosystems, Reuse and APIs & Tutorials
Data and Tool Showcase Track / Technical Papers / Tutorials at Almada Negreiros
Chair(s): Mahmoud Alfadel University of Waterloo, Ayushi Rastogi University of Groningen, The Netherlands
11:00
12m
Talk
Thirty-Three Years of Mathematicians and Software Engineers: A Case Study of Domain Expertise and Participation in Proof Assistant Ecosystems
Technical Papers
Gwenyth Lincroft Northeastern University, Minsung Cho Northeastern University, Mahsa Bazzaz Northeastern University, Katherine Hough Northeastern University, Jonathan Bell Northeastern University
Pre-print Media Attached
11:12
12m
Talk
Boosting API Misuse Detection via Integrating API Constraints from Multiple Sources
Technical Papers
Can Li Nanjing University of Aeronautics and Astronautics, Jingxuan Zhang Nanjing University of Aeronautics and Astronautics, Yixuan Tang Nanjing University of Aeronautics and Astronautics, Zhuhang Li Nanjing University of Aeronautics and Astronautics, Tianyue Sun Nanjing University of Aeronautics and Astronautics
11:24
6m
Talk
Availability and Usage of Platform-Specific APIs: A First Empirical Study
Technical Papers
Pre-print Media Attached File Attached
11:30
4m
Talk
AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis
Data and Tool Showcase Track
Jordan Samhi CISPA Helmholtz Center for Information Security, Tegawendé F. Bissyandé University of Luxembourg, Jacques Klein University of Luxembourg
11:34
4m
Talk
Goblin: A Framework for Enriching and Querying the Maven Central Dependency Graph
Data and Tool Showcase Track
Damien Jaime Sorbonne Université - Lip6 - SAP, Joyce El Haddad Paris Dauphine-PSL Université, CNRS, LAMSADE, Pascal Poizat Université Paris Nanterre & LIP6
Pre-print File Attached
11:38
4m
Talk
Dataset: Copy-based Reuse in Open Source Software
Data and Tool Showcase Track
Mahmoud Jahanshahi Research Assistant, University of Tennessee Knoxville, Audris Mockus The University of Tennessee & Vilnius University
Pre-print
11:45
45m
Talk
Mining Our Way Back to Incremental Builds for DevOps Pipelines
Tutorials
Shane McIntosh University of Waterloo
Pre-print
11:00 - 12:30
11:00
12m
Talk
Enhancing Performance Bug Prediction Using Performance Code Metrics
Technical Papers
Guoliang Zhao Computer Science of Queen's University, Stefanos Georgio, Safwat Hassan University of Toronto, Canada, Ying Zou Queen's University, Kingston, Ontario, Derek Truong IBM Canada, Toby Corbin IBM UK
11:12
12m
Talk
CrashJS: A NodeJS Benchmark for Automated Crash Reproduction
Technical Papers
Philip Oliver Victoria University of Wellington, Jens Dietrich Victoria University of Wellington, Craig Anslow Victoria University of Wellington, Michael Homer Victoria University of Wellington
11:24
12m
Talk
An Empirical Study on Just-in-time Conformal Defect Prediction
Technical Papers
Xhulja Shahini paluno - University of Duisburg-Essen, Andreas Metzger University of Duisburg-Essen, Klaus Pohl
11:36
12m
Talk
Fine-Grained Just-In-Time Defect Prediction at the Block Level in Infrastructure-as-Code (IaC)
Technical Papers
Mahi Begoug, Moataz Chouchen ETS, Ali Ouni ETS Montreal, University of Quebec, Eman Abdullah AlOmar Stevens Institute of Technology, Mohamed Wiem Mkaouer University of Michigan - Flint
11:48
4m
Talk
TrickyBugs: A Dataset of Corner-case Bugs in Plausible Programs
Data and Tool Showcase Track
Kaibo Liu Peking University, Yudong Han Peking University, Yiyang Liu Peking University, Zhenpeng Chen Nanyang Technological University, Jie M. Zhang King's College London, Federica Sarro University College London, Gang Huang Peking University, Yun Ma Peking University
11:52
4m
Talk
GitBugs-Java: A Reproducible Java Benchmark of Recent Bugs
Data and Tool Showcase Track
André Silva KTH Royal Institute of Technology, Nuno Saavedra INESC-ID and IST, University of Lisbon, Martin Monperrus KTH Royal Institute of Technology
11:56
4m
Talk
A Dataset of Partial Program Fixes
Data and Tool Showcase Track
Dirk Beyer LMU Munich, Lars Grunske Humboldt-Universität zu Berlin, Matthias Kettl LMU Munich, Marian Lingsch-Rosenfeld LMU Munich, Moeketsi Raselimo Humboldt-Universität zu Berlin
12:00
4m
Talk
BugsPHP: A dataset for Automated Program Repair in PHP
Data and Tool Showcase Track
K.D. Pramod University of Moratuwa, Sri Lanka, W.T.N. De Silva University of Moratuwa, Sri Lanka, W.U.K. Thabrew University of Moratuwa, Sri Lanka, Ridwan Salihin Shariffdeen National University of Singapore, Sandareka Wickramanayake University of Moratuwa, Sri Lanka
Pre-print
12:04
4m
Talk
AW4C: A Commit-Aware C Dataset for Actionable Warning Identification
Data and Tool Showcase Track
Zhipeng Liu, Meng Yan Chongqing University, Zhipeng Gao Shanghai Institute for Advanced Study - Zhejiang University, Dong Li, Xiaohong Zhang Chongqing University, Dan Yang Chongqing University
12:08
5m
Talk
Predicting the Impact of Crashes Across Release Channels
Industry Track
Suhaib Mujahid Mozilla, Diego Costa Concordia University, Canada, Marco Castelluccio Mozilla
12:13
5m
Talk
Zero Shot Learning based Alternatives for Class Imbalanced Learning Problem in Enterprise Software Defect Analysis
Industry Track
Sangameshwar Patil Dept. of CSE, IIT Madras and TRDDC, TCS, B Ravindran IITM
14:00 - 15:30
Mining Challenge
Mining Challenge at Almada Negreiros
Chair(s): Preetha Chatterjee Drexel University, USA, Fabio Palomba University of Salerno
14:00
5m
Talk
ChatGPT Chats Decoded: Uncovering Prompt Patterns for Superior Solutions in Software Development Lifecycle
Mining Challenge
Liangxuan Wu Huazhong University of Science and Technology, Yanjie Zhao Huazhong University of Science and Technology, Xinyi Hou Huazhong University of Science and Technology, Tianming Liu Monash University, Haoyu Wang Huazhong University of Science and Technology
14:05
5m
Talk
Write me this Code: An Analysis of ChatGPT Quality for Producing Source Code
Mining Challenge
Konstantinos Moratis Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki, Themistoklis Diamantopoulos Electrical and Computer Engineering Dept, Aristotle University of Thessaloniki, Dimitrios-Nikitas Nastos Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki, Andreas Symeonidis Aristotle University of Thessaloniki
Pre-print
14:10
5m
Talk
Quality Assessment of ChatGPT Generated Code and their Use by Developers
Mining Challenge
Mohammed Latif Siddiq University of Notre Dame, Lindsay Roney University of Notre Dame, Jiahao Zhang, Joanna C. S. Santos University of Notre Dame
Pre-print Media Attached File Attached
14:15
5m
Talk
Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects
Mining Challenge
Balreet Grewal University of Alberta, Wentao Lu University of Alberta, Sarah Nadi New York University Abu Dhabi, University of Alberta, Cor-Paul Bezemer University of Alberta
Pre-print
14:20
5m
Talk
How I Learned to Stop Worrying and Love ChatGPT
Mining Challenge
Piotr Przymus Nicolaus Copernicus University in Toruń, Poland, Mikołaj Fejzer Nicolaus Copernicus University in Toruń, Jakub Narębski Nicolaus Copernicus University in Toruń, Krzysztof Stencel University of Warsaw
Pre-print
14:25
5m
Talk
Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation.
Mining Challenge
Kailun Jin York University, Chung-Yu Wang York University, Hung Viet Pham York University, Hadi Hemmati York University
Pre-print
14:30
5m
Talk
The role of library versions in Developer-ChatGPT conversations
Mining Challenge
Rachna Raj Concordia University, Diego Costa Concordia University, Canada
Pre-print
14:35
5m
Talk
AI Writes, We Analyze: The ChatGPT Python Code Saga
Mining Challenge
Md Fazle Rabbi Idaho State University, Arifa Islam Champa Idaho State University, Minhaz F. Zibran Idaho State University, Md Rakibul Islam Lamar University
DOI Pre-print
14:40
5m
Talk
ChatGPT in Action: Analyzing Its Use in Software Development
Mining Challenge
Arifa Islam Champa Idaho State University, Md Fazle Rabbi Idaho State University, Costain Nachuma Idaho State University, Minhaz F. Zibran Idaho State University
DOI Pre-print
14:45
5m
Talk
Chatting with AI: Deciphering Developer Conversations with ChatGPT
Mining Challenge
Suad Mohamed Belmont University, Abdullah Parvin Belmont University, Esteban Parra Belmont University
14:50
5m
Talk
Does Generative AI Generate Smells Related to Container Orchestration?: An Exploratory Study with Kubernetes Manifests
Mining Challenge
Yue Zhang Auburn University, Rachel Meredith Auburn University, Wilson Reaves Auburn University, Julia Coriolano Federal University of Pernambuco, Muhammad Ali Babar School of Computer Science, The University of Adelaide, Akond Rahman Auburn University
Pre-print
14:55
5m
Talk
On the Taxonomy of Developers' Discussion Topics with ChatGPT
Mining Challenge
Ertugrul Sagdic Lamar University, Arda Bayram Lamar University, Md Rakibul Islam Lamar University
15:00
5m
Talk
How to refactor this code? An exploratory study on developer-ChatGPT refactoring conversations
Mining Challenge
Eman Abdullah AlOmar Stevens Institute of Technology, AnushKrishna Venkatakrishnan Rochester Institute of Technology, USA, Mohamed Wiem Mkaouer University of Michigan - Flint, Christian Newman, Ali Ouni ETS Montreal, University of Quebec
15:05
5m
Talk
Analyzing Developer-ChatGPT Conversations for Software Refactoring: An Exploratory Study
Mining Challenge
Omkar Sandip Chavan Rochester Institute of Technology, Divya Dilip Hinge Rochester Institute of Technology, Soham Sanjay Deo Rochester Institute of Technology, Yaxuan (Olivia) Wang Rochester Institute of Technology, Mohamed Wiem Mkaouer University of Michigan - Flint
15:10
5m
Talk
How Do Software Developers Use ChatGPT? An Exploratory Study on GitHub Pull Requests
Mining Challenge
Moataz Chouchen ETS, Narjes Bessghaier ETS Montreal, University of Quebec, Mahi Begoug, Ali Ouni ETS Montreal, University of Quebec, Eman Abdullah AlOmar Stevens Institute of Technology, Mohamed Wiem Mkaouer University of Michigan - Flint
15:15
5m
Talk
Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study
Mining Challenge
Joy Krishan Das University of Saskatchewan, Saikat Mondal University of Saskatchewan, Chanchal K. Roy University of Saskatchewan, Canada
Pre-print
15:20
5m
Talk
Enhancing User Interaction in ChatGPT: Characterizing and Consolidating Multiple Prompts for Issue Resolution
Mining Challenge
Saikat Mondal University of Saskatchewan, Suborno Deb Bappon Department of Computer Science, University of Saskatchewan, Canada, Chanchal K. Roy University of Saskatchewan, Canada
Pre-print
14:00 - 15:30
Software Quality
Technical Papers / Registered Reports / Data and Tool Showcase Track at Grande Auditório
Chair(s): Gopi Krishnan Rajbahadur Centre for Software Excellence, Huawei, Canada
14:00
12m
Talk
Not all Dockerfile Smells are the Same: An Empirical Evaluation of Hadolint Writing Practices by Experts
Technical Papers
Giovanni Rosa University of Molise, Simone Scalabrino University of Molise, Gregorio Robles Universidad Rey Juan Carlos, Rocco Oliveto University of Molise
14:12
12m
Talk
Supporting High-Level to Low-Level Requirements Coverage Reviewing with Large Language Models
Technical Papers
Anamaria-Roberta Hartl Johannes Kepler University Linz, Christoph Mayr-Dorn Johannes Kepler University Linz, Atif Mashkoor Johannes Kepler University Linz, Alexander Egyed Johannes Kepler University Linz
DOI Authorizer link Pre-print
14:24
12m
Talk
On the Executability of R Markdown Files
Technical Papers
Md Anaytul Islam Lakehead University, Muhammad Asaduzzaman University of Windsor, Shaowei Wang Department of Computer Science, University of Manitoba, Canada
14:36
12m
Talk
APIstic: A Large Collection of OpenAPI Metrics
Technical Papers
Souhaila Serbout Software Institute @ USI, Cesare Pautasso Software Institute, Faculty of Informatics, USI Lugano
14:48
6m
Talk
Improving Automated Code Reviews: Learning From Experience
Technical Papers
Hong Yi Lin The University of Melbourne, Patanamon Thongtanunam University of Melbourne, Christoph Treude Singapore Management University, Wachiraphan (Ping) Charoenwet The University of Melbourne
14:55
4m
Talk
Multi-faceted Code Smell Detection at Scale using DesigniteJava 2.0
Data and Tool Showcase Track
Tushar Sharma Dalhousie University
Pre-print
14:59
4m
Talk
SATDAUG - A Balanced and Augmented Dataset for Detecting Self-Admitted Technical Debt
Data and Tool Showcase Track
Edi Sutoyo Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Andrea Capiluppi University of Groningen
15:03
4m
Talk
Curated Email-Based Code Reviews Datasets
Data and Tool Showcase Track
Mingzhao Liang The University of Melbourne, Wachiraphan (Ping) Charoenwet The University of Melbourne, Patanamon Thongtanunam University of Melbourne
15:07
4m
Talk
TestDossier: A Dataset of Tested Values Automatically Extracted from Test Execution
Data and Tool Showcase Track
Pre-print Media Attached
15:11
4m
Talk
Greenlight: Highlighting TensorFlow APIs Energy Footprint
Data and Tool Showcase Track
Saurabhsingh Rajput Dalhousie University, Maria Kechagia University College London, Federica Sarro University College London, Tushar Sharma Dalhousie University
Pre-print
15:15
5m
Talk
When Code Smells Meet ML: On the Lifecycle of ML-specific Code Smells in ML-enabled Systems
Registered Reports
Gilberto Recupito University of Salerno, Giammaria Giordano University of Salerno, Filomena Ferrucci University of Salerno, Dario Di Nucci University of Salerno, Fabio Palomba University of Salerno
15:20
5m
Talk
Comparison of Static Analysis Architecture Recovery Tools for Microservice Applications
Registered Reports
Simon Schneider Hamburg University of Technology, Alexander Bakhtin University of Oulu, Xiaozhou Li University of Oulu, Jacopo Soldani University of Pisa, Italy, Antonio Brogi Università di Pisa, Tomas Cerny University of Arizona, Riccardo Scandariato Hamburg University of Technology, Davide Taibi University of Oulu and Tampere University
16:00 - 17:30
Mobile Apps
Data and Tool Showcase Track / Technical Papers at Almada Negreiros
Chair(s): Dario Di Nucci University of Salerno
16:00
12m
Talk
Automating GUI-based Test Oracles for Mobile Apps
Technical Papers
Kesina Baral CQSE America, Jack Johnson, Junayed Mahmud George Mason University, Sabiha Salma George Mason University, Mattia Fazzini University of Minnesota, Julia Rubin University of British Columbia, Jeff Offutt George Mason University, Kevin Moran University of Central Florida
16:12
12m
Talk
Global Prosperity or Local Monopoly? Understanding the Geography of App Popularity
Technical Papers
Liu Wang Beijing University of Posts and Telecommunications, Conghui Zheng Beijing University of Posts and Telecommunications, Haoyu Wang Huazhong University of Science and Technology, Xiapu Luo The Hong Kong Polytechnic University, Gareth Tyson Queen Mary University of London, Yi Wang, Shangguang Wang Beijing University of Posts and Telecommunications
16:24
12m
Talk
GuiEvo: Automated Evolution of Mobile App UIs
Technical Papers
Sabiha Salma George Mason University, S M Hasan Mansur George Mason University, Yule Zhang George Mason University, Kevin Moran University of Central Florida
16:36
12m
Talk
Comparing Apples to Androids: Discovery, Retrieval, and Matching of iOS and Android Apps for Cross-Platform Analyses
Technical Papers
Magdalena Steinböck TU Wien, Jakob Bleier TU Wien, Mikka Rainer CISPA Helmholtz Center for Information Security, Tobias Urban Institute for Internet Security & secunet Security Networks AG, Christine Utz CISPA Helmholtz Center for Information Security, Martina Lindorfer TU Wien
16:48
12m
Talk
Keep Me Updated: An Empirical Study on Embedded Javascript Engines in Android Apps
Technical Papers
Elliott Wen The University of Auckland, Jiaxiang Liu The Hong Kong Polytechnic University, Xiapu Luo The Hong Kong Polytechnic University, Giovanni Russello University of Auckland, Jens Dietrich Victoria University of Wellington
17:00
12m
Talk
Large Language Model vs. Stack Overflow in Addressing Android Permission Related Challenges
Technical Papers
Sahrima Jannat Oishwee University of Saskatchewan, Natalia Stakhanova University of Saskatchewan, Zadia Codabux University of Saskatchewan, Canada
17:12
4m
Talk
DATAR: A Dataset for Tracking App Releases
Data and Tool Showcase Track
Yasaman Abedini Sharif University of Technology, Mohammad Hadi Hajihosseini Sharif University of Technology, Abbas Heydarnoori Bowling Green State University
17:16
4m
Talk
AndroZoo: A Retrospective with a Glimpse into the Future
Data and Tool Showcase Track
Marco Alecci University of Luxembourg, Pedro Jesús Ruiz Jiménez University of Luxembourg, Kevin Allix Independent Researcher, Tegawendé F. Bissyandé University of Luxembourg, Jacques Klein University of Luxembourg
16:00 - 17:30
Machine learning for Software Engineering
Technical Papers at Grande Auditório
Chair(s): Diego Costa Concordia University, Canada
16:00
12m
Talk
Whodunit: Classifying Code as Human Authored or GPT-4 Generated - A case study on CodeChef problems
Technical Papers
Oseremen Joy Idialu University of Waterloo, Noble Saji Mathews University of Waterloo, Canada, Rungroj Maipradit University of Waterloo, Joanne M. Atlee University of Waterloo, Mei Nagappan University of Waterloo
DOI Pre-print
16:12
12m
Talk
GIRT-Model: Automated Generation of Issue Report Templates
Technical Papers
Nafiseh Nikeghbal Sharif University of Technology, Amir Hossein Kargaran LMU Munich, Abbas Heydarnoori Bowling Green State University
DOI Pre-print
16:24
12m
Talk
MicroRec: Leveraging Large Language Models for Microservice Recommendation
Technical Papers
Ahmed Saeed Alsayed University of Wollongong, Hoa Khanh Dam University of Wollongong, Chau Nguyen University of Wollongong
16:36
12m
Talk
PeaTMOSS: A Dataset and Initial Analysis of Pre-Trained Models in Open-Source Software
Technical Papers
Wenxin Jiang Purdue University, Jerin Yasmin Queen's University, Canada, Jason Jones Purdue University, Nicholas Synovic Loyola University Chicago, Jiashen Kuo Purdue University, Nathaniel Bielanski Purdue University, Yuan Tian Queen's University, Kingston, Ontario, George K. Thiruvathukal Loyola University Chicago and Argonne National Laboratory, James C. Davis Purdue University
DOI Pre-print
16:48
12m
Talk
Data Augmentation for Supervised Code Translation Learning
Technical Papers
Binger Chen Technische Universität Berlin, Jacek Golebiowski Amazon AWS, Ziawasch Abedjan Leibniz Universität Hannover
17:00
12m
Talk
On the Effectiveness of Machine Learning-based Call-Graph Pruning: An Empirical Study
Technical Papers
Amir Mir Delft University of Technology, Mehdi Keshani Delft University of Technology, Sebastian Proksch Delft University of Technology
Pre-print
17:12
12m
Talk
Leveraging GPT-like LLMs to Automate Issue Labeling
Technical Papers
Giuseppe Colavito University of Bari, Italy, Filippo Lanubile University of Bari, Nicole Novielli University of Bari, Luigi Quaranta University of Bari, Italy
Pre-print

Tue 16 Apr

Displayed time zone: Lisbon

09:00 - 10:30
Development: practices and humans
Data and Tool Showcase Track / Technical Papers at Almada Negreiros
Chair(s): Gema Rodríguez-Pérez University of British Columbia (UBC)
09:50
6m
Talk
Exploring the Effect of Multiple Natural Languages on Code Suggestion Using GitHub Copilot
Technical Papers
Kei Koyanagi Kyushu University, Dong Wang Kyushu University, Japan, Kotaro Noguchi Kyushu University, Masanari Kondo Kyushu University, Alexander Serebrenik Eindhoven University of Technology, Yasutaka Kamei Kyushu University, Naoyasu Ubayashi Kyushu University
Pre-print
09:56
4m
Talk
A Four-Dimension Gold Standard Dataset for Opinion Mining in Software Engineering
Data and Tool Showcase Track
Md Rakibul Islam Lamar University, Md Fazle Rabbi Idaho State University, Jo Youngeun Lamar University, Arifa Islam Champa Idaho State University, Ethan J Young Lamar University, Camden M Wilson Lamar University, Gavin J Scott Lamar University, Minhaz F. Zibran Idaho State University
10:00
4m
Talk
Opening the Valve on Pure-Data: Usage Patterns and Programming Practices of a Data-Flow Based Visual Programming Language
Data and Tool Showcase Track
Anisha Islam Department of Computing Science, University of Alberta, Kalvin Eng University of Alberta, Abram Hindle University of Alberta
10:04
4m
Talk
The PIPr Dataset of Public Infrastructure as Code Programs
Data and Tool Showcase Track
Daniel Sokolowski University of St. Gallen, David Spielmann University of St. Gallen, Guido Salvaneschi University of St. Gallen
Link to publication DOI Pre-print
10:08
4m
Talk
A Dataset of Microservices-based Open-Source Projects
Data and Tool Showcase Track
Dario Amoroso d'Aragona Tampere University, Alexander Bakhtin University of Oulu, Xiaozhou Li University of Oulu, Ruoyu Su University of Oulu, Lauren Adams Baylor University, Ernesto Aponte Universidad del Sagrado Corazón, Francis Boyle Baylor University, Patrick Boyle Baylor University, Rachel Koerner Baylor University, Joseph Lee University of Richmond, Fangchao Tian University of Oulu, Yuqing Wang University of Oulu, Jesse Nyyssölä University of Helsinki, Ernesto Quevedo Baylor University, Shahidur Md Rahaman Baylor University, Amr Elsayed Baylor University, Mika Mäntylä University of Helsinki and University of Oulu, Tomas Cerny University of Arizona, Davide Taibi University of Oulu and Tampere University
10:12
4m
Talk
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
Data and Tool Showcase Track
Christian Birchler Zurich University of Applied Sciences & University of Bern, Cyrill Rohrbach University of Bern, Switzerland, Timo Kehrer University of Bern, Sebastiano Panichella Zurich University of Applied Sciences
10:16
4m
Talk
Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads
Data and Tool Showcase Track
Ramtin Ehsani Drexel University, Mia Mohammad Imran Virginia Commonwealth University, Robert Zita Elmhurst University, Kostadin Damevski Virginia Commonwealth University, Preetha Chatterjee Drexel University, USA
10:20
4m
Talk
A Dataset of Atoms of Confusion in the Android Open Source Project
Data and Tool Showcase Track
Davi Batista Tabosa Federal University of Ceará, Oton Pinheiro Federal University of Ceará, Lincoln Rocha Federal University of Ceará, Windson Viana Federal University of Ceará
10:24
4m
Talk
PlayMyData: a curated dataset of multi-platform video games
Data and Tool Showcase Track
Andrea D'Angelo University of L'Aquila, Claudio Di Sipio University of L'Aquila, Cristiano Politowski DIRO, University of Montreal, Riccardo Rubei University of L'Aquila
09:00 - 10:30
Keynote and Tutorial
Tutorials / Keynotes at Grande Auditório
Chair(s): Romain Robbes
09:00
45m
Keynote
Questioning the questions we ask about the impact of AI on software engineering
Keynotes
Margaret-Anne Storey University of Victoria
09:45
45m
Talk
Open Source Software Digital Sociology: Quantifying and Managing Complex Open Source Software Ecosystem
Tutorials
Minghui Zhou Peking University, Yuxia Zhang Beijing Institute of Technology, Xin Tan Beihang University
11:00 - 12:30
Process automation & DevOps and Tutorial I
Technical Papers / Tutorials at Almada Negreiros
Chair(s): Tom Mens University of Mons, Ayushi Rastogi University of Groningen, The Netherlands
11:00
12m
Talk
Learning to Predict and Improve Build Successes in Package Ecosystems
Technical Papers
Harshitha Menon Lawrence Livermore National Lab, Daniel Nichols University of Maryland, College Park, Abhinav Bhatele University of Maryland, College Park, Todd Gamblin Lawrence Livermore National Laboratory
11:12
12m
Talk
The Impact of Code Ownership of DevOps Artefacts on the Outcome of DevOps CI Builds
Technical Papers
Ajiromola Kola-Olawuyi University of Waterloo, Nimmi Rashinika Weeraddana University of Waterloo, Mei Nagappan University of Waterloo
11:24
12m
Talk
A Mutation-Guided Assessment of Acceleration Approaches for Continuous Integration: An Empirical Study of YourBase
Technical Papers
Zhili Zeng University of Waterloo, Tao Xiao Nara Institute of Science and Technology, Maxime Lamothe Polytechnique Montreal, Hideaki Hata Shinshu University, Shane McIntosh University of Waterloo
Pre-print
11:45
45m
Talk
Cohort Studies for Mining Software Repositories
Tutorials
Nyyti Saarimäki Tampere University, Sira Vegas Universidad Politecnica de Madrid, Valentina Lenarduzzi University of Oulu, Davide Taibi University of Oulu and Tampere University, Mikel Robredo University of Oulu
11:00 - 12:30
Software Evolution & Analysis
Technical Papers / Data and Tool Showcase Track / Industry Track at Grande Auditório
Chair(s): Vladimir Kovalenko JetBrains Research
11:00
12m
Talk
Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study
Technical Papers
Rosalia Tufano Università della Svizzera Italiana, Antonio Mastropaolo Università della Svizzera italiana, Federica Pepe University of Sannio, Ozren Dabic Software Institute, Università della Svizzera italiana (USI), Switzerland, Massimiliano Di Penta University of Sannio, Italy, Gabriele Bavota Software Institute @ Università della Svizzera Italiana
11:12
12m
Talk
DRMiner: A Tool For Identifying And Analyzing Refactorings In Dockerfile
Technical Papers
Emna Ksontini University of Michigan - Dearborn, Aycha Abid Oakland University, Rania Khalsi University of Michigan - Flint, Marouane Kessentini University of Michigan - Flint
11:24
12m
Talk
A Large-Scale Empirical Study of Open Source License Usage: Practices and Challenges
Technical Papers
Jiaqi Wu Zhejiang University, Lingfeng Bao Zhejiang University, Xiaohu Yang Zhejiang University, Xin Xia Huawei Technologies, Xing Hu Zhejiang University
11:36
12m
Talk
Analyzing the Evolution and Maintenance of ML Models on Hugging Face
Technical Papers
Joel Castaño Fernández Universitat Politècnica de Catalunya, Silverio Martínez-Fernández UPC-BarcelonaTech, Xavier Franch Universitat Politècnica de Catalunya, Justus Bogner Vrije Universiteit Amsterdam
Link to publication Pre-print
11:48
12m
Talk
On the Anatomy of Real-World R Code for Static Analysis
Technical Papers
Florian Sihler Ulm University, Lukas Pietzschmann Ulm University, Raphael Straub Ulm University, Matthias Tichy Ulm University, Germany, Andor Diera Ulm University, Abdelhalim Dahou GESIS Leibniz Institute for the Social Sciences
Pre-print File Attached
12:00
6m
Talk
Encoding Version History Context for Better Code Representation
Technical Papers
Huy Nguyen The University of Melbourne, Christoph Treude Singapore Management University, Patanamon Thongtanunam University of Melbourne
Pre-print
12:06
4m
Talk
CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code
Data and Tool Showcase Track
Martin Weyssow DIRO, Université de Montréal, Claudio Di Sipio University of L'Aquila, Davide Di Ruscio University of L'Aquila, Houari Sahraoui DIRO, Université de Montréal
12:10
4m
Talk
Bidirectional Paper-Repository Tracing in Software Engineering
Data and Tool Showcase Track
Daniel Garijo, Miguel Arroyo Universidad Politécnica de Madrid, Esteban González Guardia Universidad Politécnica de Madrid, Christoph Treude Singapore Management University, Nicola Tarocco CERN
12:14
4m
Talk
DistilKaggle: A Distilled Dataset of Kaggle Jupyter Notebooks
Data and Tool Showcase Track
Mojtaba Mostafavi Department of Computer Engineering of Sharif University of Technology, Arash Asgari Department of Computer Engineering of Sharif University of Technology, Mohammad Abolnejadian Department of Computer Engineering of Sharif University of Technology, Abbas Heydarnoori Bowling Green State University
12:18
5m
Talk
Estimating Usage of Open Source Projects
Industry Track
Sophia Vargas Google LLC, Georg Link Bitergia, JaYoung Lee Google
14:00 - 15:30
Process automation & DevOps II
Technical Papers / Data and Tool Showcase Track at Almada Negreiros
Chair(s): Shane McIntosh University of Waterloo
14:00
12m
Talk
Options Matter: Documenting and Fixing Non-Reproducible Builds in Highly-Configurable Systems
Technical Papers
Georges Aaron RANDRIANAINA Université de Rennes 1, IRISA, Djamel Eddine Khelladi CNRS, IRISA, University of Rennes, Olivier Zendra Inria, Mathieu Acher University of Rennes, France / Inria, France / CNRS, France / IRISA, France
14:12
12m
Talk
How do Machine Learning Projects use Continuous Integration Practices? An Empirical Study on GitHub Actions
Technical Papers
João Helis Bernardo Federal Institute of Education, Science and Technology of Rio Grande do Norte, Daniel Alencar Da Costa University of Otago, Sergio Queiroz de Medeiros Universidade Federal do Rio Grande do Norte, Uirá Kulesza Federal University of Rio Grande do Norte
DOI Pre-print
14:24
4m
Talk
A dataset of GitHub Actions workflow histories
Data and Tool Showcase Track
Guillaume Cardoen University of Mons, Tom Mens University of Mons, Alexandre Decan University of Mons; F.R.S.-FNRS
14:28
4m
Talk
gawd: A Differencing Tool for GitHub Actions Workflows
Data and Tool Showcase Track
Pooya Rostami Mazrae University of Mons, Alexandre Decan University of Mons; F.R.S.-FNRS, Tom Mens University of Mons
14:32
4m
Talk
RABBIT: A tool for identifying bot accounts based on their recent GitHub event history
Data and Tool Showcase Track
Natarajan Chidambaram University of Mons, Tom Mens University of Mons, Alexandre Decan University of Mons; F.R.S.-FNRS
14:36
12m
Talk
An Investigation of Patch Porting Practices of the Linux Kernel Ecosystem
Technical Papers
Xingyu Li UC Riverside, Zheng Zhang UC Riverside, Zhiyun Qian University of California at Riverside, USA, Trent Jaeger UC Riverside, Chengyu Song University of California at Riverside, USA
14:48
4m
Talk
BugsPHP: A dataset for Automated Program Repair in PHP
Data and Tool Showcase Track
K.D. Pramod University of Moratuwa, Sri Lanka, W.T.N. De Silva University of Moratuwa, Sri Lanka, W.U.K. Thabrew University of Moratuwa, Sri Lanka, Ridwan Salihin Shariffdeen National University of Singapore, Sandareka Wickramanayake University of Moratuwa, Sri Lanka
Pre-print
14:00 - 15:30
Security and Vision & Reflection
Data and Tool Showcase Track / Technical Papers / Registered Reports / Vision and Reflection at Grande Auditório
Chair(s): Tim Menzies North Carolina State University
14:00
12m
Talk
Quantifying Security Issues in Reusable JavaScript Actions in GitHub Workflows
Technical Papers
Hassan Onsori Delicheh University of Mons, Belgium, Alexandre Decan University of Mons; F.R.S.-FNRS, Tom Mens University of Mons
Pre-print
14:12
12m
Talk
What Can Self-Admitted Technical Debt Tell Us About Security? A Mixed-Methods Study
Technical Papers
Nicolás E. Díaz Ferreyra Hamburg University of Technology, Mojtaba Shahin RMIT University, Mansooreh Zahedi The University of Melbourne, Sodiq Quadri Hamburg University of Technology, Riccardo Scandariato Hamburg University of Technology
Pre-print
14:24
12m
Talk
Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study
Technical Papers
Triet Le The University of Adelaide, Xiaoning Du Monash University, Australia, Muhammad Ali Babar School of Computer Science, The University of Adelaide
14:36
4m
Talk
MalwareBench: Malware samples are not enough
Data and Tool Showcase Track
Nusrat Zahan North Carolina State University, Philipp Burckhardt Socket, Inc, Mikola Lysenko Socket, Inc, Feross Aboukhadijeh Socket, Inc, Laurie Williams North Carolina State University
14:40
4m
Talk
Hash4Patch: A Lightweight Low False Positive Tool for Finding Vulnerability Patch Commits
Data and Tool Showcase Track
Simone Scalco University of Trento, Ranindya Paramitha University of Trento
14:44
4m
Talk
MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representations
Data and Tool Showcase Track
Chao Ni School of Software Technology, Zhejiang University, Liyu Shen Zhejiang University, Xiaohu Yang Zhejiang University, Yan Zhu Zhejiang University, Shaohua Wang Central University of Finance and Economics
Pre-print
14:48
5m
Talk
Analyzing and Mitigating (with LLMs) the Security Misconfigurations of Helm Charts from Artifact Hub
Registered Reports
Francesco Minna Vrije Universiteit Amsterdam, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam, Katja Tuma Vrije Universiteit Amsterdam
14:53
5m
Talk
Fixing Smart Contract Vulnerabilities: A Comparative Analysis of Literature and Developer's Practices
Registered Reports
Francesco Salzano University of Molise, Simone Scalabrino University of Molise, Rocco Oliveto University of Molise, Remo Pareschi University of Molise
15:00
30m
Talk
Then, Now, and Next: Constants in Changing MSR Research Landscape
Vision and Reflection
Ayushi Rastogi University of Groningen, The Netherlands
16:00 - 17:30
Day 2: Closing
MSR Awards / Vision and Reflection at Grande Auditório
Chair(s): Alberto Bacchelli University of Zurich
16:00
30m
Talk
MSR in the age of LLMs
Vision and Reflection
Christoph Treude Singapore Management University
16:30
30m
Talk
Idealists and Pragmatists—An Only Somewhat Self-Indulgent Reflection on the Development of an MSR Paper (and Researcher)
Vision and Reflection
Shane McIntosh University of Waterloo
17:00
30m
Day closing
Closing session
MSR Awards
Diomidis Spinellis Athens University of Economics and Business & Delft University of Technology, Olga Baysal

Accepted Papers

Title
A Dataset of Atoms of Confusion in the Android Open Source Project
Data and Tool Showcase Track
A dataset of GitHub Actions workflow histories
Data and Tool Showcase Track
A Dataset of Microservices-based Open-Source Projects
Data and Tool Showcase Track
A Dataset of Partial Program Fixes
Data and Tool Showcase Track
A Four-Dimension Gold Standard Dataset for Opinion Mining in Software Engineering
Data and Tool Showcase Track
AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis
Data and Tool Showcase Track
AndroZoo: A Retrospective with a Glimpse into the Future
Data and Tool Showcase Track
AW4C: A Commit-Aware C Dataset for Actionable Warning Identification
Data and Tool Showcase Track
Bidirectional Paper-Repository Tracing in Software Engineering
Data and Tool Showcase Track
BugsPHP: A dataset for Automated Program Repair in PHP
Data and Tool Showcase Track
Pre-print
CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code
Data and Tool Showcase Track
Curated Email-Based Code Reviews Datasets
Data and Tool Showcase Track
DATAR: A Dataset for Tracking App Releases
Data and Tool Showcase Track
Dataset: Copy-based Reuse in Open Source Software
Data and Tool Showcase Track
Pre-print
DistilKaggle: A Distilled Dataset of Kaggle Jupyter Notebooks
Data and Tool Showcase Track
gawd: A Differencing Tool for GitHub Actions Workflows
Data and Tool Showcase Track
GitBugs-Java: A Reproducible Java Benchmark of Recent Bugs
Data and Tool Showcase Track
Goblin: A Framework for Enriching and Querying the Maven Central Dependency Graph
Data and Tool Showcase Track
Pre-print File Attached
Greenlight: Highlighting TensorFlow APIs Energy Footprint
Data and Tool Showcase Track
Pre-print
Hash4Patch: A Lightweight Low False Positive Tool for Finding Vulnerability Patch Commits
Data and Tool Showcase Track
Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads
Data and Tool Showcase Track
MalwareBench: Malware samples are not enough
Data and Tool Showcase Track
MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representations
Data and Tool Showcase Track
Pre-print
Multi-faceted Code Smell Detection at Scale using DesigniteJava 2.0
Data and Tool Showcase Track
Pre-print
Opening the Valve on Pure-Data: Usage Patterns and Programming Practices of a Data-Flow Based Visual Programming Language
Data and Tool Showcase Track
PlayMyData: a curated dataset of multi-platform video games
Data and Tool Showcase Track
RABBIT: A tool for identifying bot accounts based on their recent GitHub event history
Data and Tool Showcase Track
SATDAUG - A Balanced and Augmented Dataset for Detecting Self-Admitted Technical Debt
Data and Tool Showcase Track
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
Data and Tool Showcase Track
TestDossier: A Dataset of Tested Values Automatically Extracted from Test Execution
Data and Tool Showcase Track
Pre-print Media Attached
The PIPr Dataset of Public Infrastructure as Code Programs
Data and Tool Showcase Track
Link to publication DOI Pre-print
TrickyBugs: A Dataset of Corner-case Bugs in Plausible Programs
Data and Tool Showcase Track

Call for Papers

The MSR Data and Tools Showcase Track aims to actively promote and recognize the creation of reusable datasets and tools that are designed and built not only for a specific research project but for the MSR community as a whole. These datasets and tools should enable other practitioners and researchers to jumpstart their research efforts, and should also allow earlier work to be reproduced. The MSR Data and Tools Showcase papers can be descriptions of datasets or tools built by the authors that can be used by other practitioners or researchers, and/or descriptions of the use of tools built by others to obtain specific research results.

The MSR’24 Data and Tools Showcase Track will accept two types of submissions: (1) data showcase papers and (2) reusable tool showcase papers.

  1. Data showcase submissions are expected to include:
  • a description of the data source,
  • a description of the methodology used to gather the data (including provenance and the tool used to create/generate/gather the data, if any),
  • a description of the storage mechanism, including a schema if applicable,
  • if the data has been used by the authors or others, a description of how this was done including references to previously published papers,
  • a description of the originality of the dataset (that is, even if the dataset has been used in a published paper, its complete description must be unpublished) and similar existing datasets (if any),
  • ideas for future research questions that could be answered using the dataset,
  • ideas for further improvements that could be made to the dataset, and
  • any limitations and/or challenges in creating or using the dataset.
  2. Reusable Tool showcase submissions are expected to include:
  • a description of the tool, which includes the background, motivation, novelty, overall architecture, detailed design, and preliminary evaluation of the tool, as well as the link to download or access the tool,
  • a description of the design of the tool, and how to use the tool in practice,
  • clear installation instructions and example datasets that allow the reviewers to run the tool,
  • if the tool has been used by the authors or others, a description of how the tool was used, including references to previously published papers,
  • ideas for future reusability of the tool, and
  • any limitations of using the tool.

The dataset or tool should be made available at the time of submission of the paper for review but will be considered confidential until publication of the paper. The dataset or tool should include detailed instructions about how to set up the environment (e.g., requirements.txt) and how to use the dataset or tool (e.g., how to import the data, how to access the data once it has been imported, or how to use the tool with a running example).

At a minimum, upon publication of the paper, the authors should archive the data or tool on a persistent repository that can provide a digital object identifier (DOI) such as zenodo.org, figshare.com, Archive.org, or institutional repositories. In addition, the DOI-based citation of the dataset or the tool should be included in the camera-ready version of the paper. GitHub provides an easy way to make source code citable (with third-party tools and with a CITATION file).

Data and Tools showcase submissions are not:

  • empirical studies, or
  • datasets that are based on poorly explained or untrustworthy heuristics for data collection, or results of trivial application of generic tools.

If custom tools have been used to create the dataset, we expect the paper to be accompanied by the source code of the tools, along with clear documentation on how to run the tools to recreate the dataset. The tools should be open source, accompanied by an appropriate license; the source code should be citable, i.e., refer to a specific release and have a DOI. If you cannot provide the source code or the source code clause is not applicable (e.g., because the dataset consists of qualitative data), please provide a short explanation of why this is not possible.

Evaluation Criteria

The Review Criteria for the Data/Tool Showcase submissions are as follows:

  • value, usefulness, and reusability of the datasets or tools.
  • quality of the presentation.
  • clarity of relation with related work and its relevance to mining software repositories.
  • availability of the datasets or tools.

Important Dates

  • Paper Deadline: Friday 8th December 2023
  • Author Notification: Friday 12th January 2024
  • Camera Ready Deadline: Sunday 28th January 2024

Submission

Submit your paper (maximum 4 pages, plus 1 additional page of references) via the HotCRP submission site: https://msr2024-data-tool.hotcrp.com/.

Submitted papers will undergo single-anonymous peer review. We opt for single-anonymous peer review (i.e., authors’ names should be listed on the manuscript, as opposed to the double-anonymous peer review of the main track) because of the requirement above to describe how the data has been used in previous studies, including bibliographic references to those studies. Such references are likely to disclose the authors’ identity.

To make research datasets and research software accessible and citable, we further encourage authors to follow the FAIR principles, i.e., data should be: Findable, Accessible, Interoperable, and Reusable.

All authors should use the official “ACM Primary Article Template”, as can be obtained from the ACM Proceedings Template page. LaTeX users should use the sigconf option together with the review option (to produce line numbers for easy reference by the reviewers), and should not use the anonymous option (to comply with the single-anonymous peer review policy). To that end, the following LaTeX code can be placed at the start of the LaTeX document:

\documentclass[sigconf,review]{acmart}

\acmConference[MSR 2024]{21st International Conference on Mining Software Repositories}{April 2024}{Lisbon, Portugal}
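
As an illustration only, the two lines above might sit in a minimal submission skeleton like the one below. The title, author name, affiliation, e-mail address, and the references.bib file name are placeholders assumed for this sketch; none of them is prescribed by the call. The author block is filled in because the track uses single-anonymous review.

```latex
% Minimal sketch of a showcase submission; all names below are placeholders.
\documentclass[sigconf,review]{acmart}  % no "anonymous" option: review is single-anonymous

\acmConference[MSR 2024]{21st International Conference on Mining Software Repositories}{April 2024}{Lisbon, Portugal}

\begin{document}

\title{Placeholder Title of the Dataset or Tool Paper}

% Hypothetical author block; replace with the real author details.
\author{First Author}
\affiliation{%
  \institution{Placeholder University}
  \city{Placeholder City}
  \country{Placeholder Country}}
\email{first.author@example.org}

% In acmart, the abstract is placed before \maketitle.
\begin{abstract}
One-paragraph summary of the dataset or tool.
\end{abstract}

\maketitle

\section{Introduction}
Body text (maximum 4 pages, plus 1 additional page of references).

\bibliographystyle{ACM-Reference-Format}
\bibliography{references}  % assumes a references.bib file next to the .tex file

\end{document}
```

The official template remains the authoritative starting point; this skeleton only shows where the class options and the conference line would go.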

Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere for the duration of consideration. ACM plagiarism policies and procedures shall be followed for cases of double submission. The submission must also comply with the IEEE Policy on Authorship. Please read the ACM Policy on Plagiarism, Misrepresentation, and Falsification and the IEEE - Introduction to the Guidelines for Handling Plagiarism Complaints before submitting.

Upon notification of acceptance, all authors of accepted papers will be asked to complete a copyright form and will receive further instructions for preparing their camera-ready versions. At least one author of each paper is expected to register and present the results at the MSR 2024 conference. All accepted contributions will be published in the conference’s electronic proceedings.