MSR 2024
Mon 15 - Tue 16 April 2024 Lisbon, Portugal
co-located with ICSE 2024

GitHub Copilot is an AI-enabled tool that automates program synthesis. It has gained significant attention since its launch in 2021. Recent studies have extensively examined Copilot’s capabilities in various programming tasks, as well as its security issues. However, little is known about the effect of different natural languages on code suggestion. In the field of NLP, the choice of natural language is considered a source of social bias, and this bias could impact the diversity of software engineering. To address this gap, we conducted an empirical study to investigate the effect of three popular natural languages (English, Japanese, and Chinese) on Copilot. We evaluated Copilot on 756 questions of varying difficulty levels from AtCoder contests. The results highlight that Copilot’s capability varies across natural languages, with Chinese achieving the worst performance. Furthermore, regardless of the natural language, performance decreases significantly as the difficulty of the questions increases. Our work represents an initial step toward understanding the significance of natural languages in Copilot’s capability and opens promising opportunities for future work.

Tue 16 Apr

Displayed time zone: Lisbon

09:00 - 10:30
Development: practices and humans (Data and Tool Showcase Track / Technical Papers) at Almada Negreiros
Chair(s): Gema Rodríguez-Pérez University of British Columbia (UBC)
09:50
6m
Talk
Exploring the Effect of Multiple Natural Languages on Code Suggestion Using GitHub Copilot
Technical Papers
Kei Koyanagi Kyushu University, Dong Wang Kyushu University, Japan, Kotaro Noguchi Kyushu University, Masanari Kondo Kyushu University, Alexander Serebrenik Eindhoven University of Technology, Yasutaka Kamei Kyushu University, Naoyasu Ubayashi Kyushu University
09:56
4m
Talk
A Four-Dimension Gold Standard Dataset for Opinion Mining in Software Engineering
Data and Tool Showcase Track
Md Rakibul Islam Lamar University, Md Fazle Rabbi Idaho State University, Jo Youngeun Lamar University, Arifa Islam Champa Idaho State University, Ethan J Young Lamar University, Camden M Wilson Lamar University, Gavin J Scott Lamar University, Minhaz F. Zibran Idaho State University
10:00
4m
Talk
Opening the Valve on Pure-Data: Usage Patterns and Programming Practices of a Data-Flow Based Visual Programming Language
Data and Tool Showcase Track
Anisha Islam Department of Computing Science, University of Alberta, Kalvin Eng University of Alberta, Abram Hindle University of Alberta
10:04
4m
Talk
The PIPr Dataset of Public Infrastructure as Code Programs
Data and Tool Showcase Track
Daniel Sokolowski University of St. Gallen, David Spielmann University of St. Gallen, Guido Salvaneschi University of St. Gallen
10:08
4m
Talk
A Dataset of Microservices-based Open-Source Projects
Data and Tool Showcase Track
Dario Amoroso d'Aragona Tampere University, Alexander Bakhtin University of Oulu, Xiaozhou Li University of Oulu, Ruoyu Su University of Oulu, Lauren Adams Baylor University, Ernesto Aponte Universidad del Sagrado Corazón, Francis Boyle Baylor University, Patrick Boyle Baylor University, Rachel Koerner Baylor University, Joseph Lee University of Richmond, Fangchao Tian University of Oulu, Yuqing Wang University of Oulu, Jesse Nyyssölä University of Helsinki, Ernesto Quevedo Baylor University, Shahidur Md Rahaman Baylor University, Amr Elsayed Baylor University, Mika Mäntylä University of Helsinki and University of Oulu, Tomas Cerny University of Arizona, Davide Taibi University of Oulu and Tampere University
10:12
4m
Talk
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
Data and Tool Showcase Track
Christian Birchler Zurich University of Applied Sciences & University of Bern, Cyrill Rohrbach University of Bern, Switzerland, Timo Kehrer University of Bern, Sebastiano Panichella Zurich University of Applied Sciences
10:16
4m
Talk
Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads
Data and Tool Showcase Track
Ramtin Ehsani Drexel University, Mia Mohammad Imran Virginia Commonwealth University, Robert Zita Elmhurst University, Kostadin Damevski Virginia Commonwealth University, Preetha Chatterjee Drexel University, USA
10:20
4m
Talk
A Dataset of Atoms of Confusion in the Android Open Source Project
Data and Tool Showcase Track
Davi Batista Tabosa Federal University of Ceará, Oton Pinheiro Federal University of Ceará, Lincoln Rocha Federal University of Ceará, Windson Viana Federal University of Ceará
10:24
4m
Talk
PlayMyData: a curated dataset of multi-platform video games
Data and Tool Showcase Track
Andrea D'Angelo University of L'Aquila, Claudio Di Sipio University of L'Aquila, Cristiano Politowski DIRO, University of Montreal, Riccardo Rubei University of L'Aquila