MSR 2024
Mon 15 - Tue 16 April 2024 Lisbon, Portugal
co-located with ICSE 2024
Tue 16 Apr 2024 11:48 - 12:00 at Grande Auditório - Software Evolution & Analysis Chair(s): Vladimir Kovalenko

CONTEXT: The R programming language has a huge and active community, especially in the area of statistical computing. Its interpreted nature allows for several interesting constructs, like the manipulation of functions at run-time, that hinder static analysis of R programs. At the same time, there is a lack of existing research regarding how these features, or even the R language as a whole, are used in practice.

OBJECTIVE: In this paper, we conduct a large-scale, static analysis of more than 50 million lines of real-world R programs and packages to identify their characteristics and the features that are actually used. Moreover, we compare the similarities and differences between the scripts of R users and the implementations of package authors and provide insights for static analysis tools.

METHOD: We analyze 4 230 R scripts submitted alongside publications and the sources of 19 450 CRAN packages. For each of the over 350 000 files, we retrieve the abstract syntax tree (AST), extract basic dataflow information of the program, and collect quantitative information for features of interest that are then summarized.

RESULTS: We find a high frequency of name-based indexing operations, assignments, and loops, but a low frequency for most of R’s reflective functions. Furthermore, we find neither testing functions nor many calls to R’s foreign function interface in the publication submissions.

CONCLUSION: R scripts and package sources differ, for example, in their size, the way they include other packages, and the usage of R’s reflective capabilities. Moreover, even though R offers a lot of powerful reflective capabilities, most are seldom used or not used at all. Still, we find a set of reflective features, like calls to eval and load, that are used frequently and should be supported by static analysis tools.
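To give a feel for the kind of per-file feature counting the abstract describes, here is a minimal sketch in Python. It is not the authors' tooling: the paper works on the AST and dataflow information, whereas this sketch only scans the source text with regular expressions, so it will miscount in edge cases (for example, a `#` inside a string literal). The function names, the feature list, and the counter keys are all hypothetical and chosen for illustration only.

```python
import re
from collections import Counter

# Hypothetical feature list; the abstract only names eval and load explicitly.
REFLECTIVE_CALLS = ("eval", "load", "parse", "assign", "get")

# R identifiers may contain dots, so "my.eval(" must not count as a call to eval.
NOT_IN_IDENTIFIER = r"(?<![\w.])"


def strip_comments(source: str) -> str:
    """Drop everything after '#' on each line (naive: ignores '#' inside strings)."""
    return "\n".join(line.split("#", 1)[0] for line in source.splitlines())


def count_features(source: str) -> Counter:
    """Approximate feature counts for one R file using a plain text scan."""
    code = strip_comments(source)
    counts = Counter()

    # Calls to selected reflective functions, e.g. eval(...) or load(...).
    for name in REFLECTIVE_CALLS:
        pattern = rf"{NOT_IN_IDENTIFIER}{re.escape(name)}\s*\("
        counts[f"call:{name}"] = len(re.findall(pattern, code))

    # Left assignments: both "<-" and "<<-" count once each.
    counts["assignment"] = len(re.findall(r"<<?-", code))

    # Loops: for (...), while (...), and repeat.
    loop_pattern = (
        rf"{NOT_IN_IDENTIFIER}(?:for|while)\s*\("
        rf"|{NOT_IN_IDENTIFIER}repeat(?![\w.])"
    )
    counts["loop"] = len(re.findall(loop_pattern, code))

    # Name-based indexing with "$", e.g. x$a.
    counts["name_index"] = len(re.findall(r"\$[A-Za-z.][\w.]*", code))
    return counts


if __name__ == "__main__":
    sample = (
        "x <- list(a = 1)\n"
        "for (i in 1:3) x$a <- x$a + i  # eval( in a comment is ignored\n"
        'y <- eval(parse(text = "1 + 1"))\n'
        "my.eval(2)\n"
    )
    for feature, n in sorted(count_features(sample).items()):
        print(f"{feature}: {n}")
```

Summing such counters over all files of a corpus would give the kind of aggregate frequencies the RESULTS paragraph reports, although a text scan like this is only a rough stand-in for an AST-based analysis.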

Slides (noanim-presentation.pdf) 1.46MiB
Slides (Animated) (presentation.pdf) 1.55MiB

Tue 16 Apr

Displayed time zone: Lisbon

11:00 - 12:30
Software Evolution & Analysis: Technical Papers / Data and Tool Showcase Track / Industry Track at Grande Auditório
Chair(s): Vladimir Kovalenko JetBrains Research
11:00
12m
Talk
Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study
Technical Papers
Rosalia Tufano Università della Svizzera Italiana, Antonio Mastropaolo Università della Svizzera italiana, Federica Pepe University of Sannio, Ozren Dabic Software Institute, Università della Svizzera italiana (USI), Switzerland, Massimiliano Di Penta University of Sannio, Italy, Gabriele Bavota Software Institute @ Università della Svizzera Italiana
11:12
12m
Talk
DRMiner: A Tool For Identifying And Analyzing Refactorings In Dockerfile
Technical Papers
Emna Ksontini University of Michigan - Dearborn, Aycha Abid Oakland University, Rania Khalsi University of Michigan - Flint, Marouane Kessentini University of Michigan - Flint
11:24
12m
Talk
A Large-Scale Empirical Study of Open Source License Usage: Practices and Challenges
Technical Papers
Jiaqi Wu Zhejiang University, Lingfeng Bao Zhejiang University, Xiaohu Yang Zhejiang University, Xin Xia Huawei Technologies, Xing Hu Zhejiang University
11:36
12m
Talk
Analyzing the Evolution and Maintenance of ML Models on Hugging Face
Technical Papers
Joel Castaño Fernández Universitat Politècnica de Catalunya, Silverio Martínez-Fernández UPC-BarcelonaTech, Xavier Franch Universitat Politècnica de Catalunya, Justus Bogner Vrije Universiteit Amsterdam
11:48
12m
Talk
On the Anatomy of Real-World R Code for Static Analysis
Technical Papers
Florian Sihler Ulm University, Lukas Pietzschmann Ulm University, Raphael Straub Ulm University, Matthias Tichy Ulm University, Germany, Andor Diera Ulm University, Abdelhalim Dahou GESIS Leibniz Institute for the Social Sciences
12:00
6m
Talk
Encoding Version History Context for Better Code Representation
Technical Papers
Huy Nguyen The University of Melbourne, Christoph Treude Singapore Management University, Patanamon Thongtanunam University of Melbourne
12:06
4m
Talk
CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code
Data and Tool Showcase Track
Martin Weyssow DIRO, Université de Montréal, Claudio Di Sipio University of L'Aquila, Davide Di Ruscio University of L'Aquila, Houari Sahraoui DIRO, Université de Montréal
12:10
4m
Talk
Bidirectional Paper-Repository Tracing in Software Engineering
Data and Tool Showcase Track
Daniel Garijo, Miguel Arroyo Universidad Politécnica de Madrid, Esteban González Guardia Universidad Politécnica de Madrid, Christoph Treude Singapore Management University, Nicola Tarocco CERN
12:14
4m
Talk
DistilKaggle: A Distilled Dataset of Kaggle Jupyter Notebooks
Data and Tool Showcase Track
Mojtaba Mostafavi Department of Computer Engineering of Sharif University of Technology, Arash Asgari Department of Computer Engineering of Sharif University of Technology, Mohammad Abolnejadian Department of Computer Engineering of Sharif University of Technology, Abbas Heydarnoori Bowling Green State University
12:18
5m
Talk
Estimating Usage of Open Source Projects
Industry Track
Sophia Vargas Google LLC, Georg Link Bitergia, JaYoung Lee Google