MSR 2024
Mon 15 - Tue 16 April 2024 Lisbon, Portugal
co-located with ICSE 2024
Tue 16 Apr 2024 14:00 - 14:12 at Almada Negreiros - Process automation & DevOps II Chair(s): Shane McIntosh

A critical aspect of software development, build reproducibility ensures the dependability, security, and maintainability of software systems. Although several factors, including the build environment, have been investigated in the context of non-reproducible builds, to the best of our knowledge the precise influence of configuration options in configurable systems has not been thoroughly investigated. This paper aims to fill this gap.

This paper thus proposes an approach for automatically identifying the configuration options that cause non-reproducible builds. It begins by producing a set of builds and detecting the non-reproducible ones through binary comparison.
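As a rough illustration of the binary comparison step, the sketch below treats a build as reproducible when two independently produced artifacts are byte-for-byte identical, checked here through SHA-256 digests. The function names and the idea of comparing exactly two artifacts per configuration are assumptions made for this example, not details taken from the paper.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(artifact_a: Path, artifact_b: Path) -> bool:
    """Hypothetical check: two builds of the same configuration are
    considered reproducible when their artifacts hash identically."""
    return sha256_of(artifact_a) == sha256_of(artifact_b)
```

Calling `is_reproducible` on the artifacts of two builds of the same configuration labels that configuration as reproducible or not; those labels are the input that any later analysis of options would need.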

We then develop automated techniques that combine statistical learning with symbolic reasoning to analyze over 20,000 configuration options. Our methods are designed both to detect options that cause non-reproducibility and to remedy non-reproducible configurations, two tasks that are challenging and costly to perform manually.
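To give a feel for how labelled builds can point to suspicious options, the sketch below ranks each (option, value) pair by how much its non-reproducibility rate exceeds the overall rate. This is a deliberately simple stand-in, not the statistical learning or symbolic reasoning used in the paper; the function name and the option names `OPT_A` and `OPT_B` are hypothetical.

```python
from collections import defaultdict


def rank_options(samples):
    """Rank (option, value) pairs by excess non-reproducibility rate.

    samples: list of (config, reproducible) pairs, where config maps an
    option name to its value and reproducible is a bool.
    Returns a list of (score, option, value), highest score first.
    """
    counts = defaultdict(lambda: [0, 0])  # (option, value) -> [failures, total]
    for config, reproducible in samples:
        for option, value in config.items():
            entry = counts[(option, value)]
            entry[1] += 1
            if not reproducible:
                entry[0] += 1

    # Overall share of non-reproducible builds in the sample.
    base_rate = sum(1 for _, ok in samples if not ok) / len(samples)

    scores = [
        (failures / total - base_rate, option, value)
        for (option, value), (failures, total) in counts.items()
    ]
    scores.sort(reverse=True)
    return scores


# Hypothetical data: every build with OPT_A=y is non-reproducible.
samples = [
    ({"OPT_A": "y", "OPT_B": "y"}, False),
    ({"OPT_A": "y", "OPT_B": "n"}, False),
    ({"OPT_A": "n", "OPT_B": "y"}, True),
    ({"OPT_A": "n", "OPT_B": "n"}, True),
]
ranked = rank_options(samples)
```

A per-option score like this ignores interactions between options, which is one reason a learned model combined with reasoning over option constraints is a better fit for a system with thousands of options.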

We evaluate our approach on three case studies, namely Toybox, Busybox, and Linux, analysing more than 2,000 configurations for each of them. Toybox and Busybox show no non-reproducibility. In contrast, 47% of Linux configurations lead to non-reproducible builds.
The approach we propose in this paper identifies 10 configuration options that caused this non-reproducibility. When checked against the Linux documentation, none of these are documented as non-reproducible. Thus, the non-reproducible configuration options we identify are novel knowledge and constitute directly actionable information for the Linux community. Finally, we demonstrate that our methodology effectively identifies a set of undesirable option values, enabling the enhancement and expansion of the Linux kernel documentation while automatically rectifying 96% of encountered non-reproducible builds.
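The idea of rectifying a configuration from a set of undesirable option values can be sketched as follows. This is a minimal illustration under strong assumptions: the function, the `undesirable` set, and the `safe_values` mapping are all hypothetical, and the sketch ignores dependencies between options, which a real fix would have to respect.

```python
def repair_config(config, undesirable, safe_values):
    """Return a repaired copy of config plus the list of changed options.

    config:       dict mapping option name -> value
    undesirable:  set of (option, value) pairs known to break reproducibility
    safe_values:  dict mapping option name -> replacement value (assumed given)
    """
    repaired = dict(config)
    changed = []
    for option, value in config.items():
        if (option, value) in undesirable:
            repaired[option] = safe_values[option]
            changed.append(option)
    return repaired, changed


# Hypothetical example: OPT_A=y is flagged and gets switched off.
fixed, changed = repair_config(
    {"OPT_A": "y", "OPT_B": "n"},
    undesirable={("OPT_A", "y")},
    safe_values={"OPT_A": "n"},
)
```

After such a rewrite, the configuration would need to be rebuilt and compared again to confirm that the build is now reproducible.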

Tue 16 Apr

Displayed time zone: Lisbon

14:00 - 15:30
Process automation & DevOps II
Technical Papers / Data and Tool Showcase Track at Almada Negreiros
Chair(s): Shane McIntosh University of Waterloo
14:00
12m
Talk
Options Matter: Documenting and Fixing Non-Reproducible Builds in Highly-Configurable Systems
Technical Papers
Georges Aaron RANDRIANAINA Université de Rennes 1, IRISA, Djamel Eddine Khelladi CNRS, IRISA, University of Rennes, Olivier Zendra Inria, Mathieu Acher University of Rennes, France / Inria, France / CNRS, France / IRISA, France
14:12
12m
Talk
How do Machine Learning Projects use Continuous Integration Practices? An Empirical Study on GitHub Actions
Technical Papers
João Helis Bernardo Federal Institute of Education, Science and Technology of Rio Grande do Norte, Daniel Alencar Da Costa University of Otago, Sergio Queiroz de Medeiros Universidade Federal do Rio Grande do Norte, Uirá Kulesza Federal University of Rio Grande do Norte
14:24
4m
Talk
A dataset of GitHub Actions workflow histories
Data and Tool Showcase Track
Guillaume Cardoen University of Mons, Tom Mens University of Mons, Alexandre Decan University of Mons; F.R.S.-FNRS
14:28
4m
Talk
gawd: A Differencing Tool for GitHub Actions Workflows
Data and Tool Showcase Track
Pooya Rostami Mazrae University of Mons, Alexandre Decan University of Mons; F.R.S.-FNRS, Tom Mens University of Mons
14:32
4m
Talk
RABBIT: A tool for identifying bot accounts based on their recent GitHub event history
Data and Tool Showcase Track
Natarajan Chidambaram University of Mons, Tom Mens University of Mons, Alexandre Decan University of Mons; F.R.S.-FNRS