MSR 2024
Mon 15 - Tue 16 April 2024 Lisbon, Portugal
co-located with ICSE 2024
Tue 16 Apr 2024 14:32 - 14:36 at Almada Negreiros - Process automation & DevOps II Chair(s): Shane McIntosh

Collaborative software development through GitHub repositories frequently relies on bot accounts that automate many repetitive and error-prone tasks. Several bot identification tools and techniques have been proposed in the past, but they tend to rely on a substantial amount of historical data, or they limit themselves to a reduced subset of activity types. To overcome these limitations, we developed RABBIT, an open-source command-line tool that queries the GitHub Events API to retrieve the recent events of a given GitHub account and predicts whether the account is a human or a bot. The prediction is based on an XGBoost classification model that relies on six features related to account activities, and that is obtained through grid-search 10-fold cross-validation. Based on a newly created ground-truth dataset of GitHub accounts containing 644 bots and 691 humans, we trained the model on 60% of the data and achieved very good performance, with an AUC of 0.97 and a weighted F1 score, precision, and recall of 0.94 each. After integrating this model into RABBIT, we tested the tool on the remaining 40% of unseen data and obtained 0.93 for each of AUC, weighted F1 score, precision, and recall. Taking into account the imposed hourly GitHub API limit, RABBIT can classify thousands of accounts per hour using at most 3 queries per account.
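The retrieval step and the throughput claim in the abstract can be illustrated with a minimal Python sketch. It is not RABBIT's actual code: the function names (`events_url`, `http_get_json`, `recent_events`, `accounts_per_hour`) are hypothetical, and it assumes that the "at most 3 queries per account" correspond to three pages of 100 events from GitHub's public `GET /users/{username}/events` endpoint. Feature extraction and the XGBoost model are left out.

```python
import json
import urllib.request
from typing import Callable, List, Optional

API_ROOT = "https://api.github.com"
MAX_QUERIES = 3        # from the abstract: at most 3 queries per account
EVENTS_PER_PAGE = 100  # largest page size the GitHub REST API accepts


def events_url(login: str, page: int) -> str:
    """URL of one page of an account's recent events."""
    return f"{API_ROOT}/users/{login}/events?per_page={EVENTS_PER_PAGE}&page={page}"


def http_get_json(url: str, token: Optional[str] = None) -> List[dict]:
    """Fetch one page of events; a token raises the hourly request limit."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def recent_events(
    login: str, fetch: Callable[[str], List[dict]] = http_get_json
) -> List[dict]:
    """Collect recent events, stopping early when a page is not full."""
    events: List[dict] = []
    for page in range(1, MAX_QUERIES + 1):
        batch = fetch(events_url(login, page))
        events.extend(batch)
        if len(batch) < EVENTS_PER_PAGE:  # short page: no older events left
            break
    return events


def accounts_per_hour(hourly_limit: int, queries_per_account: int = MAX_QUERIES) -> int:
    """Worst-case number of accounts that fit in one hour of API quota."""
    return hourly_limit // queries_per_account
```

Under GitHub's standard limit of 5,000 requests per hour for authenticated users, `accounts_per_hour(5000)` returns 1666 in the worst case of three queries per account. Accounts whose events fit in one or two pages use fewer queries, so the actual throughput can be higher.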

Tue 16 Apr

Displayed time zone: Lisbon

14:00 - 15:30
Process automation & DevOps II
Technical Papers / Data and Tool Showcase Track at Almada Negreiros
Chair(s): Shane McIntosh University of Waterloo
14:00
12m
Talk
Options Matter: Documenting and Fixing Non-Reproducible Builds in Highly-Configurable Systems
Technical Papers
Georges Aaron RANDRIANAINA Université de Rennes 1, IRISA, Djamel Eddine Khelladi CNRS, IRISA, University of Rennes, Olivier Zendra Inria, Mathieu Acher University of Rennes, France / Inria, France / CNRS, France / IRISA, France
14:12
12m
Talk
How do Machine Learning Projects use Continuous Integration Practices? An Empirical Study on GitHub Actions
Technical Papers
João Helis Bernardo Federal Institute of Education, Science and Technology of Rio Grande do Norte, Daniel Alencar Da Costa University of Otago, Sergio Queiroz de Medeiros Universidade Federal do Rio Grande do Norte, Uirá Kulesza Federal University of Rio Grande do Norte
14:24
4m
Talk
A dataset of GitHub Actions workflow histories
Data and Tool Showcase Track
Guillaume Cardoen University of Mons, Tom Mens University of Mons, Alexandre Decan University of Mons; F.R.S.-FNRS
14:28
4m
Talk
gawd: A Differencing Tool for GitHub Actions Workflows
Data and Tool Showcase Track
Pooya Rostami Mazrae University of Mons, Alexandre Decan University of Mons; F.R.S.-FNRS, Tom Mens University of Mons
14:32
4m
Talk
RABBIT: A tool for identifying bot accounts based on their recent GitHub event history
Data and Tool Showcase Track
Natarajan Chidambaram University of Mons, Tom Mens University of Mons, Alexandre Decan University of Mons; F.R.S.-FNRS
14:36
12m
Talk
An Investigation of Patch Porting Practices of the Linux Kernel Ecosystem
Technical Papers
Xingyu Li UC Riverside, Zheng Zhang UC Riverside, Zhiyun Qian University of California at Riverside, USA, Trent Jaeger UC Riverside, Chengyu Song University of California at Riverside, USA
14:48
4m
Talk
BugsPHP: A dataset for Automated Program Repair in PHP
Data and Tool Showcase Track
K.D. Pramod University of Moratuwa, Sri Lanka, W.T.N. De Silva University of Moratuwa, Sri Lanka, W.U.K. Thabrew University of Moratuwa, Sri Lanka, Ridwan Salihin Shariffdeen National University of Singapore, Sandareka Wickramanayake University of Moratuwa, Sri Lanka