A dataset of GitHub Actions workflow histories
GitHub Actions is the de facto workflow automation tool for GitHub repositories. Its popularity has increased dramatically in recent years, opening up opportunities for empirical studies of its usage. To enable such studies, we implemented gigawork, an open-source tool for extracting the commit histories of changes to workflow files in GitHub repositories. Using this tool, we collected and publicly released a dataset of 160K+ commit histories of workflow files in 32K+ public GitHub repositories, covering 1.5M+ workflow file versions. To facilitate its use by other researchers, the dataset includes relevant metadata for each workflow commit. gigawork is publicly released on GitHub (https://github.com/cardoeng/gigawork) and its associated dataset can be found on Zenodo (https://doi.org/10.5281/zenodo.10259014).
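To make the idea of a workflow file commit history concrete, the following is a minimal sketch of how such a history could be read from a local clone with plain `git log`. It is not gigawork's implementation or command-line interface: the names `WorkflowChange`, `parse_log` and `workflow_history`, the `@@` marker used to delimit commits, and the choice to look only at paths under `.github/workflows` are all assumptions made for illustration.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional

# Assumption: workflow files are the files stored under this directory.
WORKFLOW_DIR = ".github/workflows"


@dataclass
class WorkflowChange:
    """One change to one workflow file in one commit (hypothetical record)."""
    commit: str                     # commit hash
    date: str                       # committer date, ISO 8601
    status: str                     # git status letter: A, M, D, R, ...
    path: str                       # path of the file after the change
    old_path: Optional[str] = None  # previous path for renames and copies


def parse_log(text: str) -> List[WorkflowChange]:
    """Parse the output of `git log --name-status --format='@@%H %cI'`."""
    changes: List[WorkflowChange] = []
    commit = date = ""
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("@@"):
            # Header line emitted by our custom format: "@@<hash> <date>".
            commit, date = line[2:].split(" ", 1)
            continue
        fields = line.split("\t")
        status = fields[0][0]  # "R100" -> "R"
        if status in ("R", "C"):
            old_path, path = fields[1], fields[2]
        else:
            old_path, path = None, fields[1]
        changes.append(WorkflowChange(commit, date, status, path, old_path))
    return changes


def workflow_history(repo: str) -> List[WorkflowChange]:
    """Return workflow file changes of a local clone, oldest commit first."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--name-status",
         "--format=@@%H %cI", "--", WORKFLOW_DIR],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_log(out)
```

Calling `workflow_history("path/to/clone")` on a local clone would return one record per changed workflow file per commit. Separating the parser from the `git` invocation keeps the parsing logic usable on saved log output without a repository at hand.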
Tue 16 Apr | Displayed time zone: Lisbon
14:00 - 15:30 | Process automation & DevOps II | Technical Papers / Data and Tool Showcase Track at Almada Negreiros | Chair(s): Shane McIntosh (University of Waterloo)
14:00 | 12m | Talk | Options Matter: Documenting and Fixing Non-Reproducible Builds in Highly-Configurable Systems | Technical Papers | Georges Aaron RANDRIANAINA (Université de Rennes 1, IRISA); Djamel Eddine Khelladi (CNRS, IRISA, University of Rennes); Olivier Zendra (Inria); Mathieu Acher (University of Rennes, France / Inria, France / CNRS, France / IRISA, France)
14:12 | 12m | Talk | How do Machine Learning Projects use Continuous Integration Practices? An Empirical Study on GitHub Actions | Technical Papers | João Helis Bernardo (Federal Institute of Education, Science and Technology of Rio Grande do Norte); Daniel Alencar Da Costa (University of Otago); Sergio Queiroz de Medeiros (Universidade Federal do Rio Grande do Norte); Uirá Kulesza (Federal University of Rio Grande do Norte)
14:24 | 4m | Talk | A dataset of GitHub Actions workflow histories | Data and Tool Showcase Track | Guillaume Cardoen (University of Mons); Tom Mens (University of Mons); Alexandre Decan (University of Mons; F.R.S.-FNRS)
14:28 | 4m | Talk | gawd: A Differencing Tool for GitHub Actions Workflows | Data and Tool Showcase Track | Pooya Rostami Mazrae (University of Mons); Alexandre Decan (University of Mons; F.R.S.-FNRS); Tom Mens (University of Mons)
14:32 | 4m | Talk | RABBIT: A tool for identifying bot accounts based on their recent GitHub event history | Data and Tool Showcase Track | Natarajan Chidambaram (University of Mons); Tom Mens (University of Mons); Alexandre Decan (University of Mons; F.R.S.-FNRS)
14:36 | 12m | Talk | An Investigation of Patch Porting Practices of the Linux Kernel Ecosystem | Technical Papers | Xingyu Li (UC Riverside); Zheng Zhang (UC Riverside); Zhiyun Qian (University of California at Riverside, USA); Trent Jaeger (UC Riverside); Chengyu Song (University of California at Riverside, USA)
14:48 | 4m | Talk | BugsPHP: A dataset for Automated Program Repair in PHP | Data and Tool Showcase Track | K.D. Pramod (University of Moratuwa, Sri Lanka); W.T.N. De Silva (University of Moratuwa, Sri Lanka); W.U.K. Thabrew (University of Moratuwa, Sri Lanka); Ridwan Salihin Shariffdeen (National University of Singapore); Sandareka Wickramanayake (University of Moratuwa, Sri Lanka)