A Large-Scale Empirical Study of Open Source License Usage: Practices and Challenges
The popularity of open source software (OSS) has led to a significant increase in the number of available licenses, each with its own set of terms and conditions. This proliferation of licenses has made it increasingly challenging for developers to select an appropriate license for their projects and to ensure that they comply with the terms of those licenses. As a result, there is a need for empirical studies to identify current practices and challenges in license usage, both to help developers make informed decisions about license selection and to ensure that OSS is used and distributed in a legal and ethical manner. Moreover, new licenses might need to be developed to better meet the needs of the open source community and to address emerging legal issues.
In this paper, we conduct a large-scale empirical study of license usage across five package management platforms: Maven, NPM, PyPI, RubyGems, and Cargo. Our objective is to examine the current trends and potential issues in license usage in the OSS community. In total, we analyze the licenses of 33,710,877 packages across the five selected platforms. We statistically analyze licenses in these platforms from multiple perspectives, e.g., license usage, license incompatibility, license updates, and license evolution. Moreover, we conduct a comparative study of various aspects of core packages and common packages in these platforms. Our results reveal irregularities in license names and license incompatibilities that require attention. We observe both similarities and differences in license usage across the five platforms, with Cargo being the most standardized among them. Finally, we discuss implications for action based on our findings.
Tue 16 Apr (displayed time zone: Lisbon)
11:00 - 12:30 | Software Evolution & Analysis | Technical Papers / Data and Tool Showcase Track / Industry Track at Grande Auditório | Chair(s): Vladimir Kovalenko (JetBrains Research)
11:00 | 12m | Talk | Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study | Technical Papers | Rosalia Tufano (Università della Svizzera Italiana), Antonio Mastropaolo (Università della Svizzera italiana), Federica Pepe (University of Sannio), Ozren Dabic (Software Institute, Università della Svizzera italiana (USI), Switzerland), Massimiliano Di Penta (University of Sannio, Italy), Gabriele Bavota (Software Institute @ Università della Svizzera Italiana)
11:12 | 12m | Talk | DRMiner: A Tool For Identifying And Analyzing Refactorings In Dockerfile | Technical Papers | Emna Ksontini (University of Michigan - Dearborn), Aycha Abid (Oakland University), Rania Khalsi (University of Michigan - Flint), Marouane Kessentini (University of Michigan - Flint)
11:24 | 12m | Talk | A Large-Scale Empirical Study of Open Source License Usage: Practices and Challenges | Technical Papers | Jiaqi Wu (Zhejiang University), Lingfeng Bao (Zhejiang University), Xiaohu Yang (Zhejiang University), Xin Xia (Huawei Technologies), Xing Hu (Zhejiang University)
11:36 | 12m | Talk | Analyzing the Evolution and Maintenance of ML Models on Hugging Face | Technical Papers | Joel Castaño Fernández (Universitat Politècnica de Catalunya), Silverio Martínez-Fernández (UPC-BarcelonaTech), Xavier Franch (Universitat Politècnica de Catalunya), Justus Bogner (Vrije Universiteit Amsterdam)
11:48 | 12m | Talk | On the Anatomy of Real-World R Code for Static Analysis | Technical Papers | Florian Sihler (Ulm University), Lukas Pietzschmann (Ulm University), Raphael Straub (Ulm University), Matthias Tichy (Ulm University, Germany), Andor Diera (Ulm University), Abdelhalim Dahou (GESIS Leibniz Institute for the Social Sciences)
12:00 | 6m | Talk | Encoding Version History Context for Better Code Representation | Technical Papers | Huy Nguyen (The University of Melbourne), Christoph Treude (Singapore Management University), Patanamon Thongtanunam (University of Melbourne)
12:06 | 4m | Talk | CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code | Data and Tool Showcase Track | Martin Weyssow (DIRO, Université de Montréal), Claudio Di Sipio (University of L'Aquila), Davide Di Ruscio (University of L'Aquila), Houari Sahraoui (DIRO, Université de Montréal)
12:10 | 4m | Talk | Bidirectional Paper-Repository Tracing in Software Engineering | Data and Tool Showcase Track | Daniel Garijo, Miguel Arroyo (Universidad Politécnica de Madrid), Esteban González Guardia (Universidad Politécnica de Madrid), Christoph Treude (Singapore Management University), Nicola Tarocco (CERN)
12:14 | 4m | Talk | DistilKaggle: A Distilled Dataset of Kaggle Jupyter Notebooks | Data and Tool Showcase Track | Mojtaba Mostafavi (Department of Computer Engineering of Sharif University of Technology), Arash Asgari (Department of Computer Engineering of Sharif University of Technology), Mohammad Abolnejadian (Department of Computer Engineering of Sharif University of Technology), Abbas Heydarnoori (Bowling Green State University)
12:18 | 5m | Talk | Estimating Usage of Open Source Projects | Industry Track |