Dataset: Copy-based Reuse in Open Source Software
In Open Source Software (OSS), the source code and any other resources available in a project can be viewed or reused by anyone, subject to licensing restrictions that are often permissive. Although some studies have examined dependency-based reuse supported by package managers, no studies of OSS-wide copy-based reuse exist. This dataset seeks to encourage such studies by providing data on whole-file copying activity across nearly all OSS. To that end, we develop an efficient algorithm for detecting copy-based reuse that exploits the World of Code infrastructure, a curated and cross-referenced collection of nearly all open source repositories. We expect this data to enable future research and tool development that support such reuse and minimize the associated risks.
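As a rough illustration of what whole-file copy detection can look like, the sketch below groups identical file contents by a Git-style content hash and lists the projects that contain each one, ordered by the earliest time the content was seen. It is a minimal sketch under stated assumptions, not the authors' algorithm and not a World of Code API: the function names, the `(project, content, time)` input format, and the sample data are all hypothetical.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Hash file content the way Git hashes a blob, so identical files share a key."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def find_copies(observations):
    """Group observations by content hash (hypothetical helper).

    observations: iterable of (project, content_bytes, commit_time).
    Returns {hash: [(earliest_time, project), ...]} for content found in
    two or more projects, sorted by earliest time.
    """
    earliest = defaultdict(dict)
    for project, data, t in observations:
        h = content_hash(data)
        prev = earliest[h].get(project)
        # Keep only the first time each project contained this content.
        if prev is None or t < prev:
            earliest[h][project] = t

    copies = {}
    for h, per_project in earliest.items():
        if len(per_project) > 1:  # same content in several projects
            copies[h] = sorted((t, p) for p, t in per_project.items())
    return copies


# Hypothetical sample data: three projects, one shared file.
observations = [
    ("projA", b"int main(){}\n", 100),
    ("projB", b"int main(){}\n", 250),
    ("projB", b"unique\n", 300),
    ("projC", b"int main(){}\n", 180),
]
copies = find_copies(observations)
```

In this sketch the first entry of each list is only a candidate originating project; treating the earliest sighting as the origin is an assumption of the example.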
Mon 15 Apr (displayed time zone: Lisbon)
11:00 - 12:30 | Ecosystems, Reuse and APIs & Tutorials | Data and Tool Showcase Track / Technical Papers / Tutorials at Almada Negreiros | Chair(s): Mahmoud Alfadel (University of Waterloo), Ayushi Rastogi (University of Groningen, The Netherlands)
11:00 | 12m | Talk | Thirty-Three Years of Mathematicians and Software Engineers: A Case Study of Domain Expertise and Participation in Proof Assistant Ecosystems | Technical Papers | Gwenyth Lincroft (Northeastern University), Minsung Cho (Northeastern University), Mahsa Bazzaz (Northeastern University), Katherine Hough (Northeastern University), Jonathan Bell (Northeastern University)
11:12 | 12m | Talk | Boosting API Misuse Detection via Integrating API Constraints from Multiple Sources | Technical Papers | Can Li (Nanjing University of Aeronautics and Astronautics), Jingxuan Zhang (Nanjing University of Aeronautics and Astronautics), Yixuan Tang (Nanjing University of Aeronautics and Astronautics), Zhuhang Li (Nanjing University of Aeronautics and Astronautics), Tianyue Sun (Nanjing University of Aeronautics and Astronautics)
11:24 | 6m | Talk | Availability and Usage of Platform-Specific APIs: A First Empirical Study | Technical Papers
11:30 | 4m | Talk | AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis | Data and Tool Showcase Track | Jordan Samhi (CISPA Helmholtz Center for Information Security), Tegawendé F. Bissyandé (University of Luxembourg), Jacques Klein (University of Luxembourg)
11:34 | 4m | Talk | Goblin: A Framework for Enriching and Querying the Maven Central Dependency Graph | Data and Tool Showcase Track | Damien Jaime (Sorbonne Université - Lip6 - SAP), Joyce El Haddad (Paris Dauphine-PSL Université, CNRS, LAMSADE), Pascal Poizat (Université Paris Nanterre & LIP6)
11:38 | 4m | Talk | Dataset: Copy-based Reuse in Open Source Software | Data and Tool Showcase Track | Mahmoud Jahanshahi (Research Assistant, University of Tennessee Knoxville), Audris Mockus (The University of Tennessee & Vilnius University)
11:45 | 45m | Talk | Mining Our Way Back to Incremental Builds for DevOps Pipelines | Tutorials | Shane McIntosh (University of Waterloo)