MalwareBench: Malware samples are not enough (MSR 2024 - Data and Tool Showcase Track)

Who

Nusrat Zahan, Philipp Burckhardt, Mikola Lysenko, Feross Aboukhadijeh, Laurie Williams

Track

MSR 2024 Data and Tool Showcase Track

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 16 Apr 2024 14:36 - 14:40 at Grande Auditório - Security and Vision & Reflection Chair(s): Tim Menzies

Abstract

The prevalent use of third-party components in modern software development, coupled with rapid modernization and digitization, has significantly amplified the risk of software supply chain security attacks. Popular large registries like npm and PyPI are highly targeted malware distribution channels for attackers due to heavy growth and dependence on third-party components. Industry and academia are working towards building tools to detect malware in the software supply chain. However, a lack of benchmark datasets containing both malware and neutral packages hampers the evaluation of the performance of these malware detection tools. The goal of our study is to aid researchers and tool developers in evaluating and improving malware detection tools by contributing a benchmark dataset built by systematically collecting malicious and neutral packages from the npm and PyPI ecosystems. We present MalwareBench, a labeled dataset of 20,534 packages (of which 6,475 are malicious) from the npm and PyPI ecosystems. We constructed the benchmark dataset by incorporating pre-existing malware datasets with the Socket internal benchmark data and including popular and newly released npm and PyPI packages. The ground truth labels of these packages were determined using the Socket AI Scanner and manual inspection.
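
The abstract's point that a benchmark needs neutral packages as well as malware can be illustrated with a small sketch. This is not the authors' evaluation code: the `evaluate` function, the `Metrics` type, and the toy label lists are hypothetical, and only the two package counts come from the abstract. Under those assumptions, it shows that a detector that flags everything looks perfect on a malware-only set, where the false positive rate cannot even be computed, and poor once neutral packages are included.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Counts stated in the abstract; the neutral count is derived by subtraction.
TOTAL_PACKAGES = 20_534
MALICIOUS_PACKAGES = 6_475
NEUTRAL_PACKAGES = TOTAL_PACKAGES - MALICIOUS_PACKAGES


@dataclass
class Metrics:
    """Detector scores; a value is None when its denominator is zero."""
    precision: Optional[float]
    recall: Optional[float]
    false_positive_rate: Optional[float]


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    return numerator / denominator if denominator else None


def evaluate(labels: Sequence[bool], predictions: Sequence[bool]) -> Metrics:
    """Score predictions against ground truth (True means malicious).

    Hypothetical helper for illustration, not part of MalwareBench.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    return Metrics(
        precision=_ratio(tp, tp + fp),
        recall=_ratio(tp, tp + fn),
        false_positive_rate=_ratio(fp, fp + tn),
    )


# Toy data (made up): a detector that flags every package as malicious.
malware_only = [True, True, True]
mixed = [True, True, True, False, False, False, False]

only_malware_scores = evaluate(malware_only, [True] * len(malware_only))
mixed_scores = evaluate(mixed, [True] * len(mixed))

if __name__ == "__main__":
    share = MALICIOUS_PACKAGES / TOTAL_PACKAGES
    print(f"neutral packages: {NEUTRAL_PACKAGES}, malicious share: {share:.1%}")
    print("malware-only set:", only_malware_scores)
    print("mixed set:", mixed_scores)
```

On the malware-only list the always-flag detector reaches full precision and recall with no measurable false positive rate; on the mixed list its false positive rate is 1.0, which is the kind of failure a benchmark with neutral packages is meant to expose.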

Nusrat Zahan

North Carolina State University

United States

Philipp Burckhardt

Socket, Inc

Mikola Lysenko

Socket, Inc

Feross Aboukhadijeh

Socket, Inc

Laurie Williams

North Carolina State University

United States

Session Program

Tue 16 Apr
Displayed time zone: Lisbon

14:00 - 15:30	Security and Vision & Reflection (Data and Tool Showcase Track / Technical Papers / Registered Reports / Vision and Reflection) at Grande Auditório Chair(s): Tim Menzies, North Carolina State University

14:00 12m Talk		Quantifying Security Issues in Reusable JavaScript Actions in GitHub Workflows Technical Papers Hassan Onsori Delicheh University of Mons, Belgium, Alexandre Decan University of Mons; F.R.S.-FNRS, Tom Mens University of Mons Pre-print
14:12 12m Talk		What Can Self-Admitted Technical Debt Tell Us About Security? A Mixed-Methods Study Technical Papers Nicolás E. Díaz Ferreyra Hamburg University of Technology, Mojtaba Shahin RMIT University, Mansooreh Zahedi The University of Melbourne, Sodiq Quadri Hamburg University of Technology, Riccardo Scandariato Hamburg University of Technology Pre-print
14:24 12m Talk		Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study Technical Papers Triet Le The University of Adelaide, Xiaoning Du Monash University, Australia, Muhammad Ali Babar School of Computer Science, The University of Adelaide
14:36 4m Talk		MalwareBench: Malware samples are not enough Data and Tool Showcase Track Nusrat Zahan North Carolina State University, Philipp Burckhardt Socket, Inc, Mikola Lysenko Socket, Inc, Feross Aboukhadijeh Socket, Inc, Laurie Williams North Carolina State University
14:40 4m Talk		Hash4Patch: A Lightweight Low False Positive Tool for Finding Vulnerability Patch Commits Data and Tool Showcase Track Simone Scalco University of Trento, Ranindya Paramitha University of Trento
14:44 4m Talk		MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representations Data and Tool Showcase Track Chao Ni School of Software Technology, Zhejiang University, Liyu Shen Zhejiang University, Xiaohu Yang Zhejiang University, Yan Zhu Zhejiang University, Shaohua Wang Central University of Finance and Economics Pre-print
14:48 5m Talk		Analyzing and Mitigating (with LLMs) the Security Misconfigurations of Helm Charts from Artifact Hub Registered Reports Francesco Minna Vrije Universiteit Amsterdam, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam, Katja Tuma Vrije Universiteit Amsterdam
14:53 5m Talk		Fixing Smart Contract Vulnerabilities: A Comparative Analysis of Literature and Developer's Practices Registered Reports Francesco Salzano University of Molise, Simone Scalabrino University of Molise, Rocco Oliveto University of Molise, Remo Pareschi University of Molise
15:00 30m Talk		Then, Now, and Next: Constants in Changing MSR Research Landscape Vision and Reflection Ayushi Rastogi University of Groningen, The Netherlands