MalwareBench: Malware samples are not enough
The prevalent use of third-party components in modern software development, coupled with rapid modernization and digitization, has significantly amplified the risk of software supply chain security attacks. Popular large registries like npm and PyPI are highly targeted malware distribution channels for attackers due to heavy growth and dependence on third-party components. Industry and academia are working towards building tools to detect malware in the software supply chain. However, a lack of benchmark datasets containing both malware and neutral packages hampers the evaluation of the performance of these malware detection tools. The goal of our study is to aid researchers and tool developers in evaluating and improving malware detection tools by contributing a benchmark dataset built by systematically collecting malicious and neutral packages from the npm and PyPI ecosystems. We present MalwareBench, a labeled dataset of 20,534 packages (of which 6,475 are malicious) of npm and PyPI ecosystems. We constructed the benchmark dataset by incorporating pre-existing malware datasets with the Socket internal benchmark data and including popular and newly released npm and PyPI packages. The ground truth labels of these packages were determined using the Socket AI Scanner and manual inspection.