A Study of Malware Prevention in Linux Distributions

📅 2024-11-17

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of preventing and detecting malware in Linux distribution package repositories. Through in-depth interviews with maintainers of major distributions, we find that most lack proactive static or dynamic scanning mechanisms; only Wolfi OS performs active malware scanning. To enable rigorous evaluation, we construct the first open, reproducible benchmark dataset of malicious Linux packages, comprising real-world attack samples and benign counterparts. We systematically evaluate six open-source detection tools (e.g., ClamAV, YARA) against this benchmark. The results reveal pervasive limitations: most tools produce many false positives while rarely detecting true malware, and those that avoid false positives tend to do so at the cost of a lower true positive rate. The average true positive rate is below 20%, indicating severely limited efficacy in the Linux ecosystem. This work establishes the first empirical benchmark for Linux supply-chain security and provides evidence of current tooling failures, motivating and informing the design of targeted, Linux-aware detection frameworks.

📝 Abstract

Malicious attacks on open source software packages are a growing concern. This concern morphed into a panic-inducing crisis after the revelation of the XZ Utils backdoor, which would have provided the attacker with, according to one observer, a "skeleton key" to the internet. This study therefore explores the challenges of preventing and detecting malware in Linux distribution package repositories. To do so, we ask two research questions: (1) What measures have Linux distributions implemented to counter malware, and how have maintainers experienced these efforts? (2) How effective are current malware detection tools at identifying malicious Linux packages? To answer these questions, we conduct interviews with maintainers at several major Linux distributions and introduce a Linux package malware benchmark dataset. Using this dataset, we evaluate the performance of six open source malware detection scanners. Distribution maintainers, according to the interviews, have mostly focused on reproducible builds to date. Our interviews identified only a single Linux distribution, Wolfi OS, that performs active malware scanning. Using this new benchmark dataset, the evaluation found that the performance of existing open-source malware scanners is underwhelming. Most studied tools excel at producing false positives but only infrequently detect true malware. Those that avoid high false positive rates often do so at the expense of a satisfactory true positive rate. Our findings provide insights into Linux distribution package repositories' current practices for malware detection and demonstrate the current inadequacy of open-source tools designed to detect malicious Linux packages.
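
The abstract describes scanner performance in terms of true and false positives. A minimal sketch of how those two rates can be computed from labeled scan results is shown here; the `Verdict` type, the `rates` helper, and the sample data are hypothetical illustrations and are not taken from the paper's benchmark or tooling.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Verdict:
    """One scanner result for one package (hypothetical structure)."""
    package: str
    is_malicious: bool  # ground-truth label from a benchmark
    flagged: bool       # whether the scanner raised an alert


def rates(verdicts: Iterable[Verdict]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate)."""
    items = list(verdicts)
    tp = sum(v.is_malicious and v.flagged for v in items)
    fn = sum(v.is_malicious and not v.flagged for v in items)
    fp = sum((not v.is_malicious) and v.flagged for v in items)
    tn = sum((not v.is_malicious) and (not v.flagged) for v in items)
    # Guard against empty classes so the function never divides by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Made-up data: a scanner that flags most benign packages
# but catches only one of four malicious ones.
sample = [
    Verdict("mal-a", True, True),
    Verdict("mal-b", True, False),
    Verdict("mal-c", True, False),
    Verdict("mal-d", True, False),
    Verdict("ok-a", False, True),
    Verdict("ok-b", False, True),
    Verdict("ok-c", False, True),
    Verdict("ok-d", False, False),
]

tpr, fpr = rates(sample)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

A scanner with this profile illustrates the pattern the abstract reports: many alerts on benign packages and few on malicious ones. Lowering a tool's alert sensitivity would reduce the false positive rate but typically reduces the true positive rate as well.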

Problem

Research questions and friction points this paper is trying to address.

Study explores malware prevention in Linux distributions

Evaluates effectiveness of current malware detection tools

Identifies gaps in open-source malware scanners' performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Interviews with Linux maintainers on malware measures

Created Linux package malware benchmark dataset

Evaluated six open-source malware detection scanners

🔎 Similar Papers

PVAC: package version activity categorizer, leveraging semantic versioning in a heterogeneous system

2024-09-06 · Empirical Software Engineering · Citations: 0
