Consiglieres in the Shadow: Understanding the Use of Uncensored Large Language Models in Cybercrimes

📅 2025-08-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically investigates the abuse of uncensored large language models (ULLMs) in cybercrime. To address the difficulty of identifying and tracing these covert models, the authors introduce the first open-source knowledge graph of LLM evolution and association, combining model provenance analysis, few-shot graph neural networks, and multi-dimensional harmful-content detection to identify malicious ULLMs at scale. The method surfaces over 11,000 potentially malicious ULLMs, including one downloaded over 19 million times, and empirically demonstrates their widespread use in generating pornographic and violent content and malicious code, as well as their integration into numerous illicit online services. The study further uncovers low-cost customization techniques for malicious ULLMs shared on underground forums. These findings provide critical empirical evidence and actionable technical pathways for regulatory intervention and AI safety governance.

📝 Abstract
The advancement of AI technologies, particularly Large Language Models (LLMs), has transformed computing while introducing new security and privacy risks. Prior research shows that cybercriminals are increasingly leveraging uncensored LLMs (ULLMs) as backends for malicious services. Understanding these ULLMs has been hindered by the challenge of identifying them among the vast number of open-source LLMs hosted on platforms like Hugging Face. In this paper, we present the first systematic study of ULLMs, overcoming this challenge by modeling relationships among open-source LLMs, and between models and related data, such as fine-tuning, merging, and compression relationships, as well as the use or generation of datasets with harmful content. Representing these connections as a knowledge graph, we applied graph-based deep learning to discover over 11,000 ULLMs from a small set of labeled examples and uncensored datasets. A closer analysis of these ULLMs reveals their alarming scale and usage. Some have been downloaded over a million times, with one exceeding 19 million installs. These models -- created through fine-tuning, merging, or compression of other models -- are capable of generating harmful content, including hate speech, violence, erotic material, and malicious code. Evidence shows their integration into hundreds of malicious applications offering services like erotic role-play, child pornography, malicious code generation, and more. In addition, underground forums reveal criminals sharing techniques and scripts to build cheap alternatives to commercial malicious LLMs. These findings highlight the widespread abuse of LLM technology and the urgent need for effective countermeasures against this growing threat.
Problem

Research questions and friction points this paper is trying to address.

Identify uncensored LLMs among open-source models
Analyze harmful content generation by ULLMs
Investigate criminal use of ULLMs in malicious apps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model relationships among open-source LLMs
Apply graph-based deep learning
Discover uncensored LLMs via knowledge graph
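The approach above (a knowledge graph of derivation relationships among models and datasets, plus graph-based learning from a few labeled seeds) can be illustrated with a toy sketch. Label propagation here is a simplified stand-in for the paper's few-shot graph neural network, and all node names and edges below are hypothetical:

```python
import numpy as np

# Hypothetical edges in a model knowledge graph: (parent, child) means the
# child was derived from the parent (fine-tuned, merged, quantized) or
# trained on that dataset.
edges = [
    ("base-model", "uncensored-ft"),       # fine-tuned variant
    ("uncensored-ft", "uncensored-gguf"),  # quantized copy
    ("base-model", "aligned-ft"),          # safety-aligned sibling
    ("harmful-dataset", "uncensored-ft"),  # trained-on relation
]
nodes = sorted({n for e in edges for n in e})
idx = {name: i for i, name in enumerate(nodes)}
n = len(nodes)

# Symmetric adjacency with self-loops, row-normalized so that each step
# averages a node's score with its neighbors' scores.
A = np.eye(n)
for u, v in edges:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0
A /= A.sum(axis=1, keepdims=True)

# Few-shot seeds: +1 = known uncensored/harmful, -1 = known benign.
seeds = {"harmful-dataset": 1.0, "aligned-ft": -1.0}
scores = np.zeros(n)
for name, label in seeds.items():
    scores[idx[name]] = label

# Label propagation: diffuse seed evidence along the graph, clamping the
# seed labels after every step so the few known examples stay fixed.
for _ in range(50):
    scores = A @ scores
    for name, label in seeds.items():
        scores[idx[name]] = label

# Unlabeled nodes that end up with positive scores are flagged candidates.
flagged = {name for name in nodes if scores[idx[name]] > 0 and name not in seeds}
print(sorted(flagged))  # → ['uncensored-ft', 'uncensored-gguf']
```

Clamping the seeds while their evidence diffuses along fine-tune/merge/quantize edges captures the intuition the paper exploits: derivatives of known uncensored models or harmful datasets inherit suspicion through the graph, so a handful of labels can surface thousands of candidates.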