Detecting Active and Stealthy Typosquatting Threats in Package Registries

📅 2025-02-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing typosquatting detectors for software supply chains suffer from high false-positive rates and weak cross-ecosystem support. This paper proposes TypoSmart, a detection system that addresses both. Methodologically, TypoSmart (1) analyzes real-world typosquatting data to characterize attack patterns and engineering practices; (2) applies embedding-based similarity search across six package registries, including npm and PyPI; and (3) uses heuristics informed by engineering practice to suppress false positives. In the evaluation, TypoSmart runs 73–91% faster and produces 70.4% fewer false positives than prior work. In production at an industry partner, it contributed to the removal of 3,658 typosquatting packages in one month. The paper also reports lessons learned from that deployment.

📝 Abstract

Typosquatting attacks, also known as package confusion attacks, threaten software supply chains. Attackers publish packages with names that resemble legitimate ones, tricking engineers into installing malware. While prior work has developed defenses against typosquatting in some software package registries, notably npm and PyPI, gaps remain: addressing high false-positive rates; generalizing to more software package ecosystems; and gaining insight from real-world deployment. In this work, we introduce TypoSmart, a solution designed to address the challenges posed by typosquatting attacks. We begin by conducting a novel analysis of typosquatting data to gain deeper insights into attack patterns and engineering practices. Building on state-of-the-art approaches, we extend support to six software package registries using embedding-based similarity search, achieving a 73% to 91% improvement in speed. Additionally, our approach reduces false positives by 70.4% compared to prior work. TypoSmart is being used in production at our industry partner and contributed to the removal of 3,658 typosquatting packages in one month. We share lessons learned from the production deployment.

Problem

Research questions and friction points this paper is trying to address.

Existing typosquatting detectors produce high false-positive rates.

Prior defenses cover only some registries, notably npm and PyPI, and do not generalize to other ecosystems.

Little is known about how such detectors perform in real-world deployment.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Embedding-based similarity search across registries

Significant reduction in false-positive rates

Real-world deployment insights and impact
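
To make the idea of embedding-based similarity search over package names concrete, here is a minimal sketch. It is not TypoSmart's implementation: the summary does not specify the embedding model or index, so this example assumes a simple character-bigram count vector as the "embedding" and a brute-force cosine comparison. The function names `embed`, `cosine`, and `nearest_popular`, and the 0.6 threshold, are hypothetical choices for illustration only.

```python
import math
from collections import Counter


def embed(name: str, n: int = 2) -> Counter:
    """Toy embedding: counts of character n-grams, with ^ and $ marking the ends."""
    padded = f"^{name.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_popular(candidate: str, popular: list[str], threshold: float = 0.6):
    """Return (popular_name, similarity) pairs at or above the threshold.

    An exact name match is skipped: it is the legitimate package itself,
    not a look-alike.
    """
    cand_vec = embed(candidate)
    hits = [
        (name, cosine(cand_vec, embed(name)))
        for name in popular
        if name != candidate
    ]
    return sorted(
        [(name, score) for name, score in hits if score >= threshold],
        key=lambda pair: -pair[1],
    )


if __name__ == "__main__":
    popular_names = ["requests", "numpy", "lodash"]  # illustrative list only
    for pkg in ["reqeusts", "left-pad", "requests"]:
        print(pkg, nearest_popular(pkg, popular_names))
```

A production system would presumably replace the bigram vector with a learned embedding and the linear scan with an approximate nearest-neighbor index, and would pass each hit through further checks before flagging it, since name similarity alone is what drives false positives.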

🔎 Similar Papers

PackageIntel: Leveraging Large Language Models for Automated Intelligence Extraction in Package Ecosystems