🤖 AI Summary
GitHub stars are susceptible to artificial manipulation, which undermines their reliability as indicators of project quality and security and thereby threatens the integrity of the open-source supply chain. This paper introduces StarScout, a global, longitudinal, and scalable framework for detecting fake stars. It leverages comprehensive GitHub metadata and flags two anomalous starring behaviors: low-activity accounts and lockstep starring. The analysis identifies approximately 4.5 million suspicious stars, over 90% of which are linked to short-lived malicious repositories (e.g., repositories distributing malware). Fake stars yield only transient visibility gains: their promotion effect lasts less than two months, after which they become a burden rather than an asset. The study also reveals an escalating trend of star manipulation, sheds light on how the malicious ecosystem behind it operates, and shows how this manipulation degrades trust signals in open-source software. These findings provide an empirical foundation for open-source health assessment and supply chain security research.
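To make the "low-activity" signal concrete, the following is a minimal sketch of how such an account filter might look. It is not StarScout's actual rule: the `UserActivity` record, the `is_low_activity` function, and the thresholds (`max_days`, `max_other`) are hypothetical. The only external fact assumed is that GitHub records a star as a `WatchEvent` in its public event stream.

```python
from dataclasses import dataclass


@dataclass
class UserActivity:
    """Hypothetical per-account summary built from public GitHub events."""
    login: str
    star_events: int   # count of WatchEvent (GitHub's event type for starring)
    other_events: int  # count of all non-star public events
    active_days: int   # distinct calendar days with any event


def is_low_activity(u: UserActivity, max_days: int = 1, max_other: int = 0) -> bool:
    """Hypothetical rule: the account has starred something, done (almost)
    nothing else, and all of its activity falls within max_days days."""
    return (
        u.star_events > 0
        and u.other_events <= max_other
        and u.active_days <= max_days
    )


users = [
    UserActivity("alice", star_events=40, other_events=300, active_days=120),
    UserActivity("bot001", star_events=12, other_events=0, active_days=1),
    UserActivity("lurker", star_events=3, other_events=0, active_days=3),
]
flagged = [u.login for u in users if is_low_activity(u)]
print(flagged)  # ['bot001']
```

A rule like this is deliberately narrow: the "lurker" account stars without contributing but is spread over several days, so it is not flagged. This reflects the abstract's point that fake stargazers are often indistinguishable by profile alone and must be identified by their activity patterns.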
📝 Abstract
GitHub, the de facto platform for open-source software development, provides a set of social-media-like features to signal high-quality repositories. Among them, the star count is the most widely used popularity signal, but it is also at risk of being artificially inflated (i.e., faked), which decreases its value as a decision-making signal and poses a security risk to all GitHub users. In this paper, we present a systematic, global, and longitudinal measurement study of fake stars on GitHub. To this end, we build StarScout, a scalable tool able to detect anomalous starring behaviors (i.e., low activity and lockstep) across GitHub's entire metadata. Analyzing the data collected using StarScout, we find that: (1) fake-star-related activities have surged rapidly since 2024; (2) the profile characteristics of fake stargazers are not distinct from those of average GitHub users, but many of them have highly abnormal activity patterns; (3) the majority of fake stars are used to promote short-lived malware repositories masquerading as pirated software, game cheats, or cryptocurrency bots; (4) some repositories may have acquired fake stars for growth hacking, but fake stars only have a promotion effect in the short term (i.e., less than two months) and become a burden in the long term. Our study has implications for platform moderators, open-source practitioners, and supply chain security researchers.
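The "lockstep" behavior refers to groups of accounts that star the same repositories at roughly the same time. The following is a simplified co-occurrence sketch of that idea, not StarScout's detection algorithm. The function name `lockstep_groups`, the fixed time windows, and the thresholds `window_hours`, `min_shared`, and `min_group` are all hypothetical choices for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone
from itertools import combinations


def lockstep_groups(stars, window_hours=24, min_shared=3, min_group=3):
    """Hypothetical lockstep heuristic.

    stars: iterable of (user, repo, timezone-aware datetime).
    Returns sets of users in which linked pairs co-starred at least
    min_shared repos inside the same fixed, non-overlapping time window.
    """
    # Bucket every star into a (repo, window index) cell.
    window_secs = window_hours * 3600
    cell_users = defaultdict(set)
    for user, repo, ts in stars:
        cell = (repo, int(ts.timestamp()) // window_secs)
        cell_users[cell].add(user)

    # Count how many cells each pair of users shares (quadratic per cell;
    # fine for a sketch, not for GitHub-scale data).
    pair_counts = defaultdict(int)
    for users in cell_users.values():
        for a, b in combinations(sorted(users), 2):
            pair_counts[(a, b)] += 1

    # Union-find: merge users connected by a sufficiently strong pair.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in pair_counts.items():
        if n >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for x in parent:
        groups[find(x)].add(x)
    return [g for g in groups.values() if len(g) >= min_group]


# Usage: three accounts star three repos at the same moment; one organic
# star on r1 happens a year earlier and should not be grouped.
t0 = datetime(2024, 5, 1, 12, tzinfo=timezone.utc)
stars = [(u, r, t0) for r in ["r1", "r2", "r3"] for u in ["s1", "s2", "s3"]]
stars.append(("organic", "r1", datetime(2023, 1, 1, tzinfo=timezone.utc)))
print([sorted(g) for g in lockstep_groups(stars)])  # [['s1', 's2', 's3']]
```

Fixed windows keep the sketch short but can split a burst that straddles a window boundary. A production detector would more likely use sliding windows or a dedicated lockstep-detection algorithm that scales to the full event history.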