Text-to-Image Models Leave Identifiable Signatures: Implications for Leaderboard Security

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work exposes a critical anonymity vulnerability in text-to-image (T2I) models on public leaderboards: generated images inherently encode model-specific signatures, enabling attackers to de-anonymize models without access to prompts, training data, or historical outputs, thereby compromising leaderboard integrity. To quantify this risk, we introduce *prompt-level separability*, a metric measuring how distinctly each model's outputs cluster per prompt in embedding space. Experiments on 150K images, spanning 19 T2I models, 280 diverse prompts, and varied architectures, scales, and organizational origins, use real-time classification over CLIP embeddings to identify the generating model with a mean accuracy of 96.7%, indicating that de-anonymization is markedly easier for T2I leaderboards than for LLM leaderboards. Our study is the first systematic demonstration that T2I benchmarks face substantially heightened security threats, providing both empirical evidence and methodological foundations for developing robust, tamper-resistant evaluation frameworks.

📝 Abstract
Generative AI leaderboards are central to evaluating model capabilities, but remain vulnerable to manipulation. Among key adversarial objectives is rank manipulation, where an attacker must first deanonymize the models behind displayed outputs -- a threat previously demonstrated and explored for large language models (LLMs). We show that this problem can be even more severe for text-to-image leaderboards, where deanonymization is markedly easier. Using over 150,000 generated images from 280 prompts and 19 diverse models spanning multiple organizations, architectures, and sizes, we demonstrate that simple real-time classification in CLIP embedding space identifies the generating model with high accuracy, even without prompt control or historical data. We further introduce a prompt-level separability metric and identify prompts that enable near-perfect deanonymization. Our results indicate that rank manipulation in text-to-image leaderboards is easier than previously recognized, underscoring the need for stronger defenses.
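The abstract describes a "simple real-time classification in CLIP embedding space" that identifies the generating model. The paper's exact classifier is not specified here; as a minimal sketch, a nearest-centroid classifier over (assumed precomputed) CLIP image embeddings illustrates the idea: fit one mean embedding per model, then assign a new image to the model whose centroid is most cosine-similar. The function names and the use of nearest centroids are illustrative assumptions, not the authors' confirmed method.

```python
import numpy as np

def fit_centroids(embeddings, labels):
    """Compute one L2-normalized mean embedding (centroid) per model.

    embeddings: (n, d) array of CLIP image embeddings (assumed precomputed).
    labels: length-n list of model identifiers for each image.
    """
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    centroids = {}
    for model in sorted(set(labels)):
        c = X[y == model].mean(axis=0)
        centroids[model] = c / np.linalg.norm(c)  # normalize for cosine similarity
    return centroids

def predict_model(embedding, centroids):
    """Return the model whose centroid is most cosine-similar to the embedding."""
    e = np.asarray(embedding, dtype=float)
    e = e / np.linalg.norm(e)
    return max(centroids, key=lambda m: float(e @ centroids[m]))
```

In a real pipeline the embeddings would come from a CLIP image encoder applied to the leaderboard's displayed outputs; the point of the sketch is that nothing beyond a lightweight distance computation is needed at attack time, which is what makes the classification "real-time".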
Problem

Research questions and friction points this paper is trying to address.

Text-to-image models embed identifiable signatures in generated images
Model deanonymization enables manipulation of generative AI leaderboards
Current leaderboard security lacks defenses against rank manipulation attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using CLIP embeddings for real-time model identification
Introducing prompt-level separability metric for deanonymization
Analyzing 150,000+ images across 19 diverse text-to-image models
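The prompt-level separability metric above measures how distinctly models' outputs cluster per prompt in embedding space. The paper's precise definition is not given in this summary; one plausible sketch is a between-model vs. within-model spread ratio computed over a single prompt's images, where higher values mean the models' outputs form tighter, more distinct clusters (and hence enable easier de-anonymization). The formula below is an assumption for illustration only.

```python
import numpy as np

def prompt_separability(embeddings, model_ids):
    """Hypothetical separability proxy for one prompt's images:
    mean distance of model centroids from the global centroid (between-model
    spread) divided by mean distance of images from their own model's
    centroid (within-model spread). Higher = more separable.
    """
    X = np.asarray(embeddings, dtype=float)
    ids = np.asarray(model_ids)
    centroids = {m: X[ids == m].mean(axis=0) for m in set(model_ids)}
    within = np.mean([np.linalg.norm(x - centroids[m]) for x, m in zip(X, ids)])
    C = np.stack(list(centroids.values()))
    between = np.mean(np.linalg.norm(C - C.mean(axis=0), axis=1))
    return between / (within + 1e-12)  # epsilon guards against zero spread
```

Ranking prompts by such a score would surface the "prompts that enable near-perfect deanonymization" mentioned in the abstract.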