HydraPrompt: An Adaptive and Asymmetric Framework of Vision-Language Models for Synthetic Image Detection

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization of existing synthetic image detection methods, which often rely on static textual prompts and struggle to handle unseen forgery types during inference. To overcome this limitation, the authors propose an asymmetric prompting framework based on vision-language models: a unified prompt anchor is used for real images, while sample-adaptive prompts are generated for forged images. The approach further incorporates conditional supervised contrastive learning to enhance fine-grained discriminability. By jointly optimizing an asymmetric prompt adapter with learnable textual prompts, the method achieves significant performance gains over current state-of-the-art techniques across mainstream benchmarks, demonstrating superior generalization and robustness to novel forgeries.
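The asymmetry described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear `adapt_fake_prompt` adapter, the `weight` matrix, the toy 3-dimensional embeddings, and the temperature value are all assumptions chosen for illustration. In practice the embeddings would come from a vision-language model such as CLIP. The point is only that the real-class anchor is shared by every sample, while the fake-class prompt is recomputed per image.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def adapt_fake_prompt(base_fake, image_feat, weight):
    """Hypothetical adapter: shift a shared fake-class embedding by a
    linear function of the image feature (an assumption, not the paper's design)."""
    shift = [sum(w * x for w, x in zip(row, image_feat)) for row in weight]
    return [b + s for b, s in zip(base_fake, shift)]


def asymmetric_logits(image_feat, real_anchor, base_fake, weight, temperature=0.07):
    """Return [real_logit, fake_logit]. The real anchor is identical for
    every sample; the fake prompt depends on the sample itself."""
    fake_prompt = adapt_fake_prompt(base_fake, image_feat, weight)
    return [
        cosine(image_feat, real_anchor) / temperature,
        cosine(image_feat, fake_prompt) / temperature,
    ]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (illustrative values only).
real_anchor = [1.0, 0.0, 0.0]
base_fake = [0.0, 1.0, 0.0]
weight = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
image_feat = [0.1, 0.9, 0.3]

probs = softmax(asymmetric_logits(image_feat, real_anchor, base_fake, weight))
```

Here `probs[1]` is the predicted probability of the fake class for this one toy image. Calling `adapt_fake_prompt` on two different images yields two different fake prompts, whereas `real_anchor` never changes.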

📝 Abstract

The rapid evolution of generative models has led to a proliferation of fabricated content, posing significant challenges to existing Synthetic Image Detection (SID) methods. Building on advances in vision-language models (e.g., CLIP), recent approaches use learnable textual prompts to identify synthetic images. However, they still treat a static prompt as a fixed boundary between real and fake images, and so fail to adapt to the varying types of forgery that emerge during inference. To overcome this issue, we propose **HydraPrompt**, an asymmetric prompting framework that dynamically adjusts the category centers by aligning them with fine-grained image cues. Specifically, we propose an Asymmetric Prompt Adapter (**APA**): (1) for the authentic category, we introduce a single set of prompts that captures consistent, representative patterns and serves as a unified anchor for real content; (2) for the fake category, we construct sample-adaptive prompts that specialize in capturing diverse cues from different samples, enabling adaptive modeling of forgery variations. To sharpen discriminability among different synthetic images, we further introduce a Conditional Supervised Contrastive (**CSC**) objective, which compacts the authentic representations while capturing fine-grained forgery clues. Extensive experiments on popular SID benchmarks demonstrate the state-of-the-art performance of our framework.
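The abstract does not give the formula for the CSC objective, so the following is only one plausible reading of "compacts the authentic representations", written as a sketch under stated assumptions. It is a supervised contrastive loss in which only real samples act as anchors with positives, and fake samples appear only as negatives. The function name `conditional_supcon`, the `real_label` convention, and the temperature are hypothetical. The sketch does not attempt to model the fine-grained forgery cues the paper also mentions.

```python
import math


def _unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def conditional_supcon(features, labels, temperature=0.1, real_label=0):
    """Supervised contrastive loss restricted to real anchors.

    Assumption (not from the paper): the 'condition' is that only samples
    labeled real are pulled toward each other; fakes serve as negatives only.
    """
    feats = [_unit(f) for f in features]
    n = len(feats)
    losses = []
    for i in range(n):
        if labels[i] != real_label:
            continue  # fake samples are never anchors in this sketch
        positives = [j for j in range(n) if j != i and labels[j] == real_label]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Stable log-sum-exp over all other samples (positives and negatives).
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        losses.append(-sum(sims[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy 2-D features: label 0 = real, label 1 = fake (illustrative values only).
labels = [0, 0, 1, 1]
tight_reals = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [-1.0, 0.0]]
loose_reals = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.99], [-1.0, 0.0]]

loss_tight = conditional_supcon(tight_reals, labels)
loss_loose = conditional_supcon(loose_reals, labels)
```

With these toy inputs the loss is lower when the two real features sit close together and away from the fakes, which is the compaction behavior the abstract attributes to CSC.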

Problem

Research questions and friction points this paper is trying to address.

Synthetic Image Detection

Vision-Language Models

Adaptive Prompting

Forgery Detection

Generative Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

HydraPrompt

Asymmetric Prompt Adapter

Synthetic Image Detection