LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection

📅 2025-07-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The increasing photorealism of diffusion-generated images poses significant challenges for forensic detection. To address this, we propose a general-purpose detection method based on modeling denoising trajectories in the latent space. Our approach captures discrepancies between real and synthetic images by constructing Latent Trajectory Embeddings across multiple denoising timesteps, rather than relying on a single-step reconstruction error. We further introduce a latent-visual feature refinement module that refines the per-timestep latents and fuses the aggregated trajectory representation with visual features to improve discriminative capability. Finally, a lightweight classifier produces the real-versus-generated decision. Experiments show that our method surpasses the baselines on benchmarks including GenImage and DiffusionFake. It also performs strongly in cross-generator and cross-dataset settings, which suggests that latent-space dynamics are a promising signal for AIGC forensics alongside pixel-level artifacts.

📝 Abstract

The rapid advancement of diffusion-based image generators has made it increasingly difficult to distinguish generated images from real ones. This can erode trust in digital media, making it critical to develop generalizable detectors for generated images. Recent methods leverage diffusion denoising cues but focus mainly on single-step reconstruction errors, ignoring the inherently sequential nature of the denoising process. In this work, we propose LATTE (Latent Trajectory Embedding), a novel approach that models the evolution of latent embeddings across several denoising timesteps. By modeling the trajectory of these embeddings rather than single-step errors, LATTE captures subtle, discriminative patterns that distinguish real from generated images. Each latent is refined by our latent-visual feature refinement module, and the refined latents are aggregated into a unified representation. This representation is then fused with the visual features and passed to a lightweight classifier. Our experiments demonstrate that LATTE surpasses the baselines on several established benchmarks, such as GenImage and DiffusionFake. Moreover, it performs strongly in cross-generator and cross-dataset settings, highlighting the potential of latent embedding trajectories for generated image detection. The code is available at https://github.com/AnaMVasilcoiu/LATTE-Diffusion-Detector.
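
To make the contrast between a single-step reconstruction error and a latent trajectory concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the per-timestep latents are produced, so the sketch assumes a DDPM-style forward noising step followed by a denoiser. `toy_denoiser` is a hypothetical stand-in for a pretrained diffusion model (it is the posterior-mean denoiser only under the assumption of unit-Gaussian latents), and the `alpha_bars` values and latent size are arbitrary.

```python
import numpy as np


def add_noise(z0, alpha_bar_t, rng):
    """DDPM-style forward noising of a clean latent z0 at one noise level."""
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps


def toy_denoiser(z_t, alpha_bar_t):
    """Hypothetical stand-in for a pretrained denoiser (assumption).

    For latents drawn from a unit Gaussian, E[z0 | z_t] = sqrt(alpha_bar_t) * z_t.
    A real detector would call a diffusion model here instead.
    """
    return np.sqrt(alpha_bar_t) * z_t


def single_step_error(z0, alpha_bar_t, rng):
    """One scalar: mean squared reconstruction error at a single timestep."""
    z_hat = toy_denoiser(add_noise(z0, alpha_bar_t, rng), alpha_bar_t)
    return float(np.mean((z_hat - z0) ** 2))


def latent_trajectory(z0, alpha_bars, rng):
    """A (T, D) array: the denoised latent at each of T noise levels."""
    return np.stack(
        [toy_denoiser(add_noise(z0, a, rng), a) for a in alpha_bars]
    )


rng = np.random.default_rng(0)
z0 = rng.standard_normal(16)          # illustrative 16-dim latent of one image
alpha_bars = [0.9, 0.7, 0.5, 0.3]     # illustrative noise levels (assumption)

err = single_step_error(z0, 0.5, rng)         # what single-step methods keep
traj = latent_trajectory(z0, alpha_bars, rng)  # what a trajectory method keeps
print(traj.shape, err)
```

The single-step route collapses the image to one number per timestep, while the trajectory keeps a full latent per timestep, so a downstream model can learn from how the latents change across noise levels.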

Problem

Research questions and friction points this paper is trying to address.

Detect diffusion-generated images to combat digital media distrust

Model latent trajectory embeddings across denoising timesteps

Improve cross-generator and cross-dataset detection generalizability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Models latent embedding evolution across denoising steps

Refines latent-visual features with a dedicated module

Uses a lightweight classifier for final image detection
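
The three contributions above can be read as one pipeline: refine each timestep's latent, aggregate the refined latents, fuse the result with visual features, and classify. The NumPy sketch below shows one way such a pipeline could be wired together. Every design choice in it is an assumption made for illustration, not the paper's architecture: the refinement is a simple visual-conditioned gate, aggregation is a mean over timesteps, fusion is concatenation, and the classifier is a single logistic unit with random, untrained weights. All function and parameter names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def detect(latents, visual, params):
    """Return a score in (0, 1) for one image.

    latents: (T, D) array, one latent per denoising timestep.
    visual:  (D_v,) array of visual features from any image encoder.
    params:  dict of weights; names and shapes are hypothetical.
    """
    # Refinement (assumed form): gate each timestep by its agreement
    # with a projection of the visual features.
    context = visual @ params["w_v"]                               # (D,)
    gate = sigmoid(latents @ context / np.sqrt(latents.shape[1]))  # (T,)
    refined = latents * gate[:, None]                              # (T, D)

    # Aggregation (assumed form): average over timesteps.
    trajectory_embedding = refined.mean(axis=0)                    # (D,)

    # Fusion (assumed form): concatenate with the visual features.
    fused = np.concatenate([trajectory_embedding, visual])         # (D + D_v,)

    # Lightweight classifier (assumed form): one logistic unit.
    logit = fused @ params["w_c"] + params["b_c"]
    return float(sigmoid(logit))


rng = np.random.default_rng(0)
T, D, D_v = 4, 16, 8                      # illustrative sizes
params = {
    "w_v": 0.1 * rng.standard_normal((D_v, D)),
    "w_c": 0.1 * rng.standard_normal(D + D_v),
    "b_c": 0.0,
}
latents = rng.standard_normal((T, D))     # placeholder trajectory
visual = rng.standard_normal(D_v)         # placeholder visual features

score = detect(latents, visual, params)
print(score)
```

With untrained weights the score is meaningless; the point is only the data flow and the shapes. In practice the weights would be trained on labeled real and generated images.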

🔎 Similar Papers

LaRE2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection