Provenance of AI-Generated Images: A Vector Similarity and Blockchain-based Approach

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
The increasing photorealism of AI-generated images poses significant challenges for digital content provenance and authenticity verification. Method: This paper proposes a discrimination framework based on distributional discrepancies in embedded feature vectors: it extracts image features using five mainstream vision models, identifies statistical distribution shifts between AI-generated and human-crafted images via similarity metrics and clustering analysis, and integrates blockchain technology to enable immutable logging and traceable authentication of generation metadata. Contribution/Results: The framework demonstrates robustness against medium-to-strong perturbations, achieves high classification accuracy and strong generalization across diverse, multi-source datasets, and provides a scalable, robust, and verifiable technical pathway for trustworthy digital media certification.
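The blockchain component described above logs generation metadata immutably. A minimal sketch of what such a provenance record might look like before being anchored on a ledger, assuming SHA-256 digests over the image bytes and a canonicalized metadata payload (the field names here are illustrative, not the paper's schema):

```python
import hashlib
import json

def provenance_record(image_bytes: bytes, metadata: dict) -> dict:
    """Build a fixed-size, tamper-evident record for an image.

    The record digest covers both the image hash and its generation
    metadata, so altering either is detectable; the digest itself is
    what would be written to the blockchain.
    """
    image_hash = hashlib.sha256(image_bytes).hexdigest()
    # Canonical JSON (sorted keys) so the same inputs always hash identically.
    payload = json.dumps({"image_sha256": image_hash, **metadata}, sort_keys=True)
    return {
        "image_sha256": image_hash,
        "record_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "metadata": metadata,
    }

rec = provenance_record(b"\x89PNG fake image bytes",
                        {"model": "stable-diffusion", "created": "2025-10-14"})
```

Verification then amounts to recomputing both digests from the image and metadata and comparing against the on-chain record.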

📝 Abstract
Rapid advancement in generative AI and large language models (LLMs) has enabled the generation of highly realistic and contextually relevant digital content. Systems such as ChatGPT with DALL-E integration and Stable Diffusion can produce images that are often indistinguishable from those created by humans, which poses challenges for digital content authentication. Verifying the integrity and origin of digital data to ensure it remains unaltered and genuine is crucial to maintaining trust and legality in digital media. In this paper, we propose an embedding-based AI image detection framework that uses image embeddings and vector similarity to distinguish AI-generated images from real (human-created) ones. Our methodology is built on the hypothesis that AI-generated images demonstrate closer embedding proximity to other AI-generated content, while human-created images cluster similarly within their own domain. To validate this hypothesis, we developed a system that processes a diverse dataset of AI- and human-generated images through five benchmark embedding models. Extensive experimentation demonstrates the robustness of our approach, and our results confirm that moderate-to-high perturbations minimally impact the embedding signatures, with perturbed images maintaining close similarity matches to their original versions. Our solution provides a generalizable framework for AI-generated image detection that balances accuracy with computational efficiency.
Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated images using embedding similarity
Verifying digital content authenticity and origin
Distinguishing AI-created from human-created visual content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses image embeddings and vector similarity
Leverages five benchmark embedding models
Provides a generalizable framework balancing accuracy and efficiency
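The core hypothesis above, that AI-generated images sit closer in embedding space to other AI-generated content than to human-created content, suggests a simple similarity-based decision rule. A minimal sketch, assuming precomputed embeddings from some vision model (the synthetic vectors and the top-k mean-similarity rule here are illustrative, not the paper's exact classifier):

```python
import numpy as np

def cosine_sim(query: np.ndarray, refs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of refs."""
    q = query / np.linalg.norm(query)
    r = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    return r @ q

def classify(query: np.ndarray, ai_refs: np.ndarray,
             human_refs: np.ndarray, k: int = 5) -> str:
    """Label by which reference set holds the more similar top-k neighbors."""
    ai_score = np.sort(cosine_sim(query, ai_refs))[-k:].mean()
    human_score = np.sort(cosine_sim(query, human_refs))[-k:].mean()
    return "ai" if ai_score > human_score else "human"

# Synthetic stand-ins for embeddings from a vision encoder: two
# well-separated clusters, one per domain.
rng = np.random.default_rng(0)
ai_refs = rng.normal(loc=1.0, scale=0.1, size=(100, 64))
human_refs = rng.normal(loc=-1.0, scale=0.1, size=(100, 64))
query = rng.normal(loc=1.0, scale=0.1, size=64)  # drawn near the "AI" cluster

label = classify(query, ai_refs, human_refs)
```

In the full framework, this comparison would run once per embedding model, with the five models' decisions aggregated; the robustness claim rests on perturbed images keeping high cosine similarity to their unperturbed originals.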