Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models

📅 2026-05-18

🤖 AI Summary
This study addresses the underexplored problem of passive source attribution for generative 3D assets by systematically investigating the intrinsic, traceable fingerprints that modern 3D generative models leave behind. The work identifies two stable forensic signals, cross-view inconsistencies and structural artifacts, and establishes what the authors describe as the first benchmark for this task, covering 22 state-of-the-art generators. To exploit these signals, the authors propose a hierarchical multi-view multi-modal Transformer that jointly models intra-view multimodal cues (appearance, geometry, and frequency-domain features) and global inter-view relationships. Under full supervision, the method achieves 97.22% attribution accuracy; with only 1% of the training data (fewer than five samples per generator), it still reaches 77.17%, which suggests that tracing the source of generated 3D assets is feasible in label-scarce, real-world deployment settings.
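To make the 1% few-shot setting concrete, here is a minimal sketch of a per-generator (stratified) subsample. The function name `few_shot_subset`, the `gen_XX` label format, and the figure of 400 training samples per generator are all hypothetical and chosen only for illustration; the summary states only that 1% of the training data amounts to fewer than five samples per generator, not how the split was drawn.

```python
import math
import random
from collections import defaultdict


def few_shot_subset(labels, fraction=0.01, seed=0):
    """Return sorted indices of a per-generator (stratified) subsample.

    labels[i] is the generator name of training sample i. Each generator
    keeps ceil(fraction * count) samples, and always at least one.
    """
    rng = random.Random(seed)
    by_generator = defaultdict(list)
    for index, name in enumerate(labels):
        by_generator[name].append(index)

    chosen = []
    for name in sorted(by_generator):
        indices = by_generator[name]
        keep = max(1, math.ceil(fraction * len(indices)))
        chosen.extend(rng.sample(indices, keep))
    return sorted(chosen)


# Hypothetical training set: 22 generators, 400 samples each (assumed size).
labels = [f"gen_{g:02d}" for g in range(22) for _ in range(400)]
subset = few_shot_subset(labels, fraction=0.01)

# Count how many samples each generator keeps in the few-shot subset.
per_generator = defaultdict(int)
for i in subset:
    per_generator[labels[i]] += 1
```

Under these assumed sizes, each generator keeps four samples, which is in line with the "fewer than five samples per generator" regime described above. Sampling per generator rather than over the pooled set guarantees that no class ends up with zero examples.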
📝 Abstract
Generative 3D models are deployed in gaming, robotics, and immersive creation, making source attribution critical: given a 3D asset, can we identify whether a generative model created it and, if so, which one? This problem faces two core challenges: dispersed attribution signals, where 3D fingerprints are distributed across multi-view, geometric, and frequency-domain cues; and realistic deployment constraints, where scarce labels, degraded prompts, and mixed real/synthetic assets undermine attribution reliability. To systematically study this problem, we construct, to the best of our knowledge, the first passive source attribution benchmark for modern generated assets, covering 22 representative 3D generators under standard, few-shot, and realistic deployment protocols. Based on this benchmark, we find that generative 3D models leave two types of stable fingerprints: cross-view inconsistency, and structural artifacts reflected in geometric statistics and frequency-domain cues. To capture these dispersed signals, we propose a hierarchical multi-view multi-modal Transformer that fuses appearance, geometric, and frequency-domain features within each view and models global relationships across views. Extensive experiments demonstrate strong performance: 97.22% accuracy under full supervision and 77.17% accuracy with only 1% of the training data, corresponding to fewer than five samples per generator. These results show that modern 3D generators leave stable and attributable fingerprints, establishing a new benchmark and methodological foundation for trustworthy 3D content provenance.
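The two-level structure described above (fuse modalities within each view, then relate views to one another) can be illustrated with a toy, parameter-free sketch. This is not the authors' architecture: it has no learned projections, attention heads, positional encodings, or classifier, and every name in it (`fuse_view`, `attend_across_views`, `asset_embedding`) and every feature value is a hypothetical chosen for illustration. It only shows the data flow, using concatenation for the intra-view step and plain scaled dot-product self-attention for the inter-view step.

```python
import math


def fuse_view(appearance, geometry, frequency):
    """Intra-view stage: concatenate the three modality vectors into one view token."""
    return list(appearance) + list(geometry) + list(frequency)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_across_views(tokens):
    """Inter-view stage: parameter-free scaled dot-product self-attention.

    Each view token is replaced by an attention-weighted average of all tokens.
    """
    scale = math.sqrt(len(tokens[0]))
    mixed = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        mixed.append([
            sum(w * token[d] for w, token in zip(weights, tokens))
            for d in range(len(query))
        ])
    return mixed


def asset_embedding(views):
    """Fuse modalities per view, mix across views, then mean-pool to one vector."""
    tokens = [fuse_view(*view) for view in views]
    mixed = attend_across_views(tokens)
    return [sum(column) / len(mixed) for column in zip(*mixed)]


# Hypothetical asset rendered from three views, two numbers per modality.
views = [
    ([0.2, 0.1], [0.5, 0.4], [0.0, 0.3]),
    ([0.3, 0.1], [0.4, 0.4], [0.1, 0.2]),
    ([0.2, 0.2], [0.5, 0.3], [0.0, 0.3]),
]
embedding = asset_embedding(views)
```

In a trained model, the resulting asset-level vector would feed a classifier over the candidate generators; here it is simply a pooled summary of the view tokens. Keeping the two stages separate is what lets the second stage compare views with each other, which is where a cross-view inconsistency signal would surface.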
Problem

Research questions and friction points this paper is trying to address.

source attribution
generative 3D models
3D asset provenance
fingerprint detection
3D content authentication
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D source attribution
generative 3D models
multi-view multi-modal Transformer
fingerprint detection
3D provenance