🤖 AI Summary
Existing 3D tokenization methods treat representation as purely spatial compression. They therefore struggle with open-world assets, which have intersecting components, noisy topology, and weak canonical structure: geometry, component ownership, and assembly relations end up entangled in the latent encoding and cannot be manipulated directly. This work proposes interface-centric generative states, which shift 3D tokenization from a compression paradigm to a state paradigm. Its instantiation, Component-Conditioned Canonical Local Tokens (C2LT-3D), explicitly separates canonical local geometry, partition-conditioned context, and relational seam variables, so that the state can be queried, constrained, and validated during decoding without a separate post-hoc structure recovery module. Trained solely on single-object CAD models and evaluated zero-shot on open-world multi-component assets, C2LT-3D improves structural robustness, and its latent variables remain manipulable even under adversarial attachment settings.
📝 Abstract
Current 3D tokenizers largely treat representation as spatial compression: compact codes reconstruct surface geometry, but leave component ownership and attachment validity implicit. In open-world assets with intersecting components, noisy topology, and weak canonical structure, this creates a representation mismatch: local shape, component identity, and assembly relations become entangled in a latent stream and are not natively addressable during decoding. We formulate an alternative view, interface-centric generative states, in which tokenization constructs an operational state rather than a passive compressed code. The state exposes local geometry, component ownership, and attachment validity as variables that can be queried, constrained, and repaired during decoding. We instantiate this formulation with Component-Conditioned Canonical Local Tokens (C2LT-3D), factorizing representation into canonical local geometry, partition-conditioned context, and relational seam variables. Each factor targets a distinct failure mode of compression-centric tokens: pose leakage, cross-component interference, or invalid local attachment. This exposed state supports attachment validation, latent structural repair, targeted intervention, and constrained serialization without a separate post-hoc structure recovery module. Trained on single-object CAD models and evaluated zero-shot on open-world multi-component assets, C2LT-3D improves structural robustness and shows that its latent variables remain actionable under adversarial attachment settings. These results suggest that open-world 3D generative representations should be evaluated not only by reconstruction fidelity, but by whether their discrete states remain operational for assembly-level structural reasoning.
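To make the idea of an "exposed" state concrete, here is a minimal Python sketch of a token container in which component ownership and attachment validity are explicit, queryable fields rather than implicit properties of a flat code sequence. It is only an illustration under stated assumptions: every name in it (`LocalToken`, `Seam`, `GenerativeState`, and their fields and methods) is hypothetical and not taken from the paper, the integer codes stand in for learned latent variables, and dropping invalid seams is a toy placeholder for the paper's latent structural repair, not its actual procedure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LocalToken:
    """One local patch: a pose-free geometry code plus explicit ownership."""
    geometry_code: int  # hypothetical index into a canonical local-geometry codebook
    component_id: int   # hypothetical id of the component that owns this patch


@dataclass(frozen=True)
class Seam:
    """A relational variable between two tokens, with an explicit validity flag."""
    token_a: int  # index of the first token in GenerativeState.tokens
    token_b: int  # index of the second token in GenerativeState.tokens
    valid: bool   # whether this local attachment is considered valid


@dataclass
class GenerativeState:
    """Toy state whose three factors can be read and edited independently."""
    tokens: List[LocalToken]
    seams: List[Seam] = field(default_factory=list)

    def tokens_of(self, component_id: int) -> List[int]:
        """Query: indices of all tokens owned by one component."""
        return [i for i, t in enumerate(self.tokens)
                if t.component_id == component_id]

    def crosses_components(self, seam: Seam) -> bool:
        """Query: does this seam connect two different components?"""
        return (self.tokens[seam.token_a].component_id
                != self.tokens[seam.token_b].component_id)

    def invalid_seams(self) -> List[Seam]:
        """Validate: list seams currently flagged as invalid attachments."""
        return [s for s in self.seams if not s.valid]

    def without_invalid_seams(self) -> "GenerativeState":
        """Toy 'repair': return a new state keeping only valid seams."""
        return GenerativeState(list(self.tokens),
                               [s for s in self.seams if s.valid])


if __name__ == "__main__":
    state = GenerativeState(
        tokens=[LocalToken(7, 0), LocalToken(3, 0), LocalToken(7, 1)],
        seams=[Seam(1, 2, True), Seam(0, 2, False)],
    )
    print(state.tokens_of(0))
    print(state.invalid_seams())
    print(state.without_invalid_seams().seams)
```

The point of the sketch is the separation of concerns: a geometry code can repeat across components (tokens 0 and 2 share code 7) without implying shared ownership, and attachment validity lives on the seam rather than being inferred after decoding.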