🤖 AI Summary
3D objects lack a unified reference frame: every asset arrives with an arbitrary global rotation, producing pose ambiguity and unstable directional semantics that hinder generation and retrieval. To address this, the work introduces Canoverse, the first large-scale, canonically aligned 3D object dataset, comprising 320,000 instances, together with an efficient alignment framework. By combining compact hypothesis generation, lightweight human-in-the-loop validation, and a high-throughput pipeline, the method cuts per-object alignment time from minutes to seconds. At this scale, directional semantics become statistically learnable: the dataset substantially improves the stability of 3D generation, enables precise cross-modal shape retrieval, and achieves zero-shot point-cloud orientation estimation on out-of-distribution data.
📝 Abstract
3D learning systems implicitly assume that objects occupy a coherent reference frame. In practice, however, every asset arrives with an arbitrary global rotation, and models are left to resolve directional ambiguity on their own. This persistent misalignment suppresses pose-consistent generation and blocks the emergence of stable directional semantics. To address this issue, we construct Canoverse, a massive canonical 3D dataset of 320K objects across 1,156 categories -- an order-of-magnitude increase over prior work. At this scale, directional semantics become statistically learnable: Canoverse improves 3D generation stability, enables precise cross-modal 3D shape retrieval, and unlocks zero-shot point-cloud orientation estimation even on out-of-distribution data. This is achieved by a new canonicalization framework that reduces alignment from minutes to seconds per object via compact hypothesis generation and lightweight human discrimination, transforming canonicalization from manual curation into a high-throughput data generation pipeline. The Canoverse dataset will be publicly released upon acceptance. Project page: https://github.com/123321456-gif/Canoverse
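The abstract's "compact hypothesis generation and lightweight human discrimination" can be illustrated with a minimal sketch. The assumption here (not stated in the abstract) is that the hypothesis set is the 24 proper rotations of the cube, so a human only has to pick one candidate rather than align the object manually; the function names `cube_rotations` and `pose_hypotheses` are hypothetical, not from the paper.

```python
import numpy as np
from itertools import permutations, product

def cube_rotations():
    """The 24 proper rotations of the cube: signed permutation
    matrices with determinant +1, built with plain NumPy."""
    mats = []
    for perm in permutations(range(3)):
        for signs in product((1.0, -1.0), repeat=3):
            m = np.zeros((3, 3))
            for row, (col, s) in enumerate(zip(perm, signs)):
                m[row, col] = s
            # Keep only orientation-preserving (det = +1) maps.
            if np.isclose(np.linalg.det(m), 1.0):
                mats.append(m)
    return mats

def pose_hypotheses(points):
    """Hypothetical sketch of compact hypothesis generation:
    center the cloud, then propose 24 axis-aligned re-orientations.
    A human (or downstream model) discriminates among these to
    select the canonical pose."""
    centered = points - points.mean(axis=0)
    return [centered @ m.T for m in cube_rotations()]

cloud = np.random.default_rng(0).normal(size=(100, 3))
hyps = pose_hypotheses(cloud)
print(len(hyps))  # 24 candidate orientations
```

Restricting the search to a small discrete group is what makes human validation lightweight: each object becomes a single multiple-choice question instead of a free-form alignment task.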