🤖 AI Summary
This work addresses general 3D rotation estimation from RGB images, requiring no category-specific training and supporting zero-shot transfer. Method: We propose a lightweight Transformer-based framework that, for the first time, performs joint latent-space modeling across multiple reference images. Our approach integrates multi-reference feature fusion, rotation-aware latent representation learning, and end-to-end differentiable regression to predict the 3D rotation of a query image from several reference images with known poses in a single forward pass. Contribution/Results: The method achieves state-of-the-art accuracy on multiple benchmarks while cutting inference latency by over 40%, which significantly improves deployability on edge devices. It delivers both strong generalization to unseen categories and low-latency inference, offering a favorable trade-off between robustness and efficiency.
📝 Abstract
We introduce Eff-GRot, an approach for efficient and generalizable rotation estimation from RGB images. Given a query image and a set of reference images with known orientations, our method directly predicts the object's rotation in a single forward pass, without requiring object- or category-specific training. At the core of our framework is a transformer that performs the comparison in latent space, jointly processing rotation-aware representations from multiple references alongside the query. This design enables a favorable balance between accuracy and computational efficiency while remaining simple, scalable, and fully end-to-end. Experimental results show that Eff-GRot offers a promising direction toward more efficient rotation estimation, particularly in latency-sensitive applications.
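The latent-space comparison described above can be sketched in a heavily simplified form. The sketch below is an illustrative assumption, not the paper's implementation: the toy single-head attention that fuses reference latents with a query latent, the random (untrained) regression head, and the axis-angle output parameterization are all stand-ins for the actual encoder, transformer, and rotation head.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query_tok, ref_toks):
    # Toy single-head scaled dot-product attention: the query latent
    # attends over the reference latents and returns a fused feature.
    d = len(query_tok)
    scores = [dot(r, query_tok) / math.sqrt(d) for r in ref_toks]
    w = softmax(scores)
    return [sum(w[i] * ref_toks[i][j] for i in range(len(ref_toks)))
            for j in range(d)]

def axis_angle_to_matrix(v):
    # Rodrigues' formula: rotation about axis v/|v| by angle |v|.
    theta = math.sqrt(dot(v, v))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in v)
    c, s = math.cos(theta), math.sin(theta)
    C = 1.0 - c
    return [
        [c + kx * kx * C,      kx * ky * C - kz * s, kx * kz * C + ky * s],
        [ky * kx * C + kz * s, c + ky * ky * C,      ky * kz * C - kx * s],
        [kz * kx * C - ky * s, kz * ky * C + kx * s, c + kz * kz * C],
    ]

# Toy latents for one query and 4 references; in the real model these
# would come from an image encoder, with known poses attached to the refs.
random.seed(0)
d = 8
query_tok = [random.gauss(0, 1) for _ in range(d)]
ref_toks = [[random.gauss(0, 1) for _ in range(d)] for _ in range(4)]
fused = attend(query_tok, ref_toks)

# Untrained linear head: fused feature -> axis-angle -> rotation matrix,
# giving the single-forward-pass regression shape described in the text.
head = [[random.gauss(0, 0.1) for _ in range(3)] for _ in range(d)]
pred_axis_angle = [sum(fused[i] * head[i][j] for i in range(d))
                   for j in range(3)]
R = axis_angle_to_matrix(pred_axis_angle)
```

The axis-angle parameterization is chosen here only because it guarantees the head's output maps to a valid element of SO(3); the actual output representation used by Eff-GRot is not specified in this abstract.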