Eff-GRot: Efficient and Generalizable Rotation Estimation with Transformers

📅 2025-12-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the general 3D rotation estimation problem for RGB images—requiring no category-specific training and supporting zero-shot transfer. Method: We propose a lightweight Transformer-based framework that introduces, for the first time, joint latent-space modeling across multiple reference images. Our approach integrates multi-reference feature fusion, rotation-aware latent representation learning, and end-to-end differentiable regression to directly predict the 3D rotation of a query image from several reference images with known poses in a single forward pass. Contribution/Results: The method achieves state-of-the-art accuracy on multiple benchmarks while reducing inference latency by over 40%, significantly enhancing deployability on edge devices. It simultaneously delivers strong generalization across unseen categories and low-latency inference—offering a favorable trade-off between robustness and efficiency.

Technology Category

Application Category

📝 Abstract
We introduce Eff-GRot, an approach for efficient and generalizable rotation estimation from RGB images. Given a query image and a set of reference images with known orientations, our method directly predicts the object's rotation in a single forward pass, without requiring object- or category-specific training. At the core of our framework is a transformer that performs a comparison in the latent space, jointly processing rotation-aware representations from multiple references alongside a query. This design enables a favorable balance between accuracy and computational efficiency while remaining simple, scalable, and fully end-to-end. Experimental results show that Eff-GRot offers a promising direction toward more efficient rotation estimation, particularly in latency-sensitive applications.
Problem

Research questions and friction points this paper is trying to address.

Efficient rotation estimation from RGB images
Generalizable without object-specific training
Balances accuracy and computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based latent space comparison for rotation estimation
Single forward pass prediction without object-specific training
End-to-end scalable framework balancing accuracy and efficiency
🔎 Similar Papers
No similar papers found.