Training Without Orthogonalization, Inference With SVD: A Gradient Analysis of Rotation Representations

📅 2026-04-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates why singular value decomposition (SVD)-based orthogonalization harms training for SO(3) rotation estimation, even though SVD is preferred over the Gram–Schmidt method at inference. The authors derive the exact spectrum of the Jacobian of the SVD backward pass and show that it distorts gradients most severely when the predicted matrix is far from SO(3), for example early in training when its smallest singular value is near zero. They further show that the 6D Gram–Schmidt parameterization delivers unequal gradient signal to its parameters, which supports the use of 9D representations. Together, these results give theoretical grounding and practical guidance for a scheme already favored empirically: regress the 9D matrix directly during training, with no orthogonalization, and apply SVD projection only at inference.
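The train/inference split described in this summary can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the squared Frobenius loss, the function names `training_loss`, `training_grad`, and `project_so3`, and the toy data are all assumptions made for the example.

```python
import numpy as np

def training_loss(raw_pred, target_rot):
    # Assumed loss: squared Frobenius distance on the raw 3x3 (9D) output.
    # No orthogonalization appears in the training path.
    return np.sum((raw_pred - target_rot) ** 2)

def training_grad(raw_pred, target_rot):
    # Gradient of the assumed loss with respect to the raw prediction.
    # It reaches all nine entries directly, with no SVD Jacobian in between.
    return 2.0 * (raw_pred - target_rot)

def project_so3(m):
    # Inference-time step: project a 3x3 matrix onto SO(3) via SVD,
    # flipping the last axis if needed so the determinant is +1.
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

# Toy data (hypothetical): a rotation about the z axis plus noise,
# standing in for a network's unconstrained 9D output.
c, s = np.cos(0.3), np.sin(0.3)
target = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
raw = target + 0.1 * rng.standard_normal((3, 3))

loss = training_loss(raw, target)   # used during training
grad = training_grad(raw, target)   # used during training
rot = project_so3(raw)              # used only at inference

print("training loss:", loss)
print("det of projected matrix:", np.linalg.det(rot))
```

The raw output is generally not a rotation, so the projection is still required before the prediction is used downstream; the point of the split is only that gradients never pass through it.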

📝 Abstract

Recent work has shown that removing orthogonalization during training and applying it only at inference improves rotation estimation in deep learning, with empirical evidence favoring 9D representations with SVD projection. However, the theoretical understanding of why SVD orthogonalization specifically harms training, and why it should be preferred over Gram-Schmidt at inference, remains incomplete. We provide a detailed gradient analysis of SVD orthogonalization specialized to $3 \times 3$ matrices and $SO(3)$ projection. Our central result derives the exact spectrum of the SVD backward-pass Jacobian: it has rank $3$ (matching the dimension of $SO(3)$) with nonzero singular values $2/(s_i + s_j)$ and condition number $\kappa = (s_1 + s_2)/(s_2 + s_3)$, creating quantifiable gradient distortion that is most severe when the predicted matrix is far from $SO(3)$ (e.g., early in training when $s_3 \approx 0$). We further show that even stabilized SVD gradients introduce gradient direction error, whereas removing SVD from the training loop avoids this tradeoff entirely. We also prove that the 6D Gram-Schmidt Jacobian has an asymmetric spectrum: its parameters receive unequal gradient signal, explaining why 9D parameterization is preferable. Together, these results provide the theoretical foundation for training with direct 9D regression and applying SVD projection only at inference.
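The spectrum stated in the abstract can be checked numerically. The sketch below is an illustration rather than the paper's code: it builds a $3 \times 3$ matrix with chosen singular values $s_1 \ge s_2 \ge s_3 > 0$ and positive determinant, estimates the $9 \times 9$ Jacobian of the SVD projection by central finite differences, and compares its singular values with $2/(s_i + s_j)$. The helper names, the test values $s = (2, 1, 0.5)$, and the step size are assumptions chosen for the example.

```python
import numpy as np

def project_so3(m):
    # SVD projection onto SO(3); the sign term forces det = +1.
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

def random_rotation(rng):
    # Orthogonal factor from QR, with one column flipped if det = -1.
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def numerical_jacobian(f, m, eps=1e-6):
    # Central finite differences over the 9 entries of m (row-major order).
    jac = np.zeros((9, 9))
    for k in range(9):
        step = np.zeros(9)
        step[k] = eps
        step = step.reshape(3, 3)
        jac[:, k] = (f(m + step) - f(m - step)).ravel() / (2.0 * eps)
    return jac

rng = np.random.default_rng(0)
s = np.array([2.0, 1.0, 0.5])  # assumed singular values, s1 >= s2 >= s3 > 0
m = random_rotation(rng) @ np.diag(s) @ random_rotation(rng).T

jac = numerical_jacobian(project_so3, m)
observed = np.linalg.svd(jac, compute_uv=False)  # sorted in descending order

# Values from the abstract, largest first: the pairs (2,3), (1,3), (1,2).
predicted = np.array([2.0 / (s[1] + s[2]),
                      2.0 / (s[0] + s[2]),
                      2.0 / (s[0] + s[1])])
kappa_observed = observed[0] / observed[2]
kappa_predicted = (s[0] + s[1]) / (s[1] + s[2])

print("three largest singular values:", observed[:3])
print("predicted 2/(s_i + s_j):      ", predicted)
print("remaining six (near zero):    ", observed[3:])
print("condition number:", kappa_observed, "vs", kappa_predicted)
```

Replacing the entries of `s` with values where $s_2 + s_3$ is small makes the largest singular value $2/(s_2 + s_3)$ grow, which is the gradient amplification the abstract describes for predictions far from $SO(3)$.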

Problem

Research questions and friction points this paper is trying to address.

rotation estimation

SVD orthogonalization

gradient analysis

SO(3)

representation learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

SVD orthogonalization

gradient analysis

rotation representation