🤖 AI Summary
This work addresses the “curse of unrolling,” a phenomenon in algorithmic unrolling where the early derivative iterates used to estimate the Jacobian of the solution mapping can deviate significantly from the true Jacobian. The paper provides the first non-asymptotic analysis that explains the cause of this instability. To mitigate it, two strategies are proposed: explicitly truncating the early derivative iterations, and implicitly exploiting the warm-start mechanism inherent in bilevel optimization. The theoretical analysis shows that truncation substantially improves the stability of derivative estimation while also reducing memory overhead. Numerical experiments on representative tasks validate the proposed approaches, confirming gains in both accuracy and efficiency in practice.
📝 Abstract
Algorithm unrolling is ubiquitous in machine learning, particularly in hyperparameter optimization and meta-learning, where Jacobians of solution mappings are computed by differentiating through iterative algorithms. Although unrolling is known to yield asymptotically correct Jacobians under suitable conditions, recent work has shown that the derivative iterates may initially diverge from the true Jacobian, a phenomenon known as the curse of unrolling. In this work, we provide a non-asymptotic analysis that explains the origin of this behavior and identifies the algorithmic factors that govern it. We show that truncating early iterations of the derivative computation mitigates the curse while simultaneously reducing memory requirements. Finally, we demonstrate that warm-starting in bilevel optimization naturally induces an implicit form of truncation, providing a practical remedy. Our theoretical findings are supported by numerical experiments on representative examples.
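To make the setting concrete, here is a minimal sketch of unrolled differentiation with truncation on a toy problem. The quadratic inner objective, all variable names, and the specific recursion are our own illustration, not taken from the paper: gradient descent is run on f(w, λ) = ½(w − λ)² + ½μw², whose solution w*(λ) = λ/(1+μ) has true Jacobian dw*/dλ = 1/(1+μ); the derivative iterate is propagated alongside the forward iterate, and a `truncate` parameter drops (stop-gradients) the first few derivative iterations, mimicking the explicit truncation strategy described above.

```python
def unrolled_jacobian(lam, mu=1.0, eta=0.1, T=100, truncate=0):
    """Gradient descent on f(w) = 0.5*(w - lam)**2 + 0.5*mu*w**2,
    propagating the derivative dw/dlam through the unrolled iterations.

    The first `truncate` derivative iterations are discarded (set to 0),
    a hand-rolled stand-in for stop-gradient on the early iterates.
    """
    w, dw = 0.0, 0.0
    for t in range(T):
        grad = (w - lam) + mu * w               # inner gradient in w
        w = w - eta * grad                      # forward iterate
        if t < truncate:
            dw = 0.0                            # truncation: drop early derivative iterates
        else:
            # differentiate the update w <- w - eta*((w - lam) + mu*w) w.r.t. lam
            dw = dw * (1 - eta * (1 + mu)) + eta
    return w, dw

# With mu = 1 the closed-form Jacobian is dw*/dlam = 1/(1+mu) = 0.5;
# both the full unroll and the truncated unroll recover it, but the
# truncated version never stores the early derivative iterates.
w_full, dw_full = unrolled_jacobian(2.0)
w_trunc, dw_trunc = unrolled_jacobian(2.0, truncate=50)
```

The derivative recursion contracts at rate |1 − η(1+μ)|, so once the forward iterates are near the solution, even a heavily truncated derivative sequence converges to the true Jacobian; this is the intuition behind warm-starting acting as an implicit truncation.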