🤖 AI Summary
This work addresses representation-space incompatibility between models trained at different stages of continual neural network training. We propose λ-orthogonality regularization: a method that learns an affine feature transformation under a relaxed orthogonality constraint, aligning the latent representations of old and new models on downstream distributions while preserving newly acquired semantic structure. A controllable regularization strength λ balances geometric fidelity against representational capacity, achieving cross-stage representation compatibility without degrading zero-shot transfer performance. Experiments across diverse architectures (ViT, ResNet) and benchmarks (ImageNet, CIFAR) demonstrate significant improvements in model-update consistency. Moreover, the method establishes a reusable foundation for representation alignment in incremental learning and Model-as-a-Service scenarios.
📝 Abstract
Retrieval systems rely on representations learned by increasingly powerful models. However, due to the high training cost and inconsistencies in learned representations, there is significant interest in facilitating communication between representations and ensuring compatibility across independently trained neural networks. In the literature, two primary approaches are commonly used to adapt different learned representations: affine transformations, which adapt well to specific distributions but can significantly alter the original representation, and orthogonal transformations, which preserve the original structure with strict geometric constraints but limit adaptability. A key challenge is adapting the latent spaces of updated models to align with those of previous models on downstream distributions while preserving the newly learned representation spaces. In this paper, we impose a relaxed orthogonality constraint, namely $\lambda$-orthogonality regularization, while learning an affine transformation, to obtain distribution-specific adaptation while retaining the original learned representations. Extensive experiments across various architectures and datasets validate our approach, demonstrating that it preserves the model's zero-shot performance and ensures compatibility across model updates. Code available at: https://github.com/miccunifi/lambda_orthogonality
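The core idea, an affine map regularized toward orthogonality with strength $\lambda$, can be sketched in a few lines. The snippet below is an illustrative toy implementation, not the paper's code: the specific objective (mean-squared alignment between old and new features plus a penalty $\lambda\,\|W^\top W - I\|_F^2$), the plain gradient-descent optimizer, and all function and variable names are assumptions made for this sketch.

```python
import numpy as np

def fit_lambda_orthogonal(Z_new, Z_old, lam=0.1, lr=0.05, steps=2000, seed=0):
    """Learn an affine map phi(z) = z @ W + b aligning new-model features Z_new
    to old-model features Z_old, with a relaxed orthogonality penalty
    lam * ||W^T W - I||_F^2 that keeps W close to an isometry.

    Illustrative sketch only: objective and optimizer are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = Z_new.shape
    W = np.eye(d) + 0.01 * rng.standard_normal((d, d))  # start near identity
    b = np.zeros(d)
    I = np.eye(d)
    for _ in range(steps):
        resid = Z_new @ W + b - Z_old             # alignment residual
        grad_W = (2.0 / n) * Z_new.T @ resid      # grad of mean alignment loss
        grad_W += lam * 4.0 * W @ (W.T @ W - I)   # grad of ||W^T W - I||_F^2
        grad_b = (2.0 / n) * resid.sum(axis=0)
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b

# Toy check: if old features are an exact rotation of new features, the fitted
# W should recover that rotation, and the penalty keeps W near-orthogonal.
rng = np.random.default_rng(1)
d = 4
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # ground-truth rotation
Z_new = rng.standard_normal((500, d))
Z_old = Z_new @ Q
W, b = fit_lambda_orthogonal(Z_new, Z_old)
align_err = np.mean((Z_new @ W + b - Z_old) ** 2)
ortho_dev = np.linalg.norm(W.T @ W - np.eye(d))
```

Setting `lam=0` reduces this to an unconstrained affine fit (maximal adaptability, no structure preservation), while `lam → ∞` forces `W` toward a strict orthogonal map; intermediate values trade off the two regimes, which is the balance the abstract describes.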