🤖 AI Summary
Traditional deep networks store many computations in shared, fixed weight matrices, which makes it hard to reuse learned knowledge flexibly in novel compositional settings. This work proposes the Vector Network (VN), a hierarchical recurrent architecture that replaces each conventional weight matrix with a reusable library of rank-1 weight atoms. For each input, VN minimizes a layer-local energy to infer a sparse set of active atoms and their coefficients, which compose a sample-specific low-rank weight matrix. After inference converges, learning updates only the selected atoms through local residual signals scaled by the inferred coefficients. This design makes compositional generalization a property of both the architecture and the inference process. Across four compositional benchmarks, VN matches strong baselines in distribution and often achieves out-of-distribution error about an order of magnitude lower when familiar factors must be recombined in unseen configurations.
📝 Abstract
Deep networks are powerful function approximators, but they typically store many different computations in shared weight matrices, making it difficult to selectively reuse or adapt parts of them when a familiar structure appears in novel combinations. We introduce the Vector Network (VN), a hierarchical recurrent architecture in which each layer replaces a fixed weight matrix with a library of reusable rank-1 weight atoms. For each input, VN minimizes a layer-local energy to infer a sparse set of active weight atoms and their coefficients, jointly constrained by bottom-up input reconstruction and top-down feedback consistency. The active atoms, weighted by these coefficients, then compose an input-specific low-rank weight matrix for that sample. After convergence, slow learning updates only the selected weight atoms through local residual signals scaled by the inferred coefficients. We evaluate VN on four compositional benchmarks spanning 1D signals, 2D spatial decoding, N-body dynamics, and compositional MNIST. VN matches strong baselines in distribution while often achieving out-of-distribution error about an order of magnitude lower when familiar factors must be recombined in novel ways. Vector Networks thus make compositional generalization a structural property of the architecture and inference process rather than a brittle byproduct of fitting many behaviors into one shared dense parameter substrate.
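To make the mechanism concrete, the following is a minimal single-layer sketch of the three steps the abstract describes: sparse coefficient inference, composition of a low-rank weight from rank-1 atoms, and a local update restricted to the selected atoms. It is not the authors' implementation. The abstract does not give the energy function, so the sketch assumes a quadratic energy with an L1 sparsity penalty, minimized by ISTA (iterative soft thresholding). The names `U`, `V`, `t`, `lam`, `lr`, and all function names are hypothetical, and the top-down feedback is stood in for by a fixed target vector `t`.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of the L1 norm: shrinks entries toward exact zero."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def energy(a, Phi, z, lam):
    """Assumed energy: squared residual plus L1 penalty on the coefficients."""
    r = z - Phi @ a
    return 0.5 * (r @ r) + lam * np.abs(a).sum()

def infer_coefficients(U, V, x, t, lam=0.1, n_steps=200):
    """Infer sparse coefficients a for atoms u_k v_k^T (rows of U and V).

    Assumed energy (not from the paper):
      0.5*||x - sum_k a_k v_k||^2              bottom-up reconstruction
    + 0.5*||t - sum_k a_k (v_k . x) u_k||^2    top-down consistency
    + lam*||a||_1                              sparsity
    """
    gains = V @ x                                   # (K,) values of v_k . x
    Phi = np.vstack([V.T, (U * gains[:, None]).T])  # (d_in + d_out, K)
    z = np.concatenate([x, t])
    # Step size 1/L, with L the squared spectral norm of Phi (ISTA descent condition).
    step = 1.0 / (np.linalg.norm(Phi, 2) ** 2 + 1e-12)
    a = np.zeros(U.shape[0])
    history = []
    for _ in range(n_steps):
        grad = Phi.T @ (Phi @ a - z)
        a = soft_threshold(a - step * grad, step * lam)
        history.append(energy(a, Phi, z, lam))
    return a, history

def compose_weight(U, V, a):
    """Input-specific weight W = sum_k a_k u_k v_k^T, shape (d_out, d_in)."""
    return (U * a[:, None]).T @ V

def update_active_atoms(U, V, a, x, t, lr=0.01):
    """Simplified local update (an assumption): only atoms with nonzero
    coefficients move, along residuals scaled by their coefficients."""
    active = np.flatnonzero(a)
    r_in = x - V.T @ a                       # bottom-up reconstruction residual
    r_out = t - compose_weight(U, V, a) @ x  # top-down consistency residual
    scale = a[active][:, None]
    V[active] += lr * scale * r_in[None, :]
    U[active] += lr * scale * r_out[None, :]
```

In this sketch, calling `infer_coefficients` and then `compose_weight` yields a matrix whose rank is at most the number of nonzero coefficients, which is what makes the weight both input-specific and low-rank. Because `update_active_atoms` touches only the active rows of `U` and `V`, atoms that were not selected for a sample are left unchanged.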