Are Greedy Task Orderings Better Than Random in Continual Linear Regression?

📅 2025-10-22
🤖 AI Summary
This work investigates task ordering in continual learning for linear regression, focusing on how the choice of sequence affects convergence and average loss. We study a greedy task-ordering strategy that maximizes the dissimilarity between consecutive tasks and analyze it using geometric and algebraic tools from the Kaczmarz method literature, under both high-rank and general-rank data assumptions. Theoretically, we show that single-pass greedy ordering may fail catastrophically, whereas greedy ordering with repetition achieves an $\mathcal{O}(1/\sqrt[3]{k})$ convergence rate. This is a repetition-dependent separation that random orderings, bounded by $\mathcal{O}(1/\sqrt{k})$ with or without replacement, do not exhibit. Empirically, greedy orderings converge faster than random ones in average loss, both on synthetic regression data and on linear probing tasks over CIFAR-100 features.

📝 Abstract
We analyze task orderings in continual learning for linear regression, assuming joint realizability of training data. We focus on orderings that greedily maximize dissimilarity between consecutive tasks, a concept briefly explored in prior work but still surrounded by open questions. Using tools from the Kaczmarz method literature, we formalize such orderings and develop geometric and algebraic intuitions around them. Empirically, we demonstrate that greedy orderings converge faster than random ones in terms of the average loss across tasks, both for linear regression with random data and for linear probing on CIFAR-100 classification tasks. Analytically, in a high-rank regression setting, we prove a loss bound for greedy orderings analogous to that of random ones. However, under general rank, we establish a repetition-dependent separation. Specifically, while prior work showed that for random orderings, with or without replacement, the average loss after $k$ iterations is bounded by $\mathcal{O}(1/\sqrt{k})$, we prove that single-pass greedy orderings may fail catastrophically, whereas those allowing repetition converge at rate $\mathcal{O}(1/\sqrt[3]{k})$. Overall, we reveal nuances within and between greedy and random orderings.
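The setup described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes that each task is fit by projecting the current weights onto that task's solution set (the block Kaczmarz-style step), and that "greedy" means picking the task with the largest current residual, which is only one plausible dissimilarity rule. The helper names (`make_tasks`, `fit_task`, `greedy_pick`, `run`) and all default sizes are hypothetical.

```python
import numpy as np


def make_tasks(num_tasks=20, dim=30, rows=3, seed=0):
    """Random jointly realizable tasks: every y_m equals X_m @ w_star."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(dim)
    tasks = []
    for _ in range(num_tasks):
        X = rng.standard_normal((rows, dim))
        tasks.append((X, X @ w_star))
    return tasks, w_star


def fit_task(w, X, y):
    """Move w the minimum distance needed to satisfy X w = y (a projection)."""
    return w + np.linalg.pinv(X) @ (y - X @ w)


def average_loss(w, tasks):
    """Mean squared error averaged over all tasks."""
    return float(np.mean([np.mean((X @ w - y) ** 2) for X, y in tasks]))


def greedy_pick(w, tasks, candidates):
    """Assumed greedy rule: the candidate task that w currently fits worst."""
    return max(candidates, key=lambda m: np.linalg.norm(tasks[m][0] @ w - tasks[m][1]))


def run(tasks, steps, ordering="greedy", allow_repeats=True, seed=0):
    """Learn tasks sequentially; return final weights, loss curve, and task order."""
    rng = np.random.default_rng(seed)
    w = np.zeros(tasks[0][0].shape[1])
    remaining = list(range(len(tasks)))  # tasks not yet seen (single-pass mode)
    losses, order = [], []
    for _ in range(steps):
        candidates = list(range(len(tasks))) if allow_repeats else remaining
        if not candidates:  # single pass: every task has been used once
            break
        if ordering == "greedy":
            m = greedy_pick(w, tasks, candidates)
        else:
            m = int(rng.choice(candidates))
        if not allow_repeats:
            remaining.remove(m)
        w = fit_task(w, *tasks[m])
        losses.append(average_loss(w, tasks))
        order.append(m)
    return w, losses, order
```

With `allow_repeats=False` the loop stops once every task has been visited, which corresponds to the single-pass regime. With `allow_repeats=True` the same task may be chosen again later, which corresponds to the regime with repetition. Comparing the returned loss curves for `ordering="greedy"` and `ordering="random"` gives a rough feel for the comparison the abstract describes, though this toy setup makes no claim to reproduce the paper's experiments.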
Problem

Research questions and friction points this paper is trying to address.

Analyzing task ordering impact on continual linear regression convergence rates
Comparing greedy dissimilarity-based orderings with random task sequences
Establishing theoretical loss bounds for greedy and random ordering strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Greedy orderings maximize dissimilarity between consecutive tasks
Using Kaczmarz method tools for geometric and algebraic formalization
Analyzing convergence rates with and without task repetition
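For reference, the quantities behind these points can be written out in standard block Kaczmarz notation. The symbols below ($X_m$, $y_m$, $w_t$, $\tau(t)$) are assumptions made for illustration and may not match the paper's notation. The two selection rules shown, maximum residual and maximum distance, are standard greedy rules from the Kaczmarz literature and are offered only as plausible formalizations of "maximizing dissimilarity".

```latex
% Sequential fitting of task \tau(t) with data (X_{\tau(t)}, y_{\tau(t)}):
% the smallest change to the previous iterate that fits the current task.
w_t = \operatorname*{arg\,min}_{w} \; \| w - w_{t-1} \|_2
      \quad \text{s.t.} \quad X_{\tau(t)} w = y_{\tau(t)},
\qquad\text{i.e.}\qquad
w_t = w_{t-1} + X_{\tau(t)}^{+} \bigl( y_{\tau(t)} - X_{\tau(t)} w_{t-1} \bigr).

% Two candidate greedy rules (assumed): maximum residual and maximum distance.
\tau(t) \in \operatorname*{arg\,max}_{m} \; \| X_m w_{t-1} - y_m \|_2
\qquad\text{or}\qquad
\tau(t) \in \operatorname*{arg\,max}_{m} \; \bigl\| X_m^{+} ( y_m - X_m w_{t-1} ) \bigr\|_2 .

% Rates quoted in the abstract for the average loss after k iterations
% (general rank):
\text{random, with or without replacement: } \mathcal{O}\bigl(1/\sqrt{k}\bigr),
\qquad
\text{greedy with repetition: } \mathcal{O}\bigl(1/\sqrt[3]{k}\bigr).
```

Under joint realizability, each update is an orthogonal projection onto the current task's solution set, which is what lets Kaczmarz-style geometric arguments apply.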
Matan Tsipory
Technion, Haifa
Ran Levinstein
Technion, Haifa
Itay Evron
Technion, Haifa
Mark Kong
University of California, Los Angeles
Deanna Needell
Professor of Mathematics, UCLA
Mathematical signal processing, statistics, compressed sensing, numerical linear algebra
Daniel Soudry
Associate Professor
Neural Networks, Machine Learning, Theoretical neuroscience