Transfer learning via Regularized Linear Discriminant Analysis

📅 2025-01-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses transfer learning for classification in high-dimensional, low-sample-size settings (p ≫ n), where labeled source-domain data are leveraged to improve discriminative performance in a target domain with limited labels. Method: the authors propose Regularized Random-Effects Linear Discriminant Analysis (RRE-LDA), which estimates the discriminant direction as a novel weighted fusion of ridge-regularized estimators from the source and target domains. Under the high-dimensional asymptotic regime p/n → ∞, they derive, for the first time, the closed-form optimal weight and the corresponding classification error rate, accompanied by geometric interpretations and practical weight-selection guidelines. Contributions/Results: the theoretical analysis rests on random matrix theory and risk minimization. Extensive experiments, including synthetic data and a real-world proteomics-based task of ten-year cardiovascular disease risk prediction, demonstrate substantial error reduction. Crucially, the theoretically derived error rate aligns closely with empirical results, validating both the framework's statistical soundness and its practical efficacy.

📝 Abstract
Linear discriminant analysis is a widely used method for classification. However, the high dimensionality of predictors combined with small sample sizes often results in large classification errors. To address this challenge, it is crucial to leverage data from related source models to enhance the classification performance of a target model. We propose to address this problem in the framework of transfer learning. In this paper, we present novel transfer learning methods via regularized random-effects linear discriminant analysis, where the discriminant direction is estimated as a weighted combination of ridge estimates obtained from both the target and source models. Multiple strategies for determining these weights are introduced and evaluated, including one that minimizes the estimation risk of the discriminant vector and another that minimizes the classification error. Utilizing results from random matrix theory, we explicitly derive the asymptotic values of these weights and the associated classification error rates in the high-dimensional setting, where $p/n \rightarrow \infty$, with $p$ representing the predictor dimension and $n$ the sample size. We also provide geometric interpretations of various weights and guidance on which weights to choose. Extensive numerical studies, including simulations and analysis of proteomics-based 10-year cardiovascular disease risk classification, demonstrate the effectiveness of the proposed approach.
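The core construction in the abstract, ridge-regularized LDA directions from the target and source samples fused by a scalar weight, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's exact estimator: the function names, the pooled-covariance ridge formula, and the fixed regularization parameter `lam` and weight `w` are all illustrative choices (the paper derives optimal weights asymptotically, which is not reproduced here).

```python
import numpy as np

def ridge_lda_direction(X0, X1, lam):
    """Ridge-regularized LDA direction (S + lam*I)^{-1} (mu1 - mu0).

    X0, X1: (n_k, p) samples for classes 0 and 1; lam > 0 is the ridge penalty.
    Returns the direction and the two class means.
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Xc = np.vstack([X0 - mu0, X1 - mu1])          # center within each class
    S = Xc.T @ Xc / Xc.shape[0]                   # pooled covariance estimate
    p = S.shape[0]
    beta = np.linalg.solve(S + lam * np.eye(p), mu1 - mu0)
    return beta, mu0, mu1

def transfer_lda_classify(X, Xt0, Xt1, Xs0, Xs1, lam=1.0, w=0.5):
    """Classify rows of X using a weighted fusion of target and source
    ridge-LDA directions (illustrative fixed weight w, not the optimal one)."""
    b_t, mu0, mu1 = ridge_lda_direction(Xt0, Xt1, lam)   # target estimate
    b_s, _, _ = ridge_lda_direction(Xs0, Xs1, lam)       # source estimate
    b = w * b_t + (1.0 - w) * b_s                        # weighted combination
    scores = (X - (mu0 + mu1) / 2.0) @ b                 # score about target midpoint
    return (scores > 0).astype(int)                      # predict class 1 if positive
```

In the p ≫ n regime the target-only direction `b_t` is noisy, so when the source discriminant direction is similar, down-weighting the target (small `w`) borrows strength from the larger source sample; the paper's contribution is characterizing the optimal such weight asymptotically.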
Problem

Research questions and friction points this paper is trying to address.

High-dimensional data
Linear Discriminant Analysis
Transfer learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Regularized Linear Discriminant Analysis
Transfer Learning Framework
High-Dimensional Small Sample Classification