🤖 AI Summary
Addressing model generalization under high annotation costs and data scarcity, this survey establishes a unified theoretical and methodological framework for low-resource learning. Theoretically, it introduces the first agnostic active sampling theory integrated into the PAC learning framework, rigorously characterizing the trade-off between generalization error and label complexity. Methodologically, it investigates four families of optimization strategies (gradient-informed, meta-iteration, geometry-aware, and LLM-powered optimization) and surveys complementary learning paradigms, including domain transfer, reinforcement feedback, and hierarchical structure modeling. Across the surveyed methods, the evidence indicates that these strategies substantially enhance model robustness and generalization under limited annotations. The work thus provides both an interpretable, scalable theoretical foundation and a practical paradigm for data-constrained AI systems.
📝 Abstract
Learning with high-resource data has demonstrated substantial success in artificial intelligence (AI); however, the costs associated with data annotation and model training remain significant. A fundamental objective of AI research is to achieve robust generalization with limited-resource data. This survey employs agnostic active sampling theory within the Probably Approximately Correct (PAC) framework to analyze the generalization error and label complexity of learning from low-resource data in both model-agnostic supervised and unsupervised settings. Based on this analysis, we investigate a suite of optimization strategies tailored to low-resource learning, including gradient-informed optimization, meta-iteration optimization, geometry-aware optimization, and LLM-powered optimization. Furthermore, we provide a comprehensive overview of learning paradigms that can benefit from low-resource data, including domain transfer, reinforcement feedback, and hierarchical structure modeling. Finally, we summarize the key findings and highlight their implications for learning with low-resource data.
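To make the error/label-complexity trade-off concrete, the classical bounds from agnostic PAC learning and disagreement-based active learning are a useful reference point. The sketch below shows these standard results from the literature, not formulas taken from the survey itself (whose bounds may refine them); here \(\epsilon\) is the target excess error, \(\delta\) the confidence parameter, \(d\) the VC dimension of the hypothesis class, \(\nu\) the best achievable error within the class, and \(\theta\) the disagreement coefficient.

```latex
% Classical reference bounds (standard results from agnostic PAC and
% disagreement-based active learning; shown for illustration, not
% taken verbatim from the survey).

% Passive agnostic PAC: labeled examples sufficient for excess error
% \epsilon with probability at least 1 - \delta, over a hypothesis
% class of VC dimension d:
m(\epsilon, \delta) = O\!\left(\frac{d + \log(1/\delta)}{\epsilon^{2}}\right)

% Disagreement-based agnostic active learning: labels queried, with
% disagreement coefficient \theta and best-in-class error \nu;
% \tilde{O} hides polylogarithmic factors:
\Lambda(\epsilon, \delta) = \tilde{O}\!\left(\theta
  \left(1 + \frac{\nu^{2}}{\epsilon^{2}}\right)
  \left(d \log(1/\epsilon) + \log(1/\delta)\right)\right)
```

The qualitative takeaway, which is the kind of saving the survey's analysis quantifies, is that when the best-in-class error \(\nu\) is on the order of \(\epsilon\), the \(\nu^{2}/\epsilon^{2}\) factor becomes a constant, so active querying needs only polylogarithmically many labels in \(1/\epsilon\), versus the \(1/\epsilon^{2}\) labeled examples passive agnostic learning requires.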