Reinforced Interactive Continual Learning via Real-time Noisy Human Feedback

📅 2025-05-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two critical bottlenecks of continual learning (CL) in realistic human–computer interaction: (1) reliance on static, clean datasets, hindering responsiveness to real-time human annotations; and (2) insufficient robustness to noisy labels. To this end, we propose RiCL—a Reinforced Interactive Continual Learning framework designed for dynamic interaction settings. RiCL introduces a novel three-module synergistic architecture integrating temporal-consistency-based noise purification, interaction-aware direct preference optimization, and noise-robust contrastive learning—marking the first systematic unification of real-time adaptability, interactive responsiveness, and label-noise robustness in CL. Evaluated on the FewRel and TACRED noisy benchmarks, RiCL significantly outperforms state-of-the-art online continual learning and noise-tolerant learning methods, demonstrating superior adaptability to evolving user feedback and strong robustness against label noise.

📝 Abstract
This paper introduces an interactive continual learning paradigm where AI models dynamically learn new skills from real-time human feedback while retaining prior knowledge. This paradigm distinctively addresses two major limitations of traditional continual learning: (1) dynamic model updates using streaming, real-time human-annotated data, rather than static datasets with fixed labels, and (2) the assumption of clean labels, by explicitly handling the noisy feedback common in real-world interactions. To tackle these problems, we propose RiCL, a Reinforced Interactive Continual Learning framework leveraging Large Language Models (LLMs) to learn new skills effectively from dynamic feedback. RiCL incorporates three key components: a temporal consistency-aware purifier to automatically discern clean from noisy samples in data streams; an interaction-aware direct preference optimization strategy to align model behavior with human intent by reconciling AI-generated and human-provided feedback; and a noise-resistant contrastive learning module that captures robust representations by exploiting inherent data relationships, thus avoiding reliance on potentially unreliable labels. Extensive experiments on two benchmark datasets (FewRel and TACRED), contaminated with realistic noise patterns, demonstrate that our RiCL approach substantially outperforms existing combinations of state-of-the-art online continual learning and noisy-label learning methods.
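To make the purifier idea concrete, here is a minimal sketch of temporal-consistency-based noise filtering: a streaming sample is treated as clean only once the model's predictions for it have been stable across recent encounters and agree with the human annotation. The class name, window size, and exact criterion are illustrative assumptions, not the paper's actual algorithm.

```python
from collections import defaultdict, deque


class TemporalConsistencyPurifier:
    """Hypothetical sketch of a temporal consistency-aware purifier.

    A sample in the stream is flagged as clean only after `window`
    consecutive model predictions for it are identical and match the
    human-provided label; otherwise it is routed to the noisy pool.
    """

    def __init__(self, window: int = 3):
        self.window = window
        # Per-sample rolling history of the model's predicted labels.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, sample_id: str, predicted_label: str) -> None:
        self.history[sample_id].append(predicted_label)

    def is_clean(self, sample_id: str, annotated_label: str) -> bool:
        preds = self.history[sample_id]
        # Clean = full window seen, all predictions identical,
        # and they agree with the human annotation.
        return (
            len(preds) == self.window
            and len(set(preds)) == 1
            and preds[0] == annotated_label
        )


purifier = TemporalConsistencyPurifier(window=3)
for _ in range(3):
    purifier.observe("s1", "born_in")
print(purifier.is_clean("s1", "born_in"))    # True
print(purifier.is_clean("s1", "works_for"))  # False: annotation disagrees
```

In a real system the per-encounter predictions would come from the continually updated LLM; here they are passed in directly to keep the sketch self-contained.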
Problem

Research questions and friction points this paper is trying to address.

Dynamic model updates using real-time human feedback
Handling noisy human feedback in continual learning
Aligning model behavior with human intent effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal consistency-aware purifier filters noisy samples
Interaction-aware DPO aligns model with human intent
Noise-resistant contrastive learning captures robust representations
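The preference-alignment component builds on the standard DPO objective, which scores a preferred response against a rejected one relative to a frozen reference model. A minimal scalar version of that standard loss is sketched below; the paper's interaction-aware variant (reconciling AI-generated and human feedback) adds machinery on top of this that is not reproduced here.

```python
import math


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    logp_w / logp_l: policy log-probs of the preferred ("winner") and
    rejected ("loser") responses; ref_* are the same quantities under
    the frozen reference model. Loss = -log sigmoid(beta * margin).
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


# When the policy matches the reference, the margin is 0 and the loss
# is log(2) ~= 0.693; it shrinks as the policy favors the winner.
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))  # ~0.693
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))  # smaller: policy prefers winner
```

In practice the log-probabilities are summed token-level scores from the LLM and the loss is averaged over a batch; beta controls how far the policy may drift from the reference.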
Yutao Yang
School of Computer Science and Technology, East China Normal University
Jie Zhou
School of Computer Science and Technology, East China Normal University
Junsong Li
East China Normal University
Qianjun Pan
East China Normal University
Bihao Zhan
East China Normal University
Qin Chen
School of Computer Science and Technology, East China Normal University
Xipeng Qiu
Shanghai Innovation Institute, School of Computer Science, Fudan University
Liang He
School of Computer Science and Technology, East China Normal University