Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection

📅 2026-03-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Standard supervised training treats all samples equally in deepfake detection, hindering the learning of robust and generalizable features. To address this limitation, this work proposes a Tutor-Student Reinforcement Learning (TSRL) framework that, for the first time, formulates dynamic curriculum learning as a Markov Decision Process. In this framework, a Tutor agent constructs state representations by integrating sample-wise historical dynamics—such as exponentially moving averaged (EMA) loss and forgetting counts—with visual features, and adaptively adjusts per-sample loss weights through Proximal Policy Optimization (PPO) in a continuous action space. By prioritizing high-value samples, TSRL significantly enhances model generalization to unseen forgery techniques, demonstrating the efficacy of reinforcement learning–driven adaptive curriculum design in deepfake detection.

📝 Abstract
Standard supervised training for deepfake detection treats all samples with uniform importance, which can be suboptimal for learning robust and generalizable features. In this work, we propose a novel Tutor-Student Reinforcement Learning (TSRL) framework to dynamically optimize the training curriculum. Our method models the training process as a Markov Decision Process where a "Tutor" agent learns to guide a "Student" (the deepfake detector). The Tutor, implemented as a Proximal Policy Optimization (PPO) agent, observes a rich state representation for each training sample, encapsulating not only its visual features but also its historical learning dynamics, such as EMA loss and forgetting counts. Based on this state, the Tutor takes an action by assigning a continuous weight (0-1) to the sample's loss, thereby dynamically re-weighting the training batch. The Tutor is rewarded based on the Student's immediate performance change, specifically rewarding transitions from incorrect to correct predictions. This strategy encourages the Tutor to learn a curriculum that prioritizes high-value samples, such as hard-but-learnable examples, leading to a more efficient and effective training process. We demonstrate that this adaptive curriculum improves the Student's generalization capabilities against unseen manipulation techniques compared to traditional training methods. Code is available at https://github.com/wannac1/TSRL.
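The bookkeeping the abstract describes — per-sample EMA loss, forgetting counts, Tutor-assigned weights in [0, 1], and a reward for incorrect-to-correct transitions — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: `SampleHistory`, `weighted_batch_loss`, and `tutor_reward` are hypothetical names, and the PPO Tutor itself (which would produce the weights) is omitted.

```python
import numpy as np

class SampleHistory:
    """Per-sample learning dynamics that feed the Tutor's state:
    EMA of the loss and a count of forgetting events."""
    def __init__(self, n_samples, ema_decay=0.9):
        self.ema_loss = np.zeros(n_samples)
        self.forgetting = np.zeros(n_samples, dtype=int)
        self.last_correct = np.zeros(n_samples, dtype=bool)
        self.decay = ema_decay

    def update(self, idx, losses, correct):
        idx = np.asarray(idx)
        # exponential moving average of each sample's loss
        self.ema_loss[idx] = self.decay * self.ema_loss[idx] + (1 - self.decay) * losses
        # a "forgetting event": previously correct, now incorrect
        self.forgetting[idx] += (self.last_correct[idx] & ~correct).astype(int)
        self.last_correct[idx] = correct

def weighted_batch_loss(losses, weights):
    """Batch loss re-weighted by the Tutor's continuous actions in [0, 1]."""
    weights = np.clip(weights, 0.0, 1.0)
    return float(np.sum(weights * losses) / max(np.sum(weights), 1e-8))

def tutor_reward(prev_correct, now_correct):
    """Reward the Tutor for samples that flip from incorrect to correct."""
    return float(np.sum(~prev_correct & now_correct))
```

In this sketch, the Tutor would observe `ema_loss` and `forgetting` (concatenated with visual features) as its state, emit `weights` as its action, and receive `tutor_reward` after the Student's update step.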
Problem

Research questions and friction points this paper is trying to address.

deepfake detection
curriculum learning
robustness
generalization
supervised training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tutor-Student Reinforcement Learning
Dynamic Curriculum Learning
Deepfake Detection
Proximal Policy Optimization
Adaptive Sample Weighting
Zhanhe Lei
School of Computer Science, Wuhan University
Zhongyuan Wang
Wuhan University
Jikang Cheng
School of Integrated Circuits, Peking University
Baojin Huang
School of Information, Huazhong Agricultural University
Yuhong Yang
School of Computer Science, Wuhan University
Zhen Han
School of Computer Science, Wuhan University
Chao Liang
Professor of Computer Science, Wuhan University
computer vision, pattern recognition, multimedia, HCI
Dengpan Ye
Wuhan University
multimedia, security