🤖 AI Summary
Video recognition systems face risks of unauthorized reuse of training data, yet existing copyright auditing methods—primarily designed for static images—struggle with video’s temporal complexity and the need for stealthy, imperceptible auditing signals. This paper introduces the first copyright auditing framework tailored for video domains. Its core innovation is a temporally sensitive, lightweight sample perturbation mechanism: modifying only 1% of training samples significantly amplifies behavioral discrepancies between released and unreleased samples, yielding highly discriminative and robust “behavioral fingerprints.” The method incorporates three key components: (i) temporal consistency-constrained perturbations, (ii) output divergence modeling, and (iii) adversarial robustness enhancement. Evaluated on mainstream architectures (I3D, SlowFast) and benchmarks (Kinetics, Something-Something), the framework achieves >98% auditing accuracy and demonstrates strong robustness against common distortions—including frame dropping, compression, and fine-tuning.
📝 Abstract
Video recognition systems are increasingly deployed in everyday applications such as content recommendation and security monitoring. To advance video recognition, many institutions have released high-quality public datasets under open-source licenses for training advanced models. At the same time, these datasets are susceptible to misuse and infringement. Dataset copyright auditing is an effective way to identify such unauthorized use. However, existing solutions focus primarily on the image domain, and the complex nature of video data leaves dataset copyright auditing in the video domain unexplored. Specifically, video data introduces an additional temporal dimension, which poses significant challenges to the effectiveness and stealthiness of existing methods.
In this paper, we propose VICTOR, the first dataset copyright auditing approach for video recognition systems. We develop a general and stealthy sample-modification strategy that enhances the output discrepancy of the target model. By modifying only a small proportion of samples (e.g., 1%), VICTOR amplifies the impact of the published modified samples on the prediction behavior of the target model. The difference in the model's behavior on published modified versus unpublished original samples then serves as the key basis for dataset auditing. Extensive experiments on multiple models and datasets highlight the superiority of VICTOR. Finally, we show that VICTOR remains robust under several perturbation mechanisms applied to the training videos or to the target models.
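The auditing decision sketched above (compare a suspect model's behavior on published modified samples against unpublished original samples) can be illustrated as a simple two-sample hypothesis test. The abstract does not specify the test statistic or decision rule VICTOR actually uses; the confidence-gap statistic, permutation test, and significance threshold below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def audit_dataset(conf_published, conf_unpublished,
                  n_perm=10_000, alpha=0.05, seed=0):
    """Hypothetical auditing decision (illustrative, not VICTOR's actual test).

    conf_published:   suspect model's confidences on published (modified) samples
    conf_unpublished: its confidences on held-out unpublished original samples

    One-sided permutation test on the mean-confidence gap: if the model was
    trained on the published set, its confidence on the modified samples
    should be significantly elevated relative to the unpublished originals.
    Returns (infringing, p_value).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(conf_published, dtype=float)
    b = np.asarray(conf_unpublished, dtype=float)
    observed_gap = a.mean() - b.mean()

    # Null hypothesis: the two sets of confidences are exchangeable.
    pooled = np.concatenate([a, b])
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        gap = pooled[:len(a)].mean() - pooled[len(a):].mean()
        if gap >= observed_gap:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)  # add-one smoothing
    return p_value < alpha, p_value
```

For example, with simulated confidences, a model that trained on the published set (elevated confidence on modified samples) is flagged, while a model that never saw them is not:

```python
rng = np.random.default_rng(1)
unpub = rng.normal(0.70, 0.03, 100)                 # baseline behavior
pub_trained = rng.normal(0.90, 0.03, 100).clip(0, 1)  # memorized modifications
infringing, p = audit_dataset(pub_trained, unpub)   # -> True, tiny p-value
```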