🤖 AI Summary
This work addresses the challenge of low-quality trajectory fragments in cross-view video person re-identification (ReID), particularly between aerial and ground perspectives. It presents the first systematic investigation into the role of video super-resolution (VSR) for trajectory enhancement, introducing a task-oriented, end-to-end optimization framework that deeply integrates state-of-the-art VSR networks with a CLIP-driven ReID architecture. The proposed method substantially improves the discriminability of degraded trajectories, achieving 37.52% mAP in the aerial-to-ground setting and 29.16% mAP in the ground-to-aerial setting on the VReID-XFD benchmark. Furthermore, in the ground-to-aerial setting it yields notable gains in Rank-1, Rank-5, and Rank-10 accuracy of 11.24%, 13.48%, and 17.98%, respectively, demonstrating the effectiveness and novelty of the proposed approach.
📝 Abstract
Tracklet quality is often treated as an afterthought in person re-identification (ReID), with the majority of research focusing on architectural modifications to foundational models. Such approaches overlook a critical limitation that hinders the deployment of ReID systems in difficult real-world scenarios. In this paper, we introduce S3-CLIP, a video super-resolution-based CLIP-ReID framework developed for the VReID-XFD challenge at WACV 2026. The proposed method integrates recent advances in super-resolution networks with task-driven super-resolution pipelines, adapting them to the video-based person re-identification setting. To the best of our knowledge, this work represents the first systematic investigation of video super-resolution as a means of enhancing tracklet quality for person ReID, particularly under challenging cross-view conditions. Experimental results demonstrate performance competitive with the baseline, achieving 37.52% mAP in aerial-to-ground and 29.16% mAP in ground-to-aerial scenarios. In the ground-to-aerial setting, S3-CLIP achieves substantial gains in ranking accuracy, improving Rank-1, Rank-5, and Rank-10 performance by 11.24%, 13.48%, and 17.98%, respectively.