PruneVid: Visual Token Pruning for Efficient Video Large Language Models, ACL Findings 2025
Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective, CVPR 2025
Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting, IJCV 2025
Semi-Supervised Spoken Language Glossification, ACL 2024
StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond, arXiv preprint 2024
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition, ICLR 2024
HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception, NeurIPS 2023
Sign Language Translation with Iterative Prototype, ICCV 2023
Graph Contrastive Learning for Skeleton-based Action Recognition, ICLR 2023
Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition and Translation, IEEE Transactions on Multimedia 2021
Improving Sign Language Translation with Monolingual Data by Sign Back-Translation, CVPR 2021
Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition, AAAI 2020
Dynamic Pseudo Label Decoding for Continuous Sign Language Recognition, ICME 2019
Research Experience
Researcher at ByteDance, working on generative AI for advertising technology and the creative industry.
Education
Ph.D. from the University of Science and Technology of China (USTC) in 2022, supervised by Prof. Wengang Zhou and Prof. Houqiang Li; B.S. from Xidian University (XDU) in 2017.
Background
Currently a researcher at ByteDance, focusing on generative AI for advertising technology and the creative industry. Research interests include computer vision, particularly video understanding, generation, and editing.
Miscellany
Invited reviewer for several journals and conferences, including IEEE TPAMI, IEEE TCSVT, and IEEE TMM.