Published several papers, including 'VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges' (ICCV 2025), 'OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts' (CVPR 2025), and 'Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge' (EMNLP 2024). Involved in multiple open-source projects, such as Open-Omni-Nexus and Multimodal Needle In A Video Haystack. Reviewer: ARR 2023-Present; Area Chair: ARR 2024-Present; Organizer: NLPCC 2022 Shared Task 4, NLPCC 2023 Shared Task 10.
Research Experience
Has worked with Zilong Zheng @ BIGAI, Cihang Xie @ UCSC, and Alan L. Yuille @ JHU.
Education
Master's degree from Peking University, supervised by Dongyan Zhao.
Background
Currently a research engineer on the Qwen team, Alibaba Inc. His work focuses primarily on omni-LMs, and he is especially interested in studies that offer novel insights and impactful applications.
Miscellany
Looking for interns for omni-LM and open-world modeling research.