- Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
- Qwen3-Omni Technical Report
- EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
- EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights
- Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LMs via Catfish Agent for Clinical Decision Making
- Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era
- Video Instance Shadow Detection Under the Sun and Sky
Conference and Journal Acceptances: NeurIPS 2025, CVPR 2025, IEEE TIP 2024
Research Experience
Work Experience: Algorithm Engineer at Tencent AI Lab, responsible for developing RL-based game agents and designing diverse reward strategies to encourage varied playstyles.
Education
- Ph.D., The Chinese University of Hong Kong, advised by Prof. Pheng-Ann Heng and Prof. Chi-Wing Fu
- M.Sc. in Big Data Technology, The Hong Kong University of Science and Technology
- B.Sc. (Hons) in Computer Science and Technology, Beijing Normal–Hong Kong Baptist University
Background
Research Interests: Multimodal understanding and reasoning.
Background: He is currently a third-year Ph.D. student at The Chinese University of Hong Kong, advised by Prof. Pheng-Ann Heng and Prof. Chi-Wing Fu. Before his Ph.D., he worked as an algorithm engineer at Tencent AI Lab, where he developed RL-based game agents and designed diverse reward strategies to encourage varied playstyles.
Miscellany
Personal Interests: Basketball, tennis, and regular gym workouts; sports define my mindset.