Zhenghao Xing

Google Scholar ID: QsEmwSYAAAAJ
The Chinese University of Hong Kong
Multimodal Learning, Computer Vision
Citations & Impact (all-time)
  • Citations: 39
  • H-index: 4
  • i10-index: 1
  • Publications: 7
  • Co-authors: 8
Resume (English only)
Academic Achievements
  • Publications:
    - Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
    - Qwen3-Omni Technical Report
    - EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
    - EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights
    - Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LMs via Catfish Agent for Clinical Decision Making
    - Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era
    - Video Instance Shadow Detection Under the Sun and Sky
  • Conference and Journal Acceptances: NeurIPS 2025, CVPR 2025, IEEE TIP 2024
Research Experience
  • Work Experience: Algorithm Engineer at Tencent AI Lab, responsible for developing RL-based game agents and designing diverse reward strategies to encourage varied playstyles.
Education
  • Ph.D.: The Chinese University of Hong Kong, advised by Prof. Pheng-Ann Heng and Prof. Chi-Wing Fu.
  • M.Sc.: Big Data Technology, The Hong Kong University of Science and Technology.
  • B.Sc. (Hons): Computer Science and Technology, Beijing Normal–Hong Kong Baptist University.
Background
  • Research Interests: Multimodal understanding and reasoning.
  • Background: Currently a third-year Ph.D. student at The Chinese University of Hong Kong, advised by Prof. Pheng-Ann Heng and Prof. Chi-Wing Fu. Before his Ph.D., he worked as an algorithm engineer at Tencent AI Lab, where he developed RL-based game agents and designed diverse reward strategies to encourage varied playstyles.
Miscellany
  • Personal Interests: Basketball, tennis, and regular gym workouts. Sports define his mindset.