Qidong Huang

Google Scholar ID: F-OzLhQAAAAJ
Qwen Team, Alibaba Cloud
Research interests: vision and language
Citations & Impact (all-time)
Citations: 1,038
H-index: 13
i10-index: 13
Publications: 20
Co-authors: 9
Resume
Academic Achievements
  • “ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing” accepted by ICCV 2025 (equal contribution)
  • “Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate” accepted by ICCV 2025
  • “OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation” accepted by CVPR 2024 (Highlight, top 2.8%)
  • “PointCAT: Contrastive Adversarial Training for Robust Point Cloud Recognition” published in IEEE TIP 2024
  • “Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting” accepted by ICCV 2023
  • “Diversity-Aware Meta Visual Prompting” accepted by CVPR 2023
  • Contributed to multiple open-source projects and datasets, including ScaleCap (450k high-quality long image captions), MMRC (large-scale real-world MLLM conversation benchmark), Light-A-Video (training-free video relighting), and PyramidDrop (accelerating LVLM training/inference)