Ke Zhu
Scholar

Google Scholar ID: bos3kG8AAAAJ
Nanjing University
Computer Vision and Pattern Recognition
Citations & Impact (all-time)
Citations: 396
H-index: 8
i10-index: 7
Publications: 18
Co-authors: 15
Publications
18 items
Resume (English only)
Academic Achievements
  • 1. Perception and Reasoning Scaling Laws: The Role of RLHF (Work in Progress), In Submission.
  • 2. On Data Synthesis and Post-training for Visual Abstract Reasoning, In Submission.
  • 3. Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception, In Submission.
  • 4. Continual SFT Matches Multimodal RLHF with Negative Supervision, CVPR 2025, To appear.
  • 5. Self-Supervised Visual Preference Alignment, MM 2024 (Oral, top 3.97%).
  • 6. Bias Mitigation for Long-Tailed Detection, Submitted to IJCV.
  • 7. Coarse Is Better? A New Pipeline Towards Self-Supervised Learning with Uncurated Images, Pattern Recognition (PR).
  • 8. All You Need in Knowledge Distillation Is a Tailored Coordinate System, AAAI 2025.
  • 9. DiffuLT: How to Make Diffusion Model Useful for Long-tail Recognition, NeurIPS 2024.
  • 10. Rectify the Regression Bias in Long-Tailed Object Detection, ECCV 2024.
  • 11. Instance-based Max-margin for Practical Few-shot Recognition, CVPR 2024.
  • 12. DTL: Disentangled Transfer Learning for Visual Recognition, AAAI 2024, To appear.
  • 13. Multi-Label Self-Supervised Learning with Scene Images, In Proceedings.
Research Experience
  • 1. 2023.6~2024.5: Project on model compression for radio user allocation, reducing parameters by over 95% with no accuracy drop and delivering about a 5x inference speedup over HUAWEI's original architecture.
  • 2. 2023.11~2024.5: Internship under Xiangyu Zhang (Chief Scientist at StepFun), working on autoregressive LLMs for comprehension and generation, including multimodal LLM foundations: pre-/post-training and RLHF.
  • 3. 2024.6~2025.5: Internship under Jingdong Wang (Chief Scientist of Computer Vision at Baidu), working on multimodal LLM post-training: RLHF, SFT, LLM reasoning, and data synthesis (CoT).
  • 4. 2024.5~present: Working under Shuai Bai in the Qwen-VL Foundation Model Group, focusing on post-training for Qwen-VL.
Education
  • 1. PhD student at the School of Artificial Intelligence, Nanjing University, supervised by Prof. Zhi-Hua Zhou.
  • 2. B.Sc. in Automation Science and Technology from the Department of Electronics and Information, Xi'an Jiaotong University, graduated in June 2020.
Background
  • Research interests include multimodal LLMs and general computer vision. Current focus: VLM post-training (RLHF), data synthesis, and reasoning.