Ke Zhu
Scholar

Google Scholar ID: bos3kG8AAAAJ
Nanjing University
Computer Vision and Pattern Recognition
Citations & Impact (all-time)
Citations: 396
H-index: 8
i10-index: 7
Publications: 18
Co-authors: 15
Publications
18 items
Resume (English only)
Academic Achievements
  • 1. Perception and Reasoning Scaling Laws: The Role of RLHF (Work in Progress), In Submission.
  • 2. On Data Synthesis and Post-training for Visual Abstract Reasoning, In Submission.
  • 3. Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception, In Submission.
  • 4. Continual SFT Matches Multimodal RLHF with Negative Supervision, CVPR 2025, To appear.
  • 5. Self-Supervised Visual Preference Alignment, MM 2024 (Oral, top 3.97%).
  • 6. Bias Mitigation for Long-Tailed Detection, Submitted to IJCV.
  • 7. Coarse Is Better? A New Pipeline Towards Self-Supervised Learning with Uncurated Images, Pattern Recognition (PR).
  • 8. All You Need in Knowledge Distillation Is a Tailored Coordinate System, AAAI 2025.
  • 9. DiffuLT: How to Make Diffusion Model Useful for Long-tail Recognition, NeurIPS 2024.
  • 10. Rectify the Regression Bias in Long-Tailed Object Detection, ECCV 2024.
  • 11. Instance-based Max-margin for Practical Few-shot Recognition, CVPR 2024.
  • 12. DTL: Disentangled Transfer Learning for Visual Recognition, AAAI 2024, To appear.
  • 13. Multi-Label Self-Supervised Learning with Scene Images, In Proceedings.
Research Experience
  • 1. 2023.6~2024.5: Project on model compression for radio user allocation, reducing parameters by over 95% with no accuracy drop and delivering about a 5x inference speedup over HUAWEI's original architecture.
  • 2. 2023.11~2024.5: Internship under Xiangyu Zhang (Chief Scientist at StepFun), working on autoregressive LLMs for comprehension and generation, including multimodal LLM foundations: pre-/post-training and RLHF.
  • 3. 2024.6~2025.5: Internship under Jingdong Wang (Chief Scientist of Computer Vision at Baidu), working on multimodal LLM post-training: RLHF, SFT, LLM reasoning, and data synthesis (CoT).
  • 4. 2024.5~present: Working under Shuai Bai in the Qwen-VL Foundation Model Group, focusing on post-training for Qwen-VL.
Education
  • 1. PhD student at the School of Artificial Intelligence, Nanjing University, supervised by Prof. Zhi-Hua Zhou.
  • 2. B.Sc. in Automation Science and Technology from the Department of Electronics and Information, Xi'an Jiaotong University, graduated in June 2020.
Background
  • Research interests include multimodal LLMs and general computer vision. Current focus: VLM post-training (RLHF), data synthesis, and reasoning.