Scholar

Zhisheng Zhong

Google Scholar ID: u-2_7C8AAAAJ

The Chinese University of Hong Kong

Computer Vision, Machine Learning, Multimodal AI, Data Efficiency

Citations & Impact

Citations (all-time): 3,374

Contact

Email: zszhong@link.cuhk.edu.hk

Publications

7 items

Claim-Level Rubric Rewards for Video Caption Reinforcement Learning

2026

ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models

2025

MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech

2025

ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay

2025

VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning

2025

STEVE: A Step Verification Pipeline for Computer-use Agent Training

2025

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

2025

Resume (English only)

Academic Achievements

Published papers in top-tier international conferences and journals, including ICCV, NeurIPS, and CVPR. Selected publications:
- (Mini-Gemini V3) MGM-Omni: An Open-Source Omni Chatbot
- (Mini-Gemini V2) Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
- Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
- ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
- Decoupled Kullback-Leibler Divergence Loss
- Understanding Imbalanced Semantic Segmentation Through Neural Collapse
- Rebalanced Siamese Contrastive Mining for Long-Tailed Recognition
- Improving Calibration for Long-Tailed Recognition
- Channel-level Variable Quantization Network for Deep Image Compression
- Deep Joint-semantics Reconstructing Hashing for Large-scale Unsupervised Cross-modal Retrieval
- ADA-Tucker: Compressing Deep Neural Networks via Adaptive Dimension Adjustment Tucker Decomposition
- Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution

Research Experience

Looking for a full-time position starting in Fall 2025. Feel free to drop an email if you are recruiting!

Education

- Ph.D. (in progress), Department of Computer Science and Engineering, The Chinese University of Hong Kong, supervised by Prof. Jiaya Jia
- Master's degree in Intelligence Science, Peking University, supervised by Prof. Zhouchen Lin and Prof. Chao Zhang
- Bachelor's degree in Communication Engineering, Beijing University of Posts and Telecommunications (BUPT)

Background

Currently a Ph.D. student at the Department of Computer Science and Engineering, The Chinese University of Hong Kong (CUHK), working on computer vision and machine learning, with a focus on multimodal AI (MLLM, VLM), data efficiency, imbalanced learning, and 2D/3D segmentation.

Miscellany