Published papers in top-tier international conferences and journals, including ICCV, NeurIPS, and CVPR. Selected publications:
- (Mini-Gemini V3) MGM-Omni: An Open-Source Omni Chatbot
- (Mini-Gemini V2) Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
- Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
- ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
- Decoupled Kullback-Leibler Divergence Loss
- Understanding Imbalanced Semantic Segmentation Through Neural Collapse
- Rebalanced Siamese Contrastive Mining for Long-Tailed Recognition
- Improving Calibration for Long-Tailed Recognition
- Channel-level Variable Quantization Network for Deep Image Compression
- Deep Joint-semantics Reconstructing Hashing for Large-scale Unsupervised Cross-modal Retrieval
- ADA-Tucker: Compressing Deep Neural Networks via Adaptive Dimension Adjustment Tucker Decomposition
- Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution
Research Experience
Education
- Ph.D. (in progress), Department of Computer Science and Engineering, The Chinese University of Hong Kong, supervised by Prof. Jiaya Jia
- Master's degree in Intelligence Science, Peking University, supervised by Prof. Zhouchen Lin and Prof. Chao Zhang
- Bachelor's degree in Communication Engineering, Beijing University of Posts and Telecommunications (BUPT)
Background
Currently a Ph.D. student at the Department of Computer Science and Engineering, The Chinese University of Hong Kong (CUHK), working on computer vision and machine learning, particularly multimodal AI (MLLM, VLM), data efficiency, imbalanced learning, and 2D/3D segmentation.
Miscellany
I am looking for a full-time position starting in Fall 2025. Feel free to drop an email if you are recruiting!