Publications include 'Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning' (arXiv, 2025), 'UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding' (CVPRW, 2025), 'PointHR: Exploring High-Resolution Architectures for 3D Point Cloud Segmentation' (arXiv, 2023), 'Collect-and-Distribute Transformer for 3D Point Cloud Analysis' (arXiv, 2023), 'GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation' (TMLR, 2022), 'SynFace: Face Recognition with Synthetic Data' (ICCV, 2021), 'End2End Occluded Face Recognition by Masking Corrupted Features' (TPAMI, 2021), 'Cross View Fusion for 3D Human Pose Estimation' (ICCV, 2019), 'Learning Basis Representation to Refine 3D Human Pose Estimations' (AAAI, 2019). Serves as a reviewer for multiple international conferences and journals.
Research Experience
2024.04 - Present: Meituan Large Multimodal Model Group, Multimodal Researcher; 2021.04 - 2022.04: JD Explore Academy, Research Intern, advised by Dr. Baosheng Yu; 2019.05 - 2021.03: Tencent AI Lab, Research Intern, advised by Dr. Dihong Gong, Dr. Zhifeng Li, and Dr. Wei Liu; 2017.07 - 2018.12: Microsoft Research Asia (MSRA), Research Intern, advised by Dr. Chunyu Wang and Prof. Wenjun Zeng.
Education
Received a PhD degree from the School of Computer Science, University of Sydney, advised by Prof. Dacheng Tao and co-supervised by Prof. Baosheng Yu. Obtained a Bachelor's degree from the Department of Electronic Engineering and Information Science at the University of Science and Technology of China (USTC).
Background
Currently working at Meituan as a Researcher. Research interests include multimodal learning, with a particular focus on MLLM post-training, unified multimodal understanding and generation, and multimodal reasoning models.
Miscellany
Website last updated in July 2025. Modified from Leonid Keselman's and Jon Barron's websites.