- Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies (Under Review, 2025)
- G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation (CVPR, 2025)
- Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots (NAACL, 2025)
- HyCodePolicy: Hybrid Language Controllers for Multimodal Monitoring and Decision in Embodied Agents (ICCV @ MMR Workshop, 2025)
- VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing (Under Review, 2025)
- RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation (Under Review, 2025)
- RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (CVPR, 2025; ECCV @ MAAS Workshop, 2024, Best Paper Award)
- SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution (CVPR, 2024)
Research Experience
- Visiting scholar, 2024-2025, UC Berkeley MSC Lab, under the guidance of Prof. Masayoshi Tomizuka
- Collaborated closely with Dr. Yao Mu (now Assistant Professor at SJTU) and Dr. Mingyu Ding (now Assistant Professor at UNC Chapel Hill)
Education
- Ph.D. student, 2022-present, Department of Computer Science, The University of Hong Kong, advised by Prof. Ping Luo and Prof. Wenping Wang (IEEE & ACM Fellow)
- Bachelor's degree, 2021, Automation, Zhejiang University, first place, honors degree from Chu Kochen Honors College
- Research assistant, 2020, VIL, University of Washington, supervised by Yin Guo and Prof. Chun Yuan
Background
Research interests: generative models for embodied AI, robot learning, and multimodal foundation models.