Paper accepted to CVPR 2025; submitted a survey and meta-analysis on 3D tasks empowered by multimodal large language models to TPAMI; proposed Point-Bind, a 3D multimodal model for general 3D learning, and Point-LLM, the first 3D large language model; proposed CapeFormer, a two-stage framework for category-agnostic pose estimation; offered a new alternative solution for continual test-time adaptation in Decorate the Newcomers; alleviated, without labels, the domain gap caused by the combined effects of fog and style variation in Both Style and Fog Matter; proposed REMOTE, a reinforced motion transformation network for semi-supervised 2D pose estimation in videos.
Research Experience
Currently a member of both the Visual Geometry Group and Active Vision Group at the University of Oxford; previously a full-time researcher at Shanghai AI Lab, under the supervision of Prof. Chao Dong.
Education
Obtained Bachelor's and Master's degrees from Wuhan University in 2018 and 2021, respectively; a DPhil student in the Department of Engineering Science, University of Oxford, since October 2023, supervised by Prof. Victor Prisacariu and Prof. Iro Laina.
Background
Research interests include LLMs, 3D computer vision, and robotics, particularly the use of LLMs' world knowledge to enhance understanding of, and interaction with, the 3D world.