Siteng Huang
Google Scholar ID: mhpkWSYAAAAJ
Alibaba DAMO Academy | ZJU | Westlake University
Vision-language Models, Generative Models, Embodied AI
Citations & Impact
All-time
  • Citations: 1,275
  • H-index: 18
  • i10-index: 24
  • Publications: 20
  • Co-authors: 13 (list available)
Resume (English only)
Academic Achievements
  • Published 20+ papers on the above topics at top-tier international AI conferences and journals. Recent achievements include:
    - Four papers accepted to AAAI 2026: the training-free MLLM inference acceleration methods FiCoCo and GlobalCom2, the dexterous grasping policy AffordDex, and the tiny-scale VLA model VLA-Adapter.
    - Released RoboSimGS, a novel Real2Sim2Real framework that converts multi-view real-world images into scalable, high-fidelity, and physically interactive simulation environments for robotic manipulation.
    - SSR, which transforms raw depth data into structured, interpretable textual CoT to enhance the spatial reasoning capabilities of MLLMs, was accepted to NeurIPS 2025.
    - Released VLA-Adapter, which reduces reliance on large-scale VLMs and extensive pre-training by using a lightweight Policy module with Bridge Attention.
    - Released AffordDex, a universal grasping policy for dexterous hands with an inherent understanding of both motion priors and object affordances.
    - Open-sourced RynnEC, RynnVLA-001, and RynnRCP: respectively, a video MLLM for embodied cognition tasks, a VLA model based on a pretrained video generation model, and a complete set of robot service protocols and frameworks.
    - Long-VLA, a novel framework designed to enhance VLA models for challenging long-horizon robotic manipulation tasks, was accepted to CoRL 2025.
Research Experience
  • Works as an Algorithm Expert at DAMO Academy. Interned at TongYi Lab, Alibaba Group, during his Ph.D. studies. Has supervised several self-motivated visiting students and research assistants in their research and publications. Maintains a close collaboration with MiLAB at Westlake University.
Education
  • Received a Ph.D. from Zhejiang University in June 2024 through a joint program with Westlake University, in the Machine Intelligence Laboratory (MiLAB), advised by Prof. Donglin Wang. Received a B.Eng. degree from the School of Computer Science, Wuhan University, in June 2019.
Background
  • Research interests include Embodied AI, Multi-modal Large Models, and Efficient AI. Focuses on the perception, understanding, reasoning, and generation of multimodal data (including images, videos, language, and dynamics) from both the internet and the physical world, as well as on efficient AI for multimodal applications.
Miscellany
  • Email: siteng.huang@gmail.com
  • Links to Twitter, DBLP, GitHub, Google Scholar, ORCID, etc.
  • Open to any form of academic cooperation