Sipeng Zheng
Google Scholar ID: OonuDhcAAAAJ
BeingBeyond
Computer Vision · Large Multimodal Model · Embodied AI
Citations & Impact (All-time)
  • Citations: 506
  • H-index: 14
  • i10-index: 16
  • Publications: 20
  • Co-authors: 3
Academic Achievements
  • Publications:
    - Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
    - RLPF (Reinforcement Learning from Physical Feedback): Aligning Large Motion Models with Humanoid Control
    - Being-M0.5: A Real-Time Controllable Vision-Language-Motion Model
    - Being-VL0.5: Unified Multimodal Understanding via Byte-Pair Visual Encoding
    - Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
    - Few-shot Action Recognition with Hierarchical Matching and Contrastive Learning
    - VRDFormer: End-to-End Video Visual Relation Detection with Transformers
Research Experience
  • Currently leading the Embodied Multimodal Pretraining team at BeingBeyond, working on the Being-H, Being-M, and Being-VL project series. Previously a researcher at the Beijing Academy of Artificial Intelligence (BAAI).
Education
  • PhD and bachelor's degrees from Renmin University of China (RUC), supervised by Prof. Qin Jin.
Background
  • Research Interests: Large multimodal models, human behavior and motion understanding, vision-language-action models, humanoid robots.
  • Brief Introduction: A researcher at BeingBeyond, focusing on developing foundation models for general-purpose humanoid robots.
Miscellany
  • Working towards an intelligent humanoid robot.