- Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
- RLPF: Reinforcement Learning from Physical Feedback: Aligning Large Motion Models with Humanoid Control
- Being-M0.5: A Real-Time Controllable Vision-Language-Motion Model
- Being-VL0.5: Unified Multimodal Understanding via Byte-Pair Visual Encoding
- Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
- Few-shot Action Recognition with Hierarchical Matching and Contrastive Learning
- VRDFormer: End-to-End Video Visual Relation Detection with Transformers
Research Experience
Currently leading the Embodied Multimodal Pretraining team at BeingBeyond, working on projects including the Being-H, Being-M, and Being-VL series. Previously a researcher at the Beijing Academy of Artificial Intelligence (BAAI).
Education
PhD and bachelor's degrees from Renmin University of China (RUC), supervised by Prof. Qin Jin.
Background
Research Interests: Large multimodal models, human behavior and motion understanding, vision-language-action models, humanoid robots.
Brief Introduction: A researcher at BeingBeyond, focusing on the development of foundation models for general-purpose humanoid robots.