Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation

📅 2025-10-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing robot learning datasets primarily target stationary manipulators, while humanoid robot datasets suffer from closed environments, narrow task scopes, insufficient human–robot interaction, and limited lower-body motion modalities; moreover, standardized evaluation platforms for benchmarking learned policies on humanoid data remain scarce. To address these limitations, we introduce Humanoid Everyday, a large-scale, multimodal dataset for open-world humanoid robotic manipulation. Humanoid Everyday encompasses dexterous manipulation, locomotion-integrated actions, and natural human–humanoid interaction, collected via an efficient human-supervised teleoperation pipeline with synchronized RGB, depth, LiDAR, and tactile inputs plus natural language annotations. It comprises 260 diverse tasks across 7 broad categories, 10.3k trajectories, and over 3 million frames. Accompanying the dataset is a cloud-based evaluation platform enabling reproducible, standardized policy learning benchmarks. Humanoid Everyday advances embodied intelligence research by unifying data collection, modality coverage, and standardized evaluation for humanoid robotics.
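To make the dataset's shape concrete (10.3k trajectories over 3 million frames works out to roughly 290 frames per trajectory on average), the sketch below models what a single multimodal trajectory record could look like. The field names and array shapes are illustrative assumptions, not the dataset's published schema.

```python
# A minimal sketch of one trajectory record, assuming hypothetical
# field names and shapes; the released dataset's schema may differ.
from dataclasses import dataclass, field
from typing import List

import numpy as np

@dataclass
class Frame:
    rgb: np.ndarray               # (H, W, 3) uint8 camera image
    depth: np.ndarray             # (H, W) float32 depth map, in meters
    lidar: np.ndarray             # (N, 3) float32 point cloud
    tactile: np.ndarray           # per-sensor tactile readings
    joint_positions: np.ndarray   # whole-body proprioceptive state

@dataclass
class Trajectory:
    task: str                     # one of the 260 tasks
    category: str                 # one of the 7 broad categories
    instruction: str              # natural language annotation
    frames: List[Frame] = field(default_factory=list)
```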

📝 Abstract
From locomotion to dexterous manipulation, humanoid robots have made remarkable strides in demonstrating complex full-body capabilities. However, the majority of current robot learning datasets and benchmarks focus on stationary robot arms, and the few existing humanoid datasets are either confined to fixed environments or limited in task diversity, often lacking human-humanoid interaction and lower-body locomotion. Moreover, there are few standardized evaluation platforms for benchmarking learning-based policies on humanoid data. In this work, we present Humanoid Everyday, a large-scale and diverse humanoid manipulation dataset characterized by extensive task variety involving dexterous object manipulation, human-humanoid interaction, locomotion-integrated actions, and more. Leveraging a highly efficient human-supervised teleoperation pipeline, Humanoid Everyday aggregates high-quality multimodal sensory data, including RGB, depth, LiDAR, and tactile inputs, together with natural language annotations, comprising 10.3k trajectories and over 3 million frames of data across 260 tasks spanning 7 broad categories. In addition, we conduct an analysis of representative policy learning methods on our dataset, providing insights into their strengths and limitations across different task categories. For standardized evaluation, we introduce a cloud-based evaluation platform that allows researchers to seamlessly deploy their policies in our controlled setting and receive performance feedback. By releasing Humanoid Everyday along with our policy learning analysis and a standardized cloud-based evaluation platform, we intend to advance research in general-purpose humanoid manipulation and lay the groundwork for more capable and embodied robotic agents in real-world scenarios. Our dataset, data collection code, and cloud evaluation website are made publicly available on our project website.
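As a rough illustration of how a cloud evaluation service like the one described above might consume a submitted policy, the following sketch defines a minimal observation-to-action interface. The class and method names are assumptions for illustration, not the platform's actual API.

```python
# A minimal sketch of the interface a cloud evaluation service might
# expect from a submitted policy; names are illustrative assumptions.
from abc import ABC, abstractmethod
from typing import Any, Dict

import numpy as np

class HumanoidPolicy(ABC):
    @abstractmethod
    def reset(self, instruction: str) -> None:
        """Prepare the policy for a new episode of the given task."""

    @abstractmethod
    def act(self, observation: Dict[str, Any]) -> np.ndarray:
        """Map one multimodal observation (e.g. RGB, depth, LiDAR,
        tactile, proprioception) to a whole-body action vector."""
```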
Problem

Research questions and friction points this paper is trying to address.

Addressing limited task diversity in humanoid robot datasets
Providing a standardized evaluation platform for humanoid manipulation policies
Enabling research on locomotion-integrated humanoid manipulation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale humanoid dataset with diverse manipulation tasks
Efficient teleoperation pipeline for multimodal sensory data collection
Cloud-based platform for standardized policy evaluation
👥 Authors
Zhenyu Zhao, University of Southern California
Hongyi Jing, University of Southern California
Xiawei Liu, University of Southern California
Jiageng Mao, University of Southern California (Robotics, Computer Vision)
Abha Jha, University of Southern California (Computer Vision, Generative AI, Large Language Models)
Hanwen Yang, University of Southern California
Rong Xue, University of Southern California
Sergey Zakharov, Toyota Research Institute (Computer Vision, Machine Learning, Augmented Reality)
Vitor Guizilini, Toyota Research Institute
Yue Wang, University of Southern California