AoE: Always-on Egocentric Human Video Collection for Embodied AI

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of high-quality, large-scale real-world interaction data for embodied AI training, a challenge exacerbated by the high cost, strong hardware dependencies, and limited scalability of existing approaches. To overcome these limitations, we propose a lightweight, decentralized first-person data collection paradigm in which users wear an ergonomic smartphone mount and leverage a cross-platform mobile application to capture video anytime and anywhere. Our system integrates on-device real-time computation with a cloud-edge collaborative architecture that enables automatic annotation and filtering in the cloud, facilitating low-cost, scene-agnostic, and continuous data acquisition. The resulting large-scale real-world dataset substantially enhances model generalization on downstream tasks, demonstrating the efficiency, scalability, and practicality of our approach.

📝 Abstract
Embodied foundation models require large-scale, high-quality real-world interaction data for pre-training and scaling. However, existing data collection methods suffer from high infrastructure costs, complex hardware dependencies, and limited interaction scope, making scalable expansion challenging. In fact, humans themselves are ideal physically embodied agents. Therefore, obtaining egocentric real-world interaction data from globally distributed "human agents" offers the advantages of low cost and sustainability. To this end, we propose the Always-on Egocentric (AoE) data collection system, which simplifies hardware dependencies by leveraging humans themselves and their smartphones, enabling low-cost, highly efficient, and scene-agnostic real-world interaction data collection to address the challenge of data scarcity. Specifically, we first employ an ergonomic neck-mounted smartphone holder to enable low-barrier, large-scale egocentric data collection through a cloud-edge collaborative architecture. Second, we develop a cross-platform mobile app that leverages on-device compute for real-time processing, while the cloud hosts automated labeling and filtering pipelines that transform raw videos into high-quality training data. Finally, the AoE system supports distributed egocentric video data collection by anyone, anytime, and anywhere. We evaluate AoE on data preprocessing quality and downstream tasks, demonstrating that high-quality egocentric data significantly boosts real-world generalization.
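The abstract describes a two-stage cloud-edge split: the phone screens clips in real time on-device, and only surviving clips reach the cloud's automated labeling and filtering pipeline. The sketch below illustrates that division of labor; all names, thresholds, and the labeling rule are hypothetical placeholders, not the paper's actual implementation.

```python
from dataclasses import dataclass

# Illustrative sketch of the cloud-edge flow described in the abstract.
# Field names, thresholds, and labels are invented for this example.

@dataclass
class Clip:
    clip_id: str
    sharpness: float    # cheap on-device quality proxy (e.g., Laplacian variance)
    duration_s: float

def edge_screen(clips, min_sharpness=50.0, min_duration_s=2.0):
    """On-device stage: drop blurry or too-short clips before any upload."""
    return [c for c in clips
            if c.sharpness >= min_sharpness and c.duration_s >= min_duration_s]

def cloud_annotate(clips):
    """Cloud stage: attach automatic labels (placeholder logic)."""
    return [{"clip_id": c.clip_id,
             "label": "interaction" if c.duration_s > 5 else "ambient"}
            for c in clips]

raw = [
    Clip("a", sharpness=80.0, duration_s=6.0),
    Clip("b", sharpness=20.0, duration_s=10.0),  # too blurry: filtered on-device
    Clip("c", sharpness=60.0, duration_s=3.0),
]
kept = edge_screen(raw)        # only "a" and "c" are uploaded
annotated = cloud_annotate(kept)
```

The key design point this mirrors is bandwidth and cost: low-quality footage is discarded where it is cheapest to do so (on the phone), so the cloud pipeline only spends compute on clips worth labeling.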
Problem

Research questions and friction points this paper is trying to address.

Embodied AI
egocentric video
data collection
real-world interaction
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

egocentric vision
embodied AI
cloud-edge collaboration
smartphone-based sensing
distributed data collection
Bowen Yang
Ant Digital Technologies, Ant Group
Zishuo Li
Tsinghua University
Control Theory, Optimization, Robotics, Signal Processing
Yang Sun
Ant Digital Technologies, Ant Group
Changtao Miao
University of Science and Technology of China
AI
Yifan Yang
Institute of Automation, Chinese Academy of Sciences
Man Luo
Vision Algorithm Engineer, Ant Group
Computer vision, deep learning, biometric identity verification
Xiaotong Yan
Ant Digital Technologies, Ant Group
Feng Jiang
Ant Digital Technologies, Ant Group
Jinchuan Shi
Zhejiang University
Yankai Fu
Peking University
Ning Chen
Peking University
satellite image understanding, deep learning, recommendation system, large-scale sparse learning
Junkai Zhao
Beijing Academy of Artificial Intelligence
Pengwei Wang
University of Calgary
Computer Science, Security
Guocai Yao
Beijing Academy of Artificial Intelligence
Shanghang Zhang
Peking University
Embodied AI, Foundation Models
Hao Chen
Zhejiang University
Computer Science
Zhe Li
Ant Digital Technologies, Ant Group
Kai Zhu
Ant Digital Technologies, Ant Group