MobileEgo Anywhere: Open Infrastructure for Long-Horizon Egocentric Data on Commodity Hardware

📅 2026-05-07
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing egocentric datasets are limited in duration, which makes it difficult to model the long-horizon temporal dependencies that complex robotic tasks require. This work proposes a large-scale, long-duration egocentric data collection framework built on smartphones: it uses the multimodal sensors of consumer-grade devices for high-precision camera pose tracking, enabling continuous trajectory capture over hour-long sessions. The project releases an open-source mobile application, an end-to-end standardized processing pipeline, and a new dataset of 200 hours covering diverse real-world scenarios. By significantly lowering the barrier to data acquisition, the work supports research on vision-language-action models and embodied foundation models and helps democratize data collection in the field.
📝 Abstract
Recent advances in Vision Language Action (VLA) models have created a critical demand for large-scale egocentric datasets. However, existing datasets are often limited by short episode durations, typically spanning only a few minutes, and therefore fail to capture the long-horizon temporal dependencies necessary for complex robotic task execution. To bridge this gap, we present MobileEgo Anywhere, a framework designed to facilitate the collection of robust, hour-plus egocentric trajectories using commodity mobile hardware. We leverage the ubiquitous sensor suites of modern smartphones to provide high-fidelity, long-term camera pose tracking, effectively removing the high hardware barriers associated with traditional robotics data collection. Our contributions are threefold: (1) we release a novel dataset comprising 200 hours of diverse, long-form egocentric data with persistent state tracking; (2) we open-source a mobile application that enables any user to record egocentric data; and (3) we provide a comprehensive processing pipeline that converts raw mobile captures into standardized, training-ready formats for VLA and foundation model research. By democratizing the data collection process, this work enables massive-scale acquisition of long-horizon data across varied global environments, accelerating the development of generalizable robotic policies.
Problem

Research questions and friction points this paper is trying to address.

egocentric data
long-horizon
Vision Language Action models
temporal dependencies
robotic task execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

egocentric data
long-horizon learning
commodity hardware
Vision Language Action (VLA)
mobile sensing
Senthil Palanisamy
FPV Labs, Bangalore, India
Abhishek Anand
Unknown affiliation
computer-aided reasoning, interactive proof assistants, robotics
Satpal Singh Rathor
FPV Labs, Bangalore, India
Pratyush Patnaik
FPV Labs, Bangalore, India
Shubhanshu Khatana
FPV Labs, Bangalore, India