🤖 AI Summary
Real-world robotic perception typically relies on large, manually annotated datasets, which are costly and time-consuming to produce. This work proposes FalconApp, an end-to-end iPhone application that lets users capture a handheld video of a rigid object and automatically reconstructs it as an editable Gaussian Splatting (GSplat) asset. The system composites the asset with photorealistic backgrounds and renders synthetic images with automatic annotations for object masks and 6-degree-of-freedom (6-DoF) poses. FalconApp covers the entire pipeline, from video capture and synthetic data generation to automatic labeling, training, and on-device deployment of the perception model. Across five rigid objects, it produces usable models with about 20 minutes of synthetic-data generation and training per object on average, reaches end-to-end inference latency of approximately 30 ms on iPhone, and achieves better overall pose accuracy than a PnP baseline on four of the five objects.
📝 Abstract
Reliable perception for robotics depends on large-scale labeled data, yet real-world datasets rely on heavy manual annotation and are time-consuming to produce. We present FalconApp, an iPhone app with an end-to-end frontend-backend pipeline that turns a short handheld capture of a rigid object into a perception module for mask detection and 6-DoF pose estimation. Our core contribution is a rapid mobile deployment pipeline paired with a photorealistic auto-labeling workflow: from a user-captured video of an object, FalconApp reconstructs an editable GSplat asset, composites it with diverse photorealistic backgrounds, renders synthetic images with ground-truth masks and poses, trains the perception module, and deploys it back to the iPhone frontend. Experiments across five rigid objects with diverse geometry and appearance show that FalconApp produces usable perception models with about 20 minutes of synthetic-data generation and training per object on average, around 30 ms of end-to-end on-device latency on iPhone, and better overall pose accuracy than a Perspective-n-Point (PnP) baseline on 4 of 5 objects in both simulation and real-world evaluation.
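The auto-labeling step described above (compositing a rendered object over a background and obtaining the mask and pose for free) can be illustrated with a minimal sketch. This is not FalconApp's implementation: the function name `composite_with_labels`, the RGBA render input, the 0.5 alpha threshold, and the 4x4 pose matrix format are all assumptions made for illustration. The idea is that the renderer's alpha channel already tells us which pixels belong to the object, so the mask needs no manual annotation, and the pose is known because it was chosen when the view was rendered.

```python
import numpy as np

def composite_with_labels(render_rgba, background_rgb, pose_cam_from_obj,
                          alpha_threshold=0.5):
    """Alpha-composite a rendered object over a background and return labels.

    Hypothetical helper; all inputs are float arrays with values in [0, 1].
    render_rgba:       (H, W, 4) object render with an alpha channel
    background_rgb:    (H, W, 3) background image
    pose_cam_from_obj: (4, 4) object-to-camera transform used for the render
    """
    rgb = render_rgba[..., :3].astype(np.float32)
    alpha = render_rgba[..., 3:4].astype(np.float32)  # keep a trailing axis for broadcasting
    # Standard "over" compositing: object where alpha is high, background elsewhere.
    image = alpha * rgb + (1.0 - alpha) * background_rgb.astype(np.float32)
    # The binary mask label comes directly from the render's alpha channel.
    mask = alpha[..., 0] >= alpha_threshold
    pose = np.asarray(pose_cam_from_obj, dtype=np.float32)
    return {"image": image, "mask": mask, "pose": pose}

# Toy example: a white 2x2 "object" in the middle of a 4x4 gray background.
render = np.zeros((4, 4, 4), dtype=np.float32)
render[1:3, 1:3, :] = 1.0
background = np.full((4, 4, 3), 0.2, dtype=np.float32)
sample = composite_with_labels(render, background, np.eye(4))
```

In a real pipeline the render would come from the reconstructed GSplat asset and the pose would be sampled per image; here both are stand-ins so that the labeling logic can be read in isolation.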