Habilis-β: A Fast-Motion and Long-Lasting On-Device Vision-Language-Action Model

📅 2026-02-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing vision-language-action models, which are typically evaluated solely on single-task success rates and thus fail to capture the throughput and long-term reliability required for real-world deployment. To bridge this gap, the authors propose a vision-language-action model tailored for edge-based real-world scenarios, introducing the Productivity-Reliability Plane (PRP) evaluation framework grounded in continuous-operation protocols. Key innovations include language-agnostic pretraining on large-scale play data, cyclic task fine-tuning, phase-adaptive motion planning (ESPADA), rectified flow distillation, and classifier-free guidance. The model achieves 572.6 tasks per hour (TPH) with a mean time between interventions (MTBI) of 39.2 seconds in simulation, and 124 TPH with 137.4 seconds MTBI on real-world logistics tasks—significantly outperforming baselines and establishing state-of-the-art performance on the RoboTwin 2.0 benchmark.

📝 Abstract
We introduce Habilis-β, a fast-motion and long-lasting on-device vision-language-action (VLA) model designed for real-world deployment. Current VLA evaluation remains largely confined to single-trial success rates under curated resets, which fails to capture the fast-motion and long-lasting capabilities essential for practical operation. To address this, we introduce the Productivity-Reliability Plane (PRP), which evaluates performance through Tasks per Hour (TPH) and Mean Time Between Interventions (MTBI) under a continuous-run protocol that demands both high-speed execution and sustained robustness. Habilis-β achieves high performance by integrating language-free pre-training on large-scale play data for robust interaction priors with post-training on cyclic task demonstrations that capture state drift across consecutive task iterations. The system further employs ESPADA for phase-adaptive motion shaping to accelerate free-space transit, utilizes rectified-flow distillation to enable high-frequency control on edge devices, and incorporates classifier-free guidance (CFG) as a deployment-time knob to dynamically balance instruction adherence and learned interaction priors. In 1-hour continuous-run evaluations, Habilis-β outperforms π0.5 under the PRP metrics in both simulation and real-world environments. In simulation, Habilis-β achieves 572.6 TPH and 39.2 s MTBI (vs. 120.5 TPH and 30.5 s for π0.5), while in a real-world humanoid logistics workflow it achieves 124 TPH and 137.4 s MTBI (vs. 19 TPH and 46.1 s for π0.5). Finally, Habilis-β achieves the highest reported performance on the standard RoboTwin 2.0 leaderboard across representative tasks, validating its effectiveness in complex manipulation scenarios.
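The two PRP axes are straightforward to compute from a continuous-run log. A minimal sketch of one plausible reading of the metrics (the `RunLog` structure and field names are illustrative, not the paper's actual tooling; MTBI is taken here as run duration divided by intervention count):

```python
from dataclasses import dataclass

@dataclass
class RunLog:
    """Event log from one continuous-run episode (illustrative structure)."""
    duration_s: float      # total wall-clock length of the run, in seconds
    tasks_completed: int   # task iterations finished during the run
    interventions: int     # human interventions required during the run

def tph(log: RunLog) -> float:
    """Tasks per Hour: completed tasks normalized to a one-hour window."""
    return log.tasks_completed / (log.duration_s / 3600.0)

def mtbi(log: RunLog) -> float:
    """Mean Time Between Interventions: average autonomous stretch.
    With zero interventions, the whole run counts as one stretch."""
    if log.interventions == 0:
        return log.duration_s
    return log.duration_s / log.interventions

# A 1-hour run roughly consistent with the reported simulation numbers
# (572.6 TPH, 39.2 s MTBI); the exact event counts here are hypothetical.
sim = RunLog(duration_s=3600.0, tasks_completed=573, interventions=92)
print(round(tph(sim), 1))   # 573.0
print(round(mtbi(sim), 1))  # 39.1
```

Under this reading, the paper's 39.2 s MTBI over a 1-hour run corresponds to roughly 90 interventions, which makes the reliability axis as interpretable as the throughput axis.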
Problem

Research questions and friction points this paper is trying to address:
vision-language-action, fast-motion, long-lasting, on-device, continuous-run evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out:
vision-language-action, on-device inference, continuous-run evaluation, rectified-flow distillation, classifier-free guidance
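Of these contributions, classifier-free guidance is the one exposed as a deployment-time knob: the guidance weight trades instruction adherence against the learned interaction prior without retraining. A minimal sketch of the standard CFG blend (the paper's exact formulation may differ; `cfg_action` and its arguments are hypothetical names):

```python
import numpy as np

def cfg_action(a_uncond: np.ndarray, a_cond: np.ndarray, w: float) -> np.ndarray:
    """Standard classifier-free guidance blend of two action predictions.

    w = 0 falls back to the unconditional (language-free) interaction prior,
    w = 1 uses the instruction-conditioned prediction as-is,
    w > 1 extrapolates toward stronger instruction adherence.
    """
    return a_uncond + w * (a_cond - a_uncond)

# Toy 2-D action chunk: guidance moves the output along the
# (conditioned - unconditioned) direction, scaled by w.
a_u = np.array([0.0, 1.0])
a_c = np.array([1.0, 1.0])
blended = cfg_action(a_u, a_c, 1.5)
print(blended)
```

At w = 1.5 the first action dimension overshoots the conditioned prediction (1.5 instead of 1.0), which is the usual mechanism by which raising the guidance weight sharpens instruction following at deployment.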
👥 Authors
Jesoon Kang (Tommoro Robotics)
Taegeon Park (Tommoro Robotics)
Jisu An (Tommoro Robotics)
Soo Min Kimm (Tommoro Robotics)
Jaejoon Kim (Tommoro Robotics)
Jinu Pahk (Tommoro Robotics)
Byungju Kim (Amazon) — machine learning, deep learning
Junseok Lee (Tommoro Robotics)
Namheon Baek (Tommoro Robotics)
Sungwan Ha (Tommoro Robotics)
Hojun Baek (Tommoro Robotics)
Eduardo Ayerve Cruz (Tommoro Robotics)
Wontae Kim (Tommoro Robotics)
Junghyeon Choi (Tommoro Robotics)
Yousuk Lee (Tommoro Robotics)
Joonmo Han (Tommoro Robotics)
Sunghyun Cho (POSTECH) — computer graphics, computer vision, image processing, computational photography
Sunghyun Kwon (Tommoro Robotics)
Soyoung Lee (Tommoro Robotics)
Jun Ki Lee (Associate Research Professor, Seoul National University AI Institute) — artificial intelligence, robotics, task and motion planning, teleoperation, social robots
Seung-Joon Yi (Pusan National University) — intelligent robotics, machine learning
Byoung-Tak Zhang (Professor of Computer Science, Cognitive Science, and Brain Science, Seoul National University) — machine learning, artificial intelligence, cognitive science
Theo Taeyeong Kim (Tommoro Robotics)